A Generic Semi-Supervised and Active Learning Framework for Biomedical Text Classification
Abstract
Biomedical text classification requires training examples labeled by clinical specialists, a process that can be costly. To address this problem, active learning incrementally selects a subset of the most informative unlabeled examples, which are then labeled and used to train a given classifier, seeking to reduce the number of labeled samples. However, active learning does not use the remaining unlabeled examples, and incorporating semi-supervised techniques that do use them could improve the representativeness of the data and the discriminatory power of the classifiers. This work proposes a generic semi-supervised learning framework for improving active learning and reducing the number of labeled training examples in biomedical text classification. The proposed framework combines manually annotated training examples selected by active learning with pseudo-labels obtained from a trained classifier. To evaluate the proposed framework, three biomedical datasets with textual information on obesity and smoking habits were used across different classification algorithms. The classification results show that the proposed framework can reduce the number of training examples manually labeled by clinical specialists by 10% without affecting the performance of the classifiers. This performance is attributable to the ability of the classifiers to correctly select and label the training examples. Clinical relevance: We demonstrate the effectiveness of the proposed semi-supervised learning framework in reducing the manual labeling effort required of clinical specialists when training classifiers on biomedical texts.
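The combination described in the abstract, expert labels for examples chosen by active learning plus pseudo-labels from a trained classifier, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the selection rule or how pseudo-labels are accepted, so least-confidence sampling, a fixed confidence threshold, and the function name `split_unlabeled_pool` are all assumptions made here for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def split_unlabeled_pool(
    probabilities: Sequence[Sequence[float]],
    n_queries: int,
    confidence_threshold: float = 0.9,
) -> Tuple[List[int], Dict[int, int]]:
    """Split an unlabeled pool into expert queries and pseudo-labeled examples.

    `probabilities[i]` holds the class probabilities that an already trained
    classifier predicts for unlabeled example i. Hypothetical helper: the
    selection rule and the threshold are assumptions, not taken from the paper.
    """
    # Confidence of each example is the probability of its most likely class.
    confidence = [max(p) for p in probabilities]

    # Active learning step (assumed: least-confidence sampling). The examples
    # the classifier is least sure about are sent to the clinical specialist.
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    queries = ranked[:n_queries]

    # Semi-supervised step (assumed: thresholded pseudo-labeling). Examples
    # that were not queried and are predicted with high confidence receive
    # the predicted class as a pseudo-label.
    queried = set(queries)
    pseudo_labels: Dict[int, int] = {}
    for i, p in enumerate(probabilities):
        if i not in queried and confidence[i] >= confidence_threshold:
            pseudo_labels[i] = max(range(len(p)), key=lambda c: p[c])
    return queries, pseudo_labels


# Toy pool of five unlabeled documents with two classes.
probs = [[0.55, 0.45], [0.97, 0.03], [0.10, 0.90], [0.50, 0.50], [0.70, 0.30]]
queries, pseudo = split_unlabeled_pool(probs, n_queries=2)
print(queries)  # [3, 0]
print(pseudo)   # {1: 0, 2: 1}
```

In a full loop, the queried examples would be labeled by the specialist, merged with the pseudo-labeled ones, and used to retrain the classifier before the next round. Example 4 (confidence 0.70) is neither queried nor pseudo-labeled and stays in the pool.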
More information
Title according to SCOPUS: | ID SCOPUS_ID:85138127909 Not found in local SCOPUS DB |
Journal title: | 2016 38TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC) |
Volume: | 2022-July |
Publication date: | 2022 |
Start page: | 4445 |
End page: | 4448 |
DOI: | 10.1109/EMBC48229.2022.9871846 |
Notes: | SCOPUS |