A Generic Semi-Supervised and Active Learning Framework for Biomedical Text Classification

Flores, Christopher; Verschae-Tannenbaum, Rodrigo Andres

Abstract

Biomedical text classification requires training examples labeled by clinical specialists, a process that can be costly. To address this problem, active learning incrementally selects a subset of the most informative unlabeled examples, which are then labeled and used to train a given classifier, with the aim of reducing the number of labeled samples needed. However, active learning does not use the remaining unlabeled examples; incorporating semi-supervised techniques that exploit them could improve the representativeness of the data and the discriminatory power of the classifiers. This work proposes a generic semi-supervised learning framework for improving active learning and reducing the number of labeled training examples in biomedical text classification. The proposed framework combines manually annotated training examples selected by active learning with pseudo-labels obtained from a trained classifier. To evaluate the proposed framework, three biomedical datasets with textual information on obesity and smoking habits were used across different classification algorithms. The classification results show that the proposed framework can reduce the number of training examples that must be manually labeled by clinical specialists by 10% without affecting the performance of the classifiers. This performance is attributable to the ability of the classifiers to correctly select and label the training examples. Clinical relevance: We demonstrate the effectiveness of the proposed semi-supervised learning framework in reducing the manual labeling effort required of clinical specialists to train classifiers on biomedical texts.
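
The combination described in the abstract (manual labels for the examples chosen by active learning, pseudo-labels from the trained classifier for the others) can be illustrated with one round of a toy loop. This is only a sketch under stated assumptions, not the authors' implementation: the nearest-centroid classifier, the least-confidence query rule, the 0.9 confidence threshold, and all function names (fit_centroids, predict_proba, al_ssl_round) are hypothetical choices made for illustration, and numeric tuples stand in for text features.

```python
import math

def fit_centroids(X, y):
    """Train a toy nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict_proba(centroids, x):
    """Softmax over negative Euclidean distances to each class centroid."""
    scores = {c: math.exp(-math.dist(x, m)) for c, m in centroids.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

def al_ssl_round(X_lab, y_lab, X_pool, oracle, n_query=1, threshold=0.9):
    """One round of the assumed loop.

    1. Train on the current labeled set.
    2. Active learning step: send the n_query least confident pool
       samples to the oracle (the human annotator).
    3. Semi-supervised step: pseudo-label the other pool samples whose
       predicted confidence is at least `threshold`.
    Returns (manual, pseudo, remaining).
    """
    model = fit_centroids(X_lab, y_lab)
    scored = []
    for x in X_pool:
        proba = predict_proba(model, x)
        label = max(proba, key=proba.get)
        scored.append((proba[label], label, x))
    scored.sort(key=lambda t: t[0])  # least confident first

    queried, rest = scored[:n_query], scored[n_query:]
    manual = [(x, oracle(x)) for _, _, x in queried]
    pseudo = [(x, label) for conf, label, x in rest if conf >= threshold]
    remaining = [x for conf, _, x in rest if conf < threshold]
    return manual, pseudo, remaining

# Toy data: two labeled points and a pool of three unlabeled points.
X_lab = [(0.0, 0.0), (4.0, 4.0)]
y_lab = [0, 1]
X_pool = [(0.1, 0.0), (2.0, 2.1), (3.9, 4.0)]

def oracle(x):
    """Stand-in for the clinical specialist (hypothetical labeling rule)."""
    return int(x[0] + x[1] > 4)

manual, pseudo, remaining = al_ssl_round(X_lab, y_lab, X_pool, oracle)
print("manually labeled:", manual)
print("pseudo-labeled:  ", pseudo)
print("still unlabeled: ", remaining)
```

In this toy run only the ambiguous middle point is sent to the oracle, while the two points close to a class centroid receive pseudo-labels. A full loop would add both sets to the training data, retrain, and repeat until the labeling budget is exhausted.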

More information

SCOPUS ID: 85138127909
Journal title: 2022 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
Volume: 2022-July
Publication date: 2022
Start page: 4445
End page: 4448
DOI: 10.1109/EMBC48229.2022.9871846

Notes: SCOPUS