A Hybrid Method for Clinical Text Classification Based on Confident Predictions and Regular Expressions

Flores, Christopher A.; Verschae, Rodrigo

Abstract

Supervised algorithms can automatically organize clinical texts by content. To be usable in clinical practice, their predictions must be both accurate and confident, despite the complex patterns these texts contain. Regular expressions, sequences of characters that define search patterns, offer a representation of such patterns that is closer to natural language and can be generated automatically with sequence alignment algorithms. This paper proposes a hybrid method for clinical text classification that combines the most confident predictions of a supervised algorithm with regular expressions: when the algorithm's predictive probability indicates that a prediction is not confident, the method classifies the text with regular expressions instead. We evaluated the method on three datasets with information on smoking and obesity status, using four supervised algorithms: Support Vector Machine (SVM), Random Forest (RF), Naive Bayes (NB), and Bidirectional Encoder Representations from Transformers (BERT). The results indicate that, on average, the proposed method improved the performance of the supervised algorithms on all performance metrics, by up to 5%. The method thus generates regular expressions that are representative of clinical texts and can support classification when the supervised algorithm's predictions are not confident.
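The fallback logic described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `hybrid_classify`, the 0.8 confidence threshold, the hand-written patterns in `REGEX_RULES`, and the behavior when no pattern matches are all assumptions made for illustration. In the paper the regular expressions are generated automatically by sequence alignment, which is not reproduced here.

```python
import re

# Hypothetical hand-written patterns, for illustration only. The paper
# generates its patterns automatically with sequence alignment.
REGEX_RULES = {
    "smoker": re.compile(r"\b(current(ly)? smok\w+|smokes \d+)", re.IGNORECASE),
    "non-smoker": re.compile(
        r"\b(denies smoking|never smoked|non-?smoker)\b", re.IGNORECASE
    ),
}


def hybrid_classify(text, class_probs, threshold=0.8, rules=REGEX_RULES):
    """Return (label, source) for one clinical text.

    class_probs maps each label to the predictive probability produced by
    any supervised classifier. The threshold value is an assumption.
    """
    best_label = max(class_probs, key=class_probs.get)

    # Confident prediction: keep the supervised algorithm's label.
    if class_probs[best_label] >= threshold:
        return best_label, "classifier"

    # Not confident: let the first matching regular expression decide.
    for label, pattern in rules.items():
        if pattern.search(text):
            return label, "regex"

    # No pattern matched. Falling back to the classifier's label is an
    # assumption; the abstract does not say how this case is handled.
    return best_label, "classifier-fallback"
```

For example, calling `hybrid_classify("Patient denies smoking.", {"smoker": 0.55, "non-smoker": 0.45})` finds the classifier below the threshold and returns the label of the matching pattern, while the same text with a probability of 0.95 for one class keeps the classifier's label.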

More information

Title according to SCOPUS: ID SCOPUS_ID:85189933830 Not found in local SCOPUS DB
Publication date: 2024
Start page: 64
End page: 69
DOI: 10.1109/ICAIIC60209.2024.10463358

Notes: SCOPUS