Identifying and Extracting Patient Smoking Status information from Clinical Narrative Texts in Spanish

Figueroa, R.L.; Soto, D.A.; Pino, E.J.

Abstract

In this work we present a system to identify and extract patients' smoking status from clinical narrative text in Spanish. The clinical narrative text was processed using natural language processing techniques and annotated by four people with a biomedical background. The dataset used for classification contained 2,465 documents, each annotated with one of four smoking status categories. We used two feature representations: single-word tokens and bigrams. The classification problem was divided into two levels: the first distinguishes smokers (S) from non-smokers (NS), and the second distinguishes current smokers (CS) from past smokers (PS). For each feature representation and classification level, we used two classifiers: Support Vector Machines (SVM) and Bayesian Networks (BN). We split the dataset into a training set containing 66% of the available documents, used to build the classifiers, and a test set containing the remaining 34%, used to evaluate the models. Our results show that SVM with the bigram representation performed best at both classification levels. For the S vs. NS level, the performance measures were accuracy (ACC) = 85%, precision = 85%, and recall = 90%. For the CS vs. PS level, they were ACC = 87%, precision = 91%, and recall = 94%.
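The abstract's feature representations, 66%/34% split, and performance measures can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the whitespace tokenization, the random shuffle, and the fixed seed are all assumptions made for illustration, and the SVM and BN classifiers themselves are left out.

```python
import random
from collections import Counter


def ngram_features(text, n):
    """Count word n-grams (n=1 for single words, n=2 for bigrams).

    Assumption: lowercasing plus whitespace tokenization; the record does
    not say how the clinical notes were actually tokenized.
    """
    tokens = text.lower().split()
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


def split_train_test(documents, train_fraction=0.66, seed=0):
    """Shuffle a copy of the documents and split it into train and test sets.

    Assumption: a seeded random shuffle; the record only gives the 66%/34%
    proportions, not how documents were assigned to each set.
    """
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def evaluate(gold, predicted, positive):
    """Return accuracy, precision, and recall for one positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    correct = sum(g == p for g, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

For a hypothetical note such as "Paciente no fuma", `ngram_features(note, 2)` counts the bigrams "paciente no" and "no fuma". The bigram keeps the negation attached to the verb, which single-word tokens cannot do. The same `evaluate` function applies to both levels, with "S" as the positive class at the first level and, for example, "CS" at the second.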

More information

Title according to WOS: Identifying and Extracting Patient Smoking Status information from Clinical Narrative Texts in Spanish
Title according to SCOPUS: Identifying and extracting patient smoking status information from clinical narrative texts in Spanish
Journal title: 2014 36TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC)
Publisher: IEEE
Publication date: 2014
First page: 2710
Last page: 2713
Language: English
DOI: 10.1109/EMBC.2014.6944182

Notes: ISI, SCOPUS