Artificial Intelligence in Medicine
Automatic support system for tumor coding in pathology reports in Spanish
Keywords: cancer, natural language processing, data mining, data warehousing, Electronic Health Records
Abstract
Pathology reports provide valuable information for cancer registries to understand, plan and implement strategies to mitigate the impact of cancer. However, coding key information from unstructured reports is a time-consuming manual process carried out by experts. Here we report an automatic deep learning-based system that recognizes tumor morphology and topography mentions in free text and suggests codes from the International Classification of Diseases for Oncology (ICD-O) in Spanish. To build it, we combined an in-house annotated corpus of tumor morphology and topography mentions with the CANTEMIST (CANcer TExt Mining Shared Task – tumor named entity recognition) corpus, an open source dataset annotated with tumor morphology mentions. To create a Named Entity Recognition (NER) model, we applied transfer learning from state-of-the-art pre-trained language models. The mentions found with this model were subsequently coded using a search engine tailored to the ICD-O codes. Our NER models obtained F1 scores of 0.86 and 0.90 for tumor morphology and topography, respectively. Overall, our automatic coding system achieved an accuracy at five suggestions of 0.72 and 0.65 for tumor morphology and topography, respectively. Our results demonstrate the feasibility of implementing NLP tools in the routine of a cancer center to extract and code valuable information from pathology reports.
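The "accuracy at five suggestions" figure reported above can be read as the fraction of mentions whose correct ICD-O code appears among the first five codes suggested by the system. The following is a minimal sketch of that reading, not the authors' evaluation code: the function name `accuracy_at_k`, the handling of empty input, and the toy gold codes and ranked suggestion lists are all assumptions made for illustration.

```python
from typing import Sequence


def accuracy_at_k(
    gold_codes: Sequence[str],
    suggestions: Sequence[Sequence[str]],
    k: int = 5,
) -> float:
    """Fraction of mentions whose gold code is among the top-k suggestions.

    Hypothetical helper: the abstract does not specify how the metric
    was implemented, so this is one plausible definition.
    """
    if len(gold_codes) != len(suggestions):
        raise ValueError("gold_codes and suggestions must have the same length")
    if not gold_codes:
        return 0.0  # assumption: an empty evaluation set scores 0
    hits = sum(
        1 for gold, ranked in zip(gold_codes, suggestions) if gold in ranked[:k]
    )
    return hits / len(gold_codes)


# Toy data invented for this example (not taken from the paper).
gold = ["8140/3", "8070/3", "C50.9", "C34.9"]
ranked = [
    ["8140/3", "8140/6", "8010/3"],                            # gold at rank 1
    ["8010/3", "8070/3", "8071/3"],                            # gold at rank 2
    ["C50.4", "C50.8", "C50.1", "C50.2", "C50.5", "C50.9"],    # gold at rank 6
    ["C34.1", "C34.9"],                                        # gold at rank 2
]

print(accuracy_at_k(gold, ranked, k=5))
```

Under this definition a suggestion list counts as correct as long as the expert would find the right code somewhere in the five candidates shown, which matches the stated goal of supporting, rather than replacing, manual coding.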
More information
Journal title: | ARTIFICIAL INTELLIGENCE IN MEDICINE |
Publisher: | Elsevier |
Publication date: | 2022 |
Language: | English |