Clinically correct report generation from chest x-rays using templates
Abstract
We address the task of automatically generating a medical report from chest X-rays. Many authors have proposed deep learning models to solve this task, but they focus mainly on improving NLP metrics, such as BLEU and CIDEr, which are not suitable to measure clinical correctness in clinical reports. In this work, we propose CNN-TRG, a Template-based Report Generation model that detects a set of abnormalities and verbalizes them via fixed sentences, which is much simpler than other state-of-the-art NLG methods and achieves better results in medical correctness metrics. We benchmark our model in the IU X-ray and MIMIC-CXR datasets against naive baselines as well as deep learning-based models, by employing the Chexpert labeler and MIRQI as clinical correctness evaluations, and NLP metrics as secondary evaluation. We also provide further evidence indicating that traditional NLP metrics are not suitable for this task by presenting their lack of robustness in multiple cases. We show that slightly altering a template-based model can increase NLP metrics considerably while maintaining high clinical performance. Our work contributes by a simple but effective approach for chest X-ray report generation, as well as by supporting a model evaluation focused primarily on clinical correctness metrics and secondarily on NLP metrics.
Más información
Título de la Revista: | In International Workshop on Machine Learning in Medical Imaging |
Fecha de publicación: | 2021 |
Página de inicio: | 654 |
Página final: | 663 |
Idioma: | english |
DOI: |
https://doi.org/10.1007/978-3-030-87589-3_67 |