A Data-Driven Methodology to Assess Text Complexity Based on Syntactic and Semantic Measurements
Keywords: Text difficulty assessment, Natural language processing, Artificial intelligence, Machine learning, Educational systems
Abstract
In this paper, we propose a data-driven methodology to assess the text complexity of Spanish school texts. We model the problem as a classification task that can be solved in a data-driven fashion using machine learning techniques. We show empirically that the discriminative power of the classifier depends on the school grade level. Our proposal includes multiple predictors that capture different dimensions of text complexity, such as coherence and cohesion. We provide an analysis of predictor importance across several complexity levels. Finally, we assess the model's performance using accuracy and correlation measurements. The proposed model reaches accuracy values of 0.7.
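To make the classification framing and the two evaluation measures concrete, the following is a minimal sketch under stated assumptions, not the authors' method. The abstract does not list the actual predictors or the classifier, so `extract_features` uses two hypothetical stand-in predictors (mean sentence length and type-token ratio), and a simple nearest-centroid classifier is swapped in for whatever model the paper uses. "Correlation" is assumed here to mean the Pearson correlation between gold and predicted grade levels. All function names are illustrative.

```python
import math
from collections import defaultdict

PUNCT = ".,;:!?¡¿\"'()"

def extract_features(text):
    """Return two hypothetical predictors: [mean sentence length in words, type-token ratio].

    Illustrative stand-ins only; the paper's syntactic and semantic
    predictors (e.g. coherence, cohesion) are not specified in the abstract.
    """
    # Treat '.', '!' and '?' as sentence boundaries (crude, for illustration).
    normalized = text.replace("!", ".").replace("?", ".")
    sentences = [s for s in normalized.split(".") if s.strip()]
    # Lowercase tokens with surrounding punctuation removed; drop empty tokens.
    words = [w.strip(PUNCT) for w in text.lower().split()]
    words = [w for w in words if w]
    mean_sentence_len = len(words) / max(len(sentences), 1)
    type_token_ratio = len(set(words)) / max(len(words), 1)
    return [mean_sentence_len, type_token_ratio]

def fit_centroids(X, y):
    """Nearest-centroid training (an assumed stand-in model): average the feature vectors per grade level."""
    groups = defaultdict(list)
    for features, label in zip(X, y):
        groups[label].append(features)
    return {label: [sum(col) / len(col) for col in zip(*rows)] for label, rows in groups.items()}

def predict(centroids, features):
    """Assign the grade level whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

def accuracy(y_true, y_pred):
    """Fraction of texts whose predicted grade level equals the gold grade level."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson(x, y):
    """Pearson correlation coefficient (assumes neither sequence is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

if __name__ == "__main__":
    # Toy, made-up data: (text, grade level). Not from the paper.
    train = [
        ("El gato come. El perro corre.", 1),
        ("Mi casa es azul. Yo leo.", 1),
        ("La fotosíntesis transforma la energía luminosa en energía química que las plantas almacenan.", 6),
        ("Los ecosistemas dependen de interacciones complejas entre organismos y su ambiente físico.", 6),
    ]
    X = [extract_features(t) for t, _ in train]
    y = [g for _, g in train]
    centroids = fit_centroids(X, y)
    # Scored on the training texts only to show the mechanics; a real
    # evaluation would use held-out texts.
    y_pred = [predict(centroids, x) for x in X]
    print("accuracy:", accuracy(y, y_pred))
    print("correlation:", pearson(y, y_pred))
```

Accuracy rewards only exact grade matches, while the correlation also gives partial credit when a misclassified text lands near its true grade, which is one plausible reason to report both measures together.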
More information
Publisher: | Springer, Cham |
Publication date: | 2020 |
Start/End year: | 2020 August |
First page: | 509 |
Last page: | 515 |
Language: | English |
URL: | https://link.springer.com/chapter/10.1007%2F978-3-030-25629-6_79 |