A Data-Driven Methodology to Assess Text Complexity Based on Syntactic and Semantic Measurements

Diego Palma

Keywords: Text difficulty assessment, Natural language processing, Artificial intelligence, Machine learning, Educational systems


In this paper, we propose a data-driven methodology to assess the text complexity of Spanish school texts. We model the problem as a classification task that can be solved with machine learning techniques. We show empirically that the discriminative power of the classifier depends on the school grade level. Our proposal includes multiple predictors that capture different dimensions of text complexity, such as coherence and cohesion. We provide an importance analysis of the predictors across several complexity levels. Finally, we assess model performance using accuracy and correlation measurements. The proposed model achieves accuracies of 0.7.
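The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes that complexity levels are encoded as integers and that the correlation is Pearson's coefficient; the abstract does not say which correlation measure was used. The function names and the grade levels below are hypothetical and are not taken from the paper.

```python
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of texts whose predicted level equals the true level."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences.

    Assumption: the abstract only says "correlation"; Pearson is used
    here purely for illustration.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up grade levels (1 = easiest) for ten texts, not data from the paper.
true_levels = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
pred_levels = [1, 2, 2, 2, 3, 4, 4, 4, 5, 5]

print("accuracy:", accuracy(true_levels, pred_levels))
print("correlation:", round(pearson(true_levels, pred_levels), 3))
```

Reporting both numbers is useful because the levels are ordered: accuracy counts only exact matches, while the correlation also rewards predictions that miss by a single level instead of several.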

More information

Publisher: Springer, Cham
Publication date: 2020
Start/End year: August 2020
First page: 509
Last page: 515
Language: English
URL: https://link.springer.com/chapter/10.1007%2F978-3-030-25629-6_79