A Data-Driven Methodology to Assess Text Complexity Based on Syntactic and Semantic Measurements
Keywords: artificial intelligence, natural language processing, machine learning, educational systems, text difficulty assessment
Abstract
In this paper, we propose a data-driven methodology to assess the complexity of Spanish school texts. We model the problem as a classification task that can be solved in a data-driven fashion using machine learning techniques. We show empirically that the discriminative power of the classifier depends on the school grade level. Our proposal includes multiple predictors that capture different dimensions of text complexity, such as coherence and cohesion. We provide an importance analysis of the predictors across several complexity levels. Finally, we assess the model's performance using accuracy and correlation measurements. The proposed model achieves accuracies of 0.7.
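As a rough illustration of the evaluation step mentioned in the abstract, the sketch below computes accuracy and a Pearson correlation between true and predicted complexity levels. It is not the authors' code: the function names, the integer encoding of grade levels, the choice of Pearson as the correlation measure, and the toy labels are all assumptions made for this example.

```python
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of texts whose predicted level equals the true level."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def pearson(y_true, y_pred):
    """Pearson correlation between true and predicted levels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / sqrt(var_t * var_p)


# Hypothetical labels: complexity levels encoded as integers 1..4.
true_levels = [1, 1, 2, 2, 3, 3, 4, 4]
pred_levels = [1, 2, 2, 2, 3, 4, 4, 4]

print(accuracy(true_levels, pred_levels))  # 0.75
print(pearson(true_levels, pred_levels))
```

Reporting a correlation alongside accuracy is useful when the classes are ordered, as grade levels are: a prediction that misses by one level counts as an error for accuracy but still contributes to a high correlation.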
More information
Publisher: | Springer |
Publication date: | 2020 |
Start page: | 509 |
End page: | 515 |
Language: | English |
URL: | https://link.springer.com/chapter/10.1007/978-3-030-25629-6_79 |
DOI: | https://doi.org/10.1007/978-3-030-25629-6_79 |