A Data-Driven Methodology to Assess Text Complexity Based on Syntactic and Semantic Measurements

Palma, D.; Soto, C.; Veliz, M.; Riffo, B.; Gutiérrez, A.

Keywords: artificial intelligence, natural language processing, machine learning, educational systems, text difficulty assessment

Abstract

In this paper we propose a data-driven methodology to assess the text complexity of Spanish school texts. We model the problem as a classification task that can be solved in a data-driven fashion using machine learning techniques. We show empirically that the discriminative power of the classifier depends on school grade level. Our proposal includes multiple predictors that capture different dimensions of text complexity, such as coherence and cohesion. We provide an importance analysis of predictors across several complexity levels. Finally, we assess the model's performance using accuracy and correlation measurements. The proposed model achieves accuracies of 0.7.
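As an illustration of the two evaluation measurements named in the abstract, the following is a minimal sketch of how accuracy and a correlation coefficient could be computed for predicted versus true complexity levels. It assumes that levels are encoded as integers and that the correlation is Pearson's; the abstract does not specify either. The function names and the sample labels are hypothetical and are not data or code from the paper.

```python
from math import sqrt


def accuracy(true_levels, predicted_levels):
    """Fraction of texts whose predicted level equals the true level."""
    matches = sum(t == p for t, p in zip(true_levels, predicted_levels))
    return matches / len(true_levels)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical complexity levels for eight texts (illustrative only,
# not results from the paper).
true_levels = [1, 1, 2, 2, 3, 3, 4, 4]
predicted_levels = [1, 2, 2, 2, 3, 4, 4, 4]

print("accuracy:", accuracy(true_levels, predicted_levels))
print("correlation:", pearson(true_levels, predicted_levels))
```

Reporting both measurements is useful when the classes are ordered: accuracy counts only exact matches, whereas correlation also rewards predictions that miss by a single level more than predictions that miss by several.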

More information

Publisher: Springer
Publication date: 2020
Start page: 509
End page: 515
Language: English
URL: https://link.springer.com/chapter/10.1007/978-3-030-25629-6_79
DOI: https://doi.org/10.1007/978-3-030-25629-6_79