Multi-label Text Classification with a Multivariate Bernoulli Model and Label-Dependent Representation

Alfaro, R; Allende, Hector

Keywords: Multi-label, text classification, text representation, problem transformation, term weighting

Abstract

Assigning natural language texts to one or more predefined categories or classes based on their content is an important component of many information organization and management tasks, and a growing need. Automatic text classification is the task of assigning documents to a predefined set of classes by means of a computational method or model. Text representation for classification has traditionally relied on the vector space model because of its simplicity and good performance. Multi-label automatic text classification, in turn, has typically been addressed either by transforming the problem so that binary techniques can be applied or by adapting binary algorithms to handle multiple labels. This article evaluates a term-weighting factor within the Boolean model of text representation for multi-label classification, combining two approaches: problem transformation and model adaptation. The term-weighting factor and the combined approach were tested on four textual datasets used in the specialized literature and compared with alternative techniques using three evaluation measures. The results show improvements of more than 10% in classifier performance, attributable to our proposal, in all the cases analyzed.
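To make the two ingredients named in the abstract concrete, here is a minimal sketch, assuming a generic setup: binary relevance (one common problem-transformation strategy, which trains one independent binary classifier per label) paired with a multivariate Bernoulli naive Bayes classifier over Boolean term-presence features with Laplace smoothing. It does not implement the authors' term-weighting factor or their label-dependent representation, since the abstract does not specify them. All function names, the smoothing parameter `alpha`, and the toy data are hypothetical illustrations.

```python
import math

def train_bernoulli(docs, labels, vocab, alpha=1.0):
    """Fit a multivariate Bernoulli naive Bayes model for one binary label.

    docs: list of sets of terms (Boolean representation: a term is present or absent).
    labels: list of bools, True if the document carries the label.
    Returns {class: (prior, {term: P(term present | class)})}, Laplace-smoothed.
    """
    model = {}
    for c in (True, False):
        idx = [i for i, y in enumerate(labels) if y == c]
        n_c = len(idx)
        prior = (n_c + alpha) / (len(docs) + 2 * alpha)
        probs = {
            t: (sum(1 for i in idx if t in docs[i]) + alpha) / (n_c + 2 * alpha)
            for t in vocab
        }
        model[c] = (prior, probs)
    return model

def log_score(model, c, doc, vocab):
    """Log joint probability of class c and the doc's presence/absence vector."""
    prior, probs = model[c]
    s = math.log(prior)
    for t in vocab:
        p = probs[t]
        # The Bernoulli model also scores absent terms, via log(1 - p).
        s += math.log(p) if t in doc else math.log(1.0 - p)
    return s

def binary_relevance_fit(docs, label_sets, all_labels, alpha=1.0):
    """Problem transformation: one independent binary model per label."""
    vocab = set().union(*docs)
    models = {
        lab: train_bernoulli(docs, [lab in ls for ls in label_sets], vocab, alpha)
        for lab in all_labels
    }
    return models, vocab

def binary_relevance_predict(models, vocab, doc):
    """Return every label whose binary model prefers 'present' over 'absent'."""
    doc = set(doc)  # terms outside the training vocabulary are simply ignored
    return {
        lab
        for lab, m in models.items()
        if log_score(m, True, doc, vocab) > log_score(m, False, doc, vocab)
    }
```

A small usage example on toy data, where a document may carry several labels at once:

```python
docs = [{"goal", "match", "team"}, {"election", "vote"},
        {"team", "vote", "election"}, {"match", "goal"}]
label_sets = [{"sports"}, {"politics"}, {"sports", "politics"}, {"sports"}]
models, vocab = binary_relevance_fit(docs, label_sets, ["sports", "politics"])
print(binary_relevance_predict(models, vocab, {"goal", "team"}))  # {'sports'}
```

Binary relevance ignores correlations between labels, which is one reason the literature also explores adapting the classifier itself (model adaptation), the second approach the abstract combines.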

More information

Journal title: Revista signos - Estudios de lingüística
Volume: 53(104)
Publisher: Instituto de Literatura y Ciencias del Lenguaje, Pontificia Universidad Católica de Valparaíso
Publication date: 2020
First page: 549
Last page: 567
Language: Spanish