Multi-label Text Classification with Multi-variate Bernoulli Model and Label Dependent Representation

Alfaro A, Rodrigo; Allende O, Hector

Abstract

Assigning natural language texts to one or more predefined categories or classes based on their content is an important component of, and a recent need in, many information organization and management tasks. Automatic text classification is the task of assigning documents to a predefined set of classes by means of a computational method or model. Text representation for classification purposes has traditionally relied on the vector space model because of its simplicity and good performance. Multi-label automatic text classification, in turn, has typically been addressed either by transforming the problem so that binary techniques can be applied or by adapting binary algorithms to work with multiple labels. The objective of this article is to evaluate a term-weighting factor in the Boolean model for text representation in multi-label classification, using a mix of two approaches: problem transformation and model adaptation. The term-weighting factor and the combination of approaches were tested on four textual data sets used in the specialized literature and compared with alternative techniques using three evaluation measures. The results show improvements of more than 10% in classifier performance, attributed to our proposal, in all the cases analyzed.
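
To make the terms in the abstract concrete, the following is a minimal sketch of problem transformation (one binary classifier per label, often called binary relevance) combined with a multivariate Bernoulli naive Bayes model over a Boolean (presence or absence) text representation. It is not the authors' implementation and does not include their term-weighting factor, which the abstract does not specify. The function names, the smoothing parameter `alpha`, and the toy documents and labels are all assumptions made for illustration.

```python
import math

def train_binary_relevance(docs, label_sets, alpha=1.0):
    """Fit one multivariate Bernoulli naive Bayes model per label.

    docs: list of token lists; label_sets: list of sets of labels.
    alpha: smoothing constant (an assumption, not taken from the article).
    """
    vocab = sorted({t for d in docs for t in d})
    labels = sorted({l for ls in label_sets for l in ls})
    bool_docs = [set(d) for d in docs]  # Boolean model: term presence only
    models = {}
    for label in labels:
        params = {}
        # Problem transformation: each label becomes a yes/no problem.
        for cls in (True, False):
            members = [b for b, ls in zip(bool_docs, label_sets)
                       if (label in ls) == cls]
            n = len(members)
            prior = (n + alpha) / (len(docs) + 2 * alpha)
            # Smoothed probability that each term appears in this class.
            p_term = {t: (sum(t in b for b in members) + alpha) / (n + 2 * alpha)
                      for t in vocab}
            params[cls] = (prior, p_term)
        models[label] = params
    return vocab, models

def predict(doc, vocab, models):
    """Return the set of labels whose 'yes' model outscores its 'no' model."""
    present = set(doc)
    predicted = set()
    for label, params in models.items():
        scores = {}
        for cls, (prior, p_term) in params.items():
            score = math.log(prior)
            # Bernoulli likelihood: absent terms contribute log(1 - p).
            for t in vocab:
                p = p_term[t]
                score += math.log(p if t in present else 1.0 - p)
            scores[cls] = score
        if scores[True] > scores[False]:
            predicted.add(label)
    return predicted

# Hypothetical toy corpus, for illustration only.
docs = [
    ["goal", "match", "team"],
    ["election", "vote", "party"],
    ["team", "election", "vote"],
    ["match", "goal", "win"],
    ["party", "vote", "law"],
]
label_sets = [{"sports"}, {"politics"}, {"sports", "politics"},
              {"sports"}, {"politics"}]

vocab, models = train_binary_relevance(docs, label_sets)
print(predict(["goal", "match"], vocab, models))
print(predict(["vote", "party"], vocab, models))
```

Unlike a multinomial model, the Bernoulli model scores every vocabulary term for every document, so the absence of a term counts as evidence too. A label-dependent term weight of the kind the article evaluates would presumably enter at the representation or scoring step, but how it does so is not described in the abstract.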

More information

Title according to WOS: Multi-label Text Classification with Multi-variate Bernoulli Model and Label Dependent Representation
Journal title: Revista signos - Estudios de lingüística
Volume: 53
Issue: 104
Publisher: Instituto de Literatura y Ciencias del Lenguaje, Pontificia Universidad Católica de Valparaíso
Publication date: 2020
Start page: 549
End page: 567
DOI: 10.4067/S0718-09342020000300549

Notes: ISI