Keywords: Academic discourse; Naïve Bayes; Support Vector Machine; Vectorial model

Abstract

The aim of this research is to classify the academic texts of the PUCV-2006 Corpus, compiled within the Fondecyt 1060440 research project, by applying and comparing two automatic classification methods. Both methods rely on shared lexical-semantic content words found in a corpus of academic texts used in four degree programs at the Pontificia Universidad Católica de Valparaíso, Chile. The corpus currently comprises 652 texts totaling 96,288,874 words. For our purposes, we use a sample of 216 texts (30,886,081 words), divided as follows: 26 used in Construction Engineering, 31 in Chemistry, 64 in Social Work, and 95 in Psychology. The classification methods compared are Multinomial Naïve Bayes and Support Vector Machine. Both identify a small group of shared words that, according to their statistical weights, make it possible to classify a new text into one of the four disciplinary areas. The results allow us to establish that Support Vector Machine classifies academic texts efficiently, with high precision and recall values. With this method we can automatically identify the disciplinary domain of a new academic text submitted as a query, with high accuracy (93.9%). We plan to use this method as part of a more detailed multidimensional analysis of the PUCV-2006 Corpus.
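As an illustration of how word-level statistical weights can assign a new text to a disciplinary area, the following is a minimal sketch of a multinomial Naïve Bayes classifier with add-one smoothing. It is not the authors' implementation: the function names `train_multinomial_nb` and `classify`, the whitespace tokenization, and the four toy training texts are assumptions made for this example, and none of the data comes from the PUCV-2006 Corpus. The Support Vector Machine side of the comparison is not shown.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha smoothing."""
    class_docs = Counter(labels)          # number of training texts per class
    word_counts = defaultdict(Counter)    # word frequencies per class
    for text, label in zip(docs, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}

    model = {}
    for label, n_docs in class_docs.items():
        total = sum(word_counts[label].values())
        denom = total + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            # Per-word weight: smoothed log-probability of the word in this class.
            "log_likelihood": {
                w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
            },
        }
    return model


def classify(model, text):
    """Return the class with the highest log-score; unseen words are ignored."""
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for w in text.lower().split():
            if w in params["log_likelihood"]:
                score += params["log_likelihood"][w]
        scores[label] = score
    return max(scores, key=scores.get)


# Toy, invented training texts (hypothetical; not taken from the corpus).
train_docs = [
    "concrete beam load structure",
    "reaction molecule acid solution",
    "community family intervention welfare",
    "cognition behavior therapy perception",
]
train_labels = ["Construction Engineering", "Chemistry", "Social Work", "Psychology"]

model = train_multinomial_nb(train_docs, train_labels)
print(classify(model, "acid reaction in a solution"))
```

In this sketch a query is scored against each class by summing the log-weights of the words it shares with the training vocabulary, so a handful of discipline-specific words is enough to decide the label.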

More information

Title according to SCOPUS:	Academic text classification based on lexical-semantic content / Clasificación de textos académicos en función de su contenido léxico-semántico
Journal title:	REVISTA SIGNOS
Volume:	40
Issue:	63
Publisher:	PONTIFICIA UNIVERSIDAD CATÓLICA DE VALPARAÍSO, INSTITUTO DE LITERATURA Y CIENCIAS DEL LENGUAJE
Publication date:	2007
Start page:	239
End page:	271
Language:	eng
Notes:	SCOPUS
