Answering definition questions: Dealing with data sparseness in lexicalised dependency trees-based language models

Figueroa, A.; Atkinson, J.

Keywords: Language models; Question answering; Data sparseness; n-Gram; Dependency trees; Lexical substitution; Selective substitution; Part-of-speech; Ranking strategy; Training material; Test corpus; Software agents; Information systems; Natural language processing; Computational linguistics (mathematics); World Wide Web; F-score

Abstract

A crucial step in the answering process for definition questions, such as "Who is Gordon Brown?", is the ranking of answer candidates. In definition Question Answering (QA), sentences are normally interpreted as potential answers, and one of the most promising ranking strategies predicates upon Language Models (LMs). However, one factor that makes LMs less attractive is that they can suffer from data sparseness when the training material is insufficient or candidate sentences are too long. This paper analyses two methods, different in nature, for tackling data sparseness head-on: (1) combining LMs learnt from different, but overlapping, training corpora, and (2) selective substitutions grounded upon part-of-speech (POS) taggings. Results show that the first method improves the Mean Average Precision (MAP) of the top-ranked answers while, at the same time, diminishing the average F-score of the final output. Conversely, the impact of the second approach depends on the test corpus. © 2010 Springer-Verlag.
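The two techniques named in the abstract can be illustrated with a minimal sketch. The function names, the interpolation weight, and the choice of which POS classes to substitute (proper nouns and numbers here) are assumptions for illustration, not details taken from the paper:

```python
from collections import Counter

def train_unigram(tokens):
    # Maximum-likelihood unigram LM: P(w) = count(w) / N.
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def interpolate(lm_a, lm_b, lam=0.5):
    # Method (1): linearly combine LMs trained on different (overlapping)
    # corpora: P(w) = lam * P_a(w) + (1 - lam) * P_b(w).
    vocab = set(lm_a) | set(lm_b)
    return {w: lam * lm_a.get(w, 0.0) + (1 - lam) * lm_b.get(w, 0.0)
            for w in vocab}

def pos_substitute(tagged_sentence, sparse_tags=frozenset({"NNP", "CD"})):
    # Method (2): selective substitution — replace tokens of sparse classes
    # (illustratively, proper nouns and cardinal numbers) by their POS tag,
    # so rare surface forms share probability mass.
    return [tag if tag in sparse_tags else word
            for word, tag in tagged_sentence]
```

For example, `pos_substitute([("Gordon", "NNP"), ("Brown", "NNP"), ("is", "VBZ"), ("a", "DT"), ("politician", "NN")])` maps both proper nouns to the single symbol `NNP`, so an LM no longer needs to have seen "Gordon Brown" in training.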

More information

Journal title: ENTERPRISE, BUSINESS-PROCESS AND INFORMATION SYSTEMS MODELING
Volume: 45
Publisher: SPRINGER-VERLAG BERLIN
Publication date: 2010
Start page: 297
End page: 310
URL: http://www.scopus.com/inward/record.url?eid=2-s2.0-77952781169&partnerID=q2rCbXpz