Text Clustering with Named Entities: A Model, Experimentation and Realization

Tru H. Cao, Thao M. Tang, Cuong K. Chau

Keywords: Fuzzy Cluster, Cluster Quality, Named Entity, Hard Cluster, Text Cluster

Abstract

Named entities occur frequently in web pages, particularly in news articles, and are central to what those pages are about. They have ontological features, namely their aliases, types, and identifiers, which are hidden from their textual appearance. In this chapter, for text searching and clustering, we propose an extended Vector Space Model with multiple vectors defined over spaces of entity names, types, name-type pairs, identifiers, and keywords. Both hard and fuzzy text clustering experiments with the proposed model are conducted and evaluated on selected data subsets of Reuters-21578. The results show that a weighted combination of named entities and keywords is significant to clustering quality. The implementation and demonstration of text clustering with named entities in a semantic search engine are also presented.
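
The abstract does not give the model's formulas, so the following is only a minimal sketch of the general idea: each document carries one term vector per space, and a document-to-document similarity is formed as a weighted combination over those spaces. The use of cosine similarity, the linear combination, the function names, the space labels, the weights, and the identifier `ORG_1` are all assumptions made for illustration, not the chapter's actual definitions.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)

def combined_similarity(doc_a, doc_b, weights):
    """Weighted sum of per-space cosine similarities (assumed combination rule)."""
    return sum(
        weight * cosine(doc_a.get(space, {}), doc_b.get(space, {}))
        for space, weight in weights.items()
    )

# Two toy documents that mention the same entity under different aliases.
# "ORG_1" is a hypothetical identifier, not taken from any real ontology.
doc_a = {
    "name": Counter({"Ford": 2}),
    "type": Counter({"Company": 2}),
    "identifier": Counter({"ORG_1": 2}),
    "keyword": Counter({"car": 1, "sales": 1}),
}
doc_b = {
    "name": Counter({"Ford Motor Company": 1}),
    "type": Counter({"Company": 1}),
    "identifier": Counter({"ORG_1": 1}),
    "keyword": Counter({"car": 1, "profit": 1}),
}

# Hypothetical weights; how to choose them is not specified in the abstract.
weights = {"name": 0.2, "type": 0.2, "identifier": 0.2, "keyword": 0.4}

score = combined_similarity(doc_a, doc_b, weights)
print(round(score, 3))
```

In this toy case the name vectors share no terms because the surface forms differ, while the type and identifier vectors match exactly, which illustrates why features hidden from the textual appearance can matter for clustering. A pairwise score of this kind could then be passed to any hard or fuzzy clustering routine.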

More information

Publisher: Springer Berlin Heidelberg
Publication date: 2012
First page: 267
Last page: 287
Language: English
URL: https://doi.org/10.1007/978-3-642-23166-7_10