Hypergeometric language model and Zipf-like scoring function for web document similarity retrieval

Bravo-Marquez F.; L'Huillier G.; Ríos S.A.; Velásquez J.D.

Keywords: model, search, generation, information, its, law, world, evaluation, language, query, probabilistic, linguistics, retrieval, web, wide, Functions, Computational, meta, engines, scoring, Towers, Document, Customizable, Zipf

Abstract

Retrieving documents on the Web that are similar to a given document differs in many respects from information retrieval based on queries written by regular search engine users. This work proposes a new method for Web document similarity retrieval based on generative language models and meta search engines. Probabilistic language models serve as a random query generator for the given document, and the generated queries are submitted to a customizable set of Web search engines. Once all results are gathered, they are evaluated by a proposed scoring function based on Zipf's law. The results show that the proposed methodology for query generation and scoring solves the problem with acceptable levels of precision. © 2010 Springer-Verlag.
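The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact language model or scoring formula, so the sketch assumes that query terms are drawn without replacement from the document's tokens (which makes term counts in a query follow a hypergeometric distribution) and that a result at rank r contributes 1/r^s to its URL's score, a Zipf-like weight. The function names, the exponent `s`, and the stand-in result lists are all hypothetical.

```python
import random
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that keeps alphabetic tokens only."""
    return [w for w in text.lower().split() if w.isalpha()]


def generate_query(tokens, query_len, rng):
    """Draw query terms without replacement from the document's tokens.

    Assumption: sampling without replacement from the token multiset is
    used here as a stand-in for a hypergeometric language model.
    Repeated terms are collapsed, so the query may be shorter than query_len.
    """
    k = min(query_len, len(tokens))
    sample = rng.sample(tokens, k)
    return list(dict.fromkeys(sample))  # de-duplicate, keep draw order


def zipf_score(results_per_query, s=1.0):
    """Aggregate ranked result lists with a Zipf-like weight 1 / rank**s.

    Assumption: the exponent s and the additive aggregation are
    illustrative choices, not taken from the paper.
    """
    scores = Counter()
    for results in results_per_query:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / rank ** s
    return scores.most_common()


if __name__ == "__main__":
    rng = random.Random(0)
    doc_tokens = tokenize("web document similarity retrieval with language models")
    print(generate_query(doc_tokens, 3, rng))
    # Hypothetical ranked result lists, one per submitted query.
    print(zipf_score([["a", "b"], ["b", "c"]]))
```

In a real setting, each generated query would be sent to the configured search engines and the returned ranked lists passed to `zipf_score`; here the search step is replaced by hard-coded lists so the sketch stays self-contained.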

More information

Journal Title: LEARNING AND INTELLIGENT OPTIMIZATION, LION 15
Volume: 6393
Publisher: SPRINGER INTERNATIONAL PUBLISHING AG
Publication date: 2010
Start page: 303
End page: 308
URL: http://www.scopus.com/inward/record.url?eid=2-s2.0-78449276081&partnerID=q2rCbXpz