Hypergeometric language model and Zipf-like scoring function for web document similarity retrieval
Keywords: model, search, generation, information, its, law, world, evaluation, language, query, probabilistic, linguistics, retrieval, web, wide, Functions, Computational, meta, engines, scoring, Towers, Document, Customizable, Zipf
Abstract
Retrieving documents on the Web that are similar to a given document differs in many respects from information retrieval based on queries generated by regular search engine users. In this work, a new method is proposed for Web document similarity retrieval based on generative language models and meta search engines. Probabilistic language models serve as random query generators for the given document. The queries are submitted to a customizable set of Web search engines. Once all results are gathered, they are evaluated with a proposed scoring function based on Zipf's law. The results showed that the proposed methodology for query generation and scoring solves the problem with acceptable levels of precision. © 2010 Springer-Verlag.
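The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the sampling scheme nor the scoring formula, so everything below is an assumption. Queries are drawn by sampling tokens without replacement from the document's bag of words (the sampling scheme that underlies a hypergeometric model), and each returned URL accumulates a Zipf-like weight of 1 / rank^s over all result lists. The function names `generate_queries` and `zipf_score`, the whitespace tokenizer, the minimum token length, and the exponent `s` are all hypothetical.

```python
import random
from collections import defaultdict


def generate_queries(text, num_queries=5, query_len=3, seed=0):
    """Draw random queries from a document (illustrative assumption).

    Tokens are sampled without replacement from the document's bag of
    words, so frequent terms are more likely to appear in a query.
    """
    rng = random.Random(seed)
    # Naive tokenizer (assumption): lowercase, split on whitespace,
    # drop very short tokens as a crude stop-word filter.
    tokens = [t for t in text.lower().split() if len(t) > 3]
    queries = []
    for _ in range(num_queries):
        k = min(query_len, len(tokens))
        drawn = rng.sample(tokens, k)  # sampling without replacement
        # Remove repeated terms while keeping draw order; a query may
        # therefore end up shorter than query_len.
        queries.append(list(dict.fromkeys(drawn)))
    return queries


def zipf_score(result_lists, s=1.0):
    """Aggregate ranked result lists with a Zipf-like weight (assumption).

    A URL at rank r in one list contributes 1 / r**s to its total score.
    Returns (url, score) pairs sorted by descending score.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / rank ** s
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In a full system, each generated query would be sent to the configured search engines and the returned ranked URL lists passed to `zipf_score`; that network step is omitted here because the abstract does not name the engines or their interfaces.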
More information
Journal title: | BIO-INSPIRED SYSTEMS AND APPLICATIONS: FROM ROBOTICS TO AMBIENT INTELLIGENCE, PT II |
Volume: | 6393 |
Publisher: | SPRINGER INTERNATIONAL PUBLISHING AG |
Publication date: | 2010 |
Start page: | 303 |
End page: | 308 |
URL: | http://www.scopus.com/inward/record.url?eid=2-s2.0-78449276081&partnerID=q2rCbXpz |