Matchsimile: a flexible approximate matching tool for searching proper names

Navarro, G.; Baeza-Yates, R.

Abstract

We present the architecture and algorithms behind Matchsimile, an approximate string matching lookup tool especially designed for extracting person and company names from large texts. Part of a larger information extraction environment, this specific engine receives a large set of proper names to search for, a text to search, and search options, and outputs all the occurrences of the names found in the text. Beyond the similarity search capabilities applied at the intraword level, the tool considers a set of specific person name formation rules at the word level, such as combination, abbreviation, duplicity detection, ordering, word omission and insertion, among others. This engine is used in a successful commercial application (also named Matchsimile), which allows searching for lawyer names in official law publications.
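The abstract names the two levels of matching but does not give the algorithms, so the following is only a minimal Python sketch, not Matchsimile's actual method, of how intraword approximate matching (edit distance) might be combined with a few of the word-level rules listed: abbreviation, ordering, insertion and omission. All function names and the parameters `max_errors` and `max_omitted` are hypothetical. Unlike the real engine, which searches a large set of names over a large text, this checks one name against one short text window by brute-force backtracking.

```python
import re
import unicodedata


def normalize(text: str) -> list[str]:
    """Lowercase, strip accents, and split into word tokens (drops punctuation and hyphens)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.findall(r"\w+", plain)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, or keep if equal
        prev = cur
    return prev[-1]


def word_matches(name_word: str, text_word: str, max_errors: int) -> bool:
    """Intraword level: a one-letter token abbreviates any word with that initial;
    otherwise the two words must be within max_errors edits of each other."""
    if len(text_word) == 1 and name_word.startswith(text_word):
        return True
    if len(name_word) == 1 and text_word.startswith(name_word):
        return True
    return levenshtein(name_word, text_word) <= max_errors


def match_name(name: str, window: str, max_errors: int = 1, max_omitted: int = 0) -> bool:
    """Word level: name words may appear in any order, extra words in the window
    (insertions) are tolerated, and up to max_omitted name words may be missing.
    Each window word can be matched by at most one name word."""
    name_words, text_words = normalize(name), normalize(window)
    best = 0  # largest number of name words matched to distinct window words

    def search(i: int, used: frozenset, matched: int) -> None:
        nonlocal best
        if i == len(name_words):
            best = max(best, matched)
            return
        search(i + 1, used, matched)  # treat name word i as omitted
        for j, tw in enumerate(text_words):
            if j not in used and word_matches(name_words[i], tw, max_errors):
                search(i + 1, used | {j}, matched + 1)

    search(0, frozenset(), 0)
    return best > 0 and len(name_words) - best <= max_omitted
```

For example, `match_name("Gonzalo Navarro", "Navarro, Gonzalo")` accepts the reordering, `match_name("Gonzalo Navarro", "Gonzalo Navaro")` tolerates one typo, and `match_name("Ricardo Baeza Yates", "R. Baeza-Yates")` combines an abbreviation with hyphen splitting, while `match_name("Ricardo Baeza Yates", "Baeza")` is rejected unless `max_omitted` is raised. Requiring each window word to be used at most once keeps a repeated surname from counting twice.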

More information

Title (WOS): Matchsimile: a flexible approximate matching tool for searching proper names
Title (SCOPUS): Matchsimile: A flexible approximate matching tool for searching proper names
Journal title: JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY
Volume: 54
Issue: 1
Publisher: John Wiley & Sons Inc.
Publication date: 2003
First page: 3
Last page: 15
Language: English
URL: http://doi.wiley.com/10.1002/asi.10178
DOI: 10.1002/asi.10178

Notes: ISI, SCOPUS