No mining, no meaning: Relating documents across repositories with ontology-driven information extraction

Codocedo V.; Astudillo H.

Keywords: systems, information, metal, recovery, extraction, creation, language, languages, theory, semantics, NLP, analysis, query, ontology, linguistics, processing, human-in-the-loop, metadata, natural, indexing (of information), ontological anchoring

Abstract

Far from eliminating documents, as some expected, the Internet has led to a proliferation of digital documents without centralized control or indexing. Identifying relevant documents thus becomes both more important and much harder, since the information users need may be dispersed across many documents and many repositories. This paper describes Ontological Anchoring, a technique that relates documents to domain ontologies: it uses named entity recognition (a natural-language processing approach) and semantic annotation to link individual documents to elements in those ontologies. This approach allows document retrieval using domain-level inferences, and integration of repositories with heterogeneous media, languages and structures. Ontological anchoring is a two-way street: ontologies allow semantic indexing of documents, and new documents simultaneously enrich ontologies. The approach is illustrated with an initial deployment for Spanish-language heritage documents. © 2008 ACM.
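The abstract describes anchoring only at a high level. The sketch below is a minimal illustration of the idea and is not the authors' implementation. It assumes a toy ontology with a class hierarchy and named individuals, and it uses plain gazetteer lookup as a stand-in for a real named entity recognizer. All class names, individuals, documents and function names (`recognize_entities`, `anchor`, `retrieve_by_class`) are hypothetical. The sketch shows how anchoring documents to individuals supports a domain-level inference: a query for `Building` finds a document about a church even though the word "building" never appears in it.

```python
from collections import defaultdict

# Hypothetical toy ontology: class hierarchy expressed as child -> parent.
SUBCLASS_OF = {
    "Church": "Building",
    "Building": "HeritageSite",
    "Fort": "HeritageSite",
}

# Hypothetical ontology individuals: label -> class.
INDIVIDUALS = {
    "Iglesia de San Francisco": "Church",
    "Fuerte de Niebla": "Fort",
}


def superclasses(cls):
    """Return cls followed by all of its ancestors in the class hierarchy."""
    chain = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def recognize_entities(text):
    """Case-insensitive gazetteer lookup; a stand-in for real NER."""
    lowered = text.lower()
    return {name for name in INDIVIDUALS if name.lower() in lowered}


def anchor(documents):
    """Anchor documents: map each ontology individual to the ids of documents mentioning it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for entity in recognize_entities(text):
            index[entity].add(doc_id)
    return index


def retrieve_by_class(index, cls):
    """Return documents anchored to any individual whose class is cls or a subclass of cls."""
    hits = set()
    for name, ind_cls in INDIVIDUALS.items():
        if cls in superclasses(ind_cls):
            hits |= index.get(name, set())
    return hits


if __name__ == "__main__":
    docs = {
        "d1": "Restauración de la Iglesia de San Francisco en 1998.",
        "d2": "Historia del Fuerte de Niebla.",
        "d3": "Informe sobre el clima de Valdivia.",
    }
    index = anchor(docs)
    print(sorted(retrieve_by_class(index, "Building")))      # ['d1']
    print(sorted(retrieve_by_class(index, "HeritageSite")))  # ['d1', 'd2']
```

Keeping the index keyed by ontology individuals rather than by words lets documents from different repositories, or in different languages, meet at the same ontology element. A real system would replace the gazetteer with proper NER and semantic annotation.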

More information

Journal title: DOCENG'08: PROCEEDINGS OF THE EIGHTH ACM SYMPOSIUM ON DOCUMENT ENGINEERING
Publisher: ASSOC COMPUTING MACHINERY
Publication date: 2008
First page: 110
Last page: 118
URL: http://www.scopus.com/inward/record.url?eid=2-s2.0-59249084068&partnerID=q2rCbXpz