No mining, no meaning: Relating documents across repositories with ontology-driven information extraction
Keywords: systems, information, metal, recovery, extraction, creation, language, languages, theory, semantics, nlp, analysis, query, ontology, linguistics, processing, human-in-the-loop, metadata, natural, Indexing (of information), Ontological anchoring
Abstract
Far from eliminating documents, as some expected, the Internet has led to a proliferation of digital documents without centralized control or indexing. Identifying relevant documents thus becomes both more important and much harder, since what users require may be dispersed across many documents and many repositories. This paper describes Ontological Anchoring, a technique that uses named entity recognition (a natural-language processing approach) and semantic annotation to relate individual documents to elements in domain ontologies. This approach allows document retrieval using domain-level inferences, and the integration of repositories with heterogeneous media, languages and structure. Ontological anchoring is a two-way street: ontologies allow semantic indexing of documents, and new documents in turn enrich the ontologies. The approach is illustrated with an initial deployment for heritage documents in Spanish. © 2008 ACM.
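The anchoring idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy ontology, the gazetteer, the sample documents and every function name below are hypothetical, and a simple gazetteer lookup stands in for a real named entity recognizer. It only shows how recognized entities could tie documents to ontology concepts, and how a query on a general concept could then reach documents anchored to more specific ones.

```python
import re
from collections import defaultdict

# Hypothetical toy ontology: concept -> parent concept (None for roots).
ONTOLOGY = {
    "Building": None,
    "Church": "Building",
    "Cathedral": "Church",
    "Person": None,
    "Architect": "Person",
}

# Hypothetical gazetteer: lowercase surface form -> ontology concept.
# A real system would use a trained named entity recognizer instead.
GAZETTEER = {
    "catedral de burgos": "Cathedral",
    "juan de colonia": "Architect",
}


def recognize_entities(text):
    """Return (surface form, concept) pairs found by gazetteer lookup."""
    lowered = text.lower()
    return [
        (name, concept)
        for name, concept in GAZETTEER.items()
        if re.search(r"\b" + re.escape(name) + r"\b", lowered)
    ]


def anchor(documents):
    """Map each concept to the ids of documents mentioning one of its instances."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for _, concept in recognize_entities(text):
            index[concept].add(doc_id)
    return index


def subsumed_by(concept, ancestor):
    """True if `concept` equals `ancestor` or descends from it in ONTOLOGY."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ONTOLOGY[concept]
    return False


def retrieve(index, query_concept):
    """Return sorted ids of documents anchored to the query concept or a sub-concept."""
    hits = set()
    for concept, doc_ids in index.items():
        if subsumed_by(concept, query_concept):
            hits |= doc_ids
    return sorted(hits)


# Hypothetical sample documents in Spanish.
DOCUMENTS = {
    "doc1": "La Catedral de Burgos fue declarada Patrimonio de la Humanidad.",
    "doc2": "Juan de Colonia trabajó en las agujas de la fachada.",
    "doc3": "Informe anual de actividades.",
}

index = anchor(DOCUMENTS)
```

With these sample documents, `retrieve(index, "Building")` returns `["doc1"]` even though the text never uses a word for "building": the match comes from the Cathedral, Church, Building chain in the toy ontology. The reverse direction described in the abstract, where new documents enrich the ontology, is not covered by this sketch.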
More information
Journal title: | DOCENG'08: PROCEEDINGS OF THE EIGHTH ACM SYMPOSIUM ON DOCUMENT ENGINEERING |
Publisher: | ASSOC COMPUTING MACHINERY |
Publication date: | 2008 |
Start page: | 110 |
End page: | 118 |
URL: | http://www.scopus.com/inward/record.url?eid=2-s2.0-59249084068&partnerID=q2rCbXpz |