ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution

Bahmani, Zeinab; Bertossi, Leopoldo; Vasiloglou, Nikolaos; Beierle, C; Dekhtyar, A

Abstract

Entity resolution (ER), an important and common data cleaning problem, is about detecting duplicate data representations of the same external entities and merging them into single representations. Relatively recently, declarative rules called matching dependencies (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating three components of ER: (a) classifiers for duplicate/non-duplicate record pairs built using machine learning (ML) techniques; (b) MDs for supporting both the blocking phase of ML and the merge itself; and (c) the use of the declarative language LogiQL, an extended form of Datalog supported by the LogicBlox platform, for data processing and for the specification and enforcement of MDs.
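
To make the idea of a matching dependency concrete, the following is a minimal Python sketch, not the ERBlox implementation (which is written in LogiQL). It illustrates one hypothetical MD of the form "if two records have similar names, their addresses are merged". The record fields, the first-letter blocking key, the 0.8 threshold, and the "keep the longer string" merge function are all assumptions made for illustration. A plain string-similarity test stands in for the ML classifier described in the abstract, and the rule is applied in a single pass rather than iterated to a fixpoint.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    # Stand-in for a trained duplicate/non-duplicate classifier (assumption):
    # a simple string-similarity ratio compared against a fixed threshold.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def merge_values(a: str, b: str) -> str:
    # Hypothetical merge function: keep the longer (more informative) value.
    return a if len(a) >= len(b) else b


def block_key(record: dict) -> str:
    # Hypothetical blocking key: first letter of the name, lowercased.
    # Only records sharing a block are compared pairwise.
    return record["name"][0].lower()


def apply_md(records: list) -> list:
    # One pass of the illustrative MD: name similar -> addresses merged.
    out = [dict(r) for r in records]  # work on copies; input stays unchanged
    blocks = {}
    for r in out:
        blocks.setdefault(block_key(r), []).append(r)
    for block in blocks.values():
        for r1, r2 in combinations(block, 2):
            if similar(r1["name"], r2["name"]):
                merged = merge_values(r1["address"], r2["address"])
                r1["address"] = r2["address"] = merged
    return out


records = [
    {"id": 1, "name": "John Smith", "address": "10 Main St"},
    {"id": 2, "name": "Jon Smith", "address": "10 Main Street, Ottawa"},
    {"id": 3, "name": "Alice Wong", "address": "5 Elm Ave"},
]
resolved = apply_md(records)
```

In this sketch, records 1 and 2 fall into the same block and have similar names, so their addresses are set to a common merged value, while record 3 is never compared with them.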

More information

Title according to WOS: ID WOS:000366122500027 Not found in local WOS DB
Journal title: BIO-INSPIRED SYSTEMS AND APPLICATIONS: FROM ROBOTICS TO AMBIENT INTELLIGENCE, PT II
Volume: 9310
Publisher: SPRINGER INTERNATIONAL PUBLISHING AG
Publication date: 2015
First page: 399
Last page: 414
DOI: 10.1007/978-3-319-23540-0_27

Notes: ISI