Similarity joins and clustering for SPARQL

Bustos, Benjamin

Abstract

The SPARQL standard provides operators that retrieve exact matches on data, such as graph patterns, filters and grouping. This work proposes and evaluates two new algebraic operators for SPARQL 1.1 that return similarity-based results instead of exact results. First, a similarity join operator is presented, which brings together similar mappings from two sets of solution mappings. Second, a clustering solution modifier is introduced, which, instead of grouping solution mappings according to exact values, brings them together using similarity criteria. For both operators, a variety of algorithms are proposed and analysed, and use-case queries that showcase the relevance and usefulness of the novel operators are presented. For similarity joins, experimental results are provided by comparing different physical operators over a set of real-world queries, and by comparing our implementation to the closest work found in the literature, DBSimJoin, a PostgreSQL extension that supports similarity joins. For clustering, synthetic queries are designed to measure the performance of the different algorithms implemented.
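
The idea of a similarity join over two sets of solution mappings can be illustrated with a naive nested-loop range join. This is only a sketch and not necessarily one of the algorithms the paper proposes: the function name `range_similarity_join`, the variable names, the Euclidean metric, the output key `d` and the sample data are all assumptions made for illustration, and the two inputs are assumed to bind disjoint variables so that merged mappings never conflict.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def range_similarity_join(left, right, left_vars, right_vars, radius):
    """Naive nested-loop range join (illustrative baseline only).

    Each solution mapping is modelled as a dict from variable name to value.
    A pair is returned when the Euclidean distance between the values bound
    to left_vars and right_vars is at most `radius`.
    """
    results = []
    for l in left:
        l_point = tuple(l[v] for v in left_vars)
        for r in right:
            r_point = tuple(r[v] for v in right_vars)
            d = dist(l_point, r_point)
            if d <= radius:
                # Assumes the two sides bind disjoint variables, so merging
                # the dicts cannot overwrite a binding. "d" is a hypothetical
                # output variable holding the computed distance.
                results.append({**l, **r, "d": d})
    return results


# Hypothetical sample data: two sets of mappings with 2D coordinates.
left = [
    {"c1": "A", "x1": 0.0, "y1": 0.0},
    {"c1": "B", "x1": 5.0, "y1": 5.0},
]
right = [
    {"c2": "P", "x2": 1.0, "y2": 0.0},
    {"c2": "Q", "x2": 5.0, "y2": 6.0},
    {"c2": "R", "x2": 20.0, "y2": 20.0},
]

pairs = range_similarity_join(left, right, ["x1", "y1"], ["x2", "y2"], 1.5)
for p in pairs:
    print(p["c1"], p["c2"], p["d"])
```

The nested loop compares every pair and therefore costs time proportional to the product of the input sizes; it serves here only as a reference point for what the operator computes, whatever physical operator is used to evaluate it.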

More information

Title according to WOS: Similarity joins and clustering for SPARQL
Journal title: SEMANTIC WEB
Volume: 15
Issue: 5
Publisher: IOS Press
Publication date: 2024
Start page: 1701
End page: 1732
DOI: 10.3233/SW-243540

Notes: ISI