Space/time-efficient RDF stores based on circular suffix sorting

Cerdeira-Pena, Ana; Farina, Antonio

Abstract

The resource description framework (RDF) has gained popularity as a format for the standardized publication and exchange of information on the Web of Data. In this paper, we introduce RDFCSA, a compressed representation of RDF datasets that also supports efficient querying. RDFCSA regards the triples of the RDF store as short circular strings and applies suffix sorting on those strings, so that triple-pattern queries reduce to prefix searching on the string set. The RDF store is then represented compactly using a compressed suffix array (CSA), a proven technology in text indexing that efficiently supports prefix searches. Our experiments show that RDFCSA is competitive with state-of-the-art alternatives. It compresses the raw data to 60% of its size, close to the most compact alternatives. While most alternatives perform better on some kinds of triple patterns than on others, RDFCSA features fast and consistent query times, a few microseconds per result in all cases. This enables efficient support for join queries, using either merge- or chaining-join strategies over the triple patterns, coupled with specific optimizations such as variable filling. Our experiments on binary joins show that RDFCSA is faster than the alternatives in most cases.
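The reduction from triple patterns to prefix searches can be illustrated with a minimal Python sketch. It is not the RDFCSA implementation: it assumes a plain sorted list of triple rotations in place of a compressed suffix array, and the names `build_index`, `prefix_range`, and `match` are hypothetical. The idea it shows is that in a circular string of length three the bound components of any pattern are always cyclically adjacent, so some rotation places them all at the front.

```python
from bisect import bisect_left

# Illustrative stand-in for a CSA: a sorted list of all rotations.
# Each entry is (start, a, b, c), where start = 0/1/2 means the rotation
# begins at the subject, predicate, or object of the original triple.
def build_index(triples):
    entries = []
    for s, p, o in triples:
        entries.append((0, s, p, o))
        entries.append((1, p, o, s))
        entries.append((2, o, s, p))
    entries.sort()
    return entries

def prefix_range(index, key):
    """Return [lo, hi) of entries whose leading components equal key."""
    lo = bisect_left(index, key)  # a shorter tuple sorts before its extensions
    hi = lo
    while hi < len(index) and index[hi][:len(key)] == key:
        hi += 1
    return lo, hi

def match(index, pattern):
    """Answer a triple pattern; None marks a variable position."""
    # Pick the rotation in which all bound components come first.
    for start in range(3):
        rotated = [pattern[(start + k) % 3] for k in range(3)]
        bound = [x for x in rotated if x is not None]
        if rotated[:len(bound)] == bound:
            break
    lo, hi = prefix_range(index, (start, *bound))
    results = []
    for entry in index[lo:hi]:
        rot = entry[1:]
        # Undo the rotation to recover (subject, predicate, object).
        results.append(tuple(rot[(j - start) % 3] for j in range(3)))
    return results
```

For instance, the pattern `("alice", None, "carol")` is answered by searching for the prefix `("carol", "alice")` among the rotations that start at the object. In this sketch every pattern shape follows the same code path, a binary search followed by a scan of the matching range.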

More information

Title according to WOS: Space/time-efficient RDF stores based on circular suffix sorting
Journal title: JOURNAL OF SUPERCOMPUTING
Volume: 79
Issue: 5
Publisher: Springer
Publication date: 2023
Start page: 5643
End page: 5683
DOI: 10.1007/s11227-022-04890-w

Notes: ISI