Optimal Joins Using Compressed Quadtrees

Arroyuelo, Diego; Navarro, Gonzalo; Reutter, Juan L.; Rojas-Ledesma, Javiel

Abstract

--- - "Worst-case optimal join algorithms have gained a lot of attention in the database literature. We now count several algorithms that are optimal in the worst case, and many of them have been implemented and validated in practice. However, the implementation of these algorithms often requires an enhanced indexing structure: to achieve optimality one either needs to build completely new indexes or must populate the database with several instantiations of indexes such as B+-trees. Either way, this means spending an extra amount of storage space that is typically one or two orders of magnitude more than what is required to store the raw data." - We show that worst-case optimal algorithms can be obtained directly from a representation that regards the relations as point sets in variable-dimensional grids, without the need of any significant extra storage. Our representation is a compressed quadtree for the static indexes and a quadtree built on the fly that shares subtrees (which we dub a qdag) for intermediate results. We develop a compositional algorithm to process full join queries under this representation, which simulates navigation of the quadtree of the output, and show that the running time of this algorithm is worst-case optimal in data complexity. - We implement our index and compare it experimentally with state-of-the-art alternatives. Our experiments show that our index uses even less space than what is needed to store the data in raw form (and replaces it) and one or two orders of magnitude less space than the other indexes. At the same time, our query algorithm is competitive in time, even sharply outperforming other indexes in various cases. - Finally, we extend our framework to evaluate more expressive queries from relational algebra, including not only joins and intersections but also unions and negations. To obtain optimality on those more complex formulas, we introduce a lazy version of qdags we dub lqdags, which allow us navigate over the quadtree representing the output of a formula while only evaluating what is needed from its components. We show that the running time of our query algorithms on this extended set of operations is worst-case optimal under some constraints. Moving to full relational algebra, we also show that lqdags can handle selections and projections. While worst-case optimality is no longer guaranteed, we introduce a partial materialization scheme that extends results from Deep and Koutris regarding compressed representation of query results.

Más información

Título según WOS: Optimal Joins Using Compressed Quadtrees
Título de la Revista: ACM TRANSACTIONS ON DATABASE SYSTEMS
Volumen: 47
Número: 2
Editorial: ASSOC COMPUTING MACHINERY
Fecha de publicación: 2022
DOI:

10.1145/3514231

Notas: ISI