TU Berlin

Database Systems and Information Management Group, Publications



Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models
Citation key KieferHBM17
Author Martin Kiefer, Max Heimel, Sebastian Breß, Volker Markl
Year 2017
DOI 10.14778/3151106.3151112
Journal Proceedings of the VLDB Endowment, Volume 10, No. 13, 2017 (to be presented at VLDB 2018)
Volume 2016-2017
Abstract Accurately predicting the cardinality of intermediate plan operations is an essential part of any modern relational query optimizer. The accuracy of said estimates has a strong and direct impact on the quality of the generated plans, and incorrect estimates can have a negative impact on query performance. One of the biggest challenges in this field is to predict the result size of join operations. Kernel Density Estimation (KDE) is a statistical method to estimate multivariate probability distributions from a data sample. Previously, we introduced a modern, self-tuning selectivity estimator for range scans based on KDE that outperforms state-of-the-art multidimensional histograms and is efficient to evaluate on graphics cards. In this paper, we extend these bandwidth-optimized KDE models to estimate the result size of single and multiple joins. In particular, we propose two approaches: (1) building a KDE model from a sample drawn from the join result, and (2) efficiently combining the information from base table KDE models. We evaluated our KDE-based join estimators on a variety of synthetic and real-world datasets, demonstrating that they are superior to state-of-the-art join estimators based on sketching or sampling.
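As an illustration of the KDE-based range selectivity idea the abstract builds on, the following is a minimal one-dimensional sketch, not the authors' implementation. It assumes a Gaussian kernel and picks the bandwidth with the normal reference rule of thumb as a stand-in for the query-feedback-driven bandwidth optimization described in the paper. The function names `kde_range_selectivity` and `rule_of_thumb_bandwidth` are hypothetical.

```python
import math
import random


def gaussian_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rule_of_thumb_bandwidth(sample):
    """Normal reference rule (1.06 * sigma * n^(-1/5)).

    Assumption: used here only as a simple placeholder; the paper
    optimizes the bandwidth instead of using a fixed rule.
    """
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((s - mean) ** 2 for s in sample) / (n - 1)
    return 1.06 * math.sqrt(variance) * n ** (-1.0 / 5.0)


def kde_range_selectivity(sample, lo, hi, bandwidth):
    """Estimate the fraction of rows with lo <= value <= hi.

    Each sample point contributes the probability mass that a Gaussian
    kernel centered on it places inside [lo, hi]; the estimate is the
    average of these contributions.
    """
    total = 0.0
    for s in sample:
        upper = gaussian_cdf((hi - s) / bandwidth)
        lower = gaussian_cdf((lo - s) / bandwidth)
        total += upper - lower
    return total / len(sample)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical column: 1000 values sampled from a uniform distribution.
    sample = [random.uniform(0.0, 100.0) for _ in range(1000)]
    h = rule_of_thumb_bandwidth(sample)
    estimate = kde_range_selectivity(sample, 20.0, 40.0, h)
    print(f"bandwidth={h:.3f}, estimated selectivity={estimate:.3f}")
```

Multiplying the estimated selectivity by the table cardinality gives a cardinality estimate for the range predicate. The join estimators proposed in the paper extend this idea either by building the model on a sample of the join result or by combining base table models, neither of which this sketch covers.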