Publications
Citation key | KumaigorodskiLM21 |
---|---|
Author | Alexander Kumaigorodski, Clemens Lutz, Volker Markl |
Year | 2021 |
Journal | BTW |
Note | Reproducibility Badge |
Abstract | Comma-separated values (CSV) is a widely used format for data exchange. Due to the format’s prevalence, virtually all industrial-strength database systems and stream processing frameworks support importing CSV input. However, loading CSV input close to the speed of I/O hardware is challenging. Modern I/O devices such as InfiniBand NICs and NVMe SSDs are capable of sustaining high transfer rates of 100 Gbit/s and higher. At the same time, CSV parsing performance is limited by the complex control flows that its semi-structured and text-based layout incurs. In this paper, we propose to speed up loading CSV input using GPUs. We devise a new parsing approach that streamlines the control flow while correctly handling context-sensitive CSV features such as quotes. By offloading I/O and parsing to the GPU, our approach enables databases to load CSVs at high throughput from main memory with NVLink 2.0, as well as directly from the network with RDMA. In our evaluation, we show that GPUs parse real-world datasets at up to 60 GB/s, thereby saturating high-bandwidth I/O devices. |