direkt zum Inhalt springen

direkt zum Hauptnavigationsmenü

Sie sind hier

TU Berlin

Page Content


Fast CSV Loading Using GPUs and RDMA for In-Memory Data Processing
Citation key KumaigorodskiLM21
Author Alexander Kumaigorodski, Clemens Lutz, Volker Markl
Year 2021
Journal BTW
Note Reproducibility Badge
Abstract Comma-separated values (CSV) is a widely-used format for data exchange. Due to the format’s prevalence, virtually all industrial-strength database systems and stream processing frameworks support importing CSV input. However, loading CSV input close to the speed of I/O hardware is challenging. Modern I/O devices such as InfiniBand NICs and NVMe SSDs are capable of sustaining high transfer rates of 100 Gbit/s and higher. At the same time, CSV parsing performance is limited by the complex control flows that its semi-structured and text-based layout incurs. In this paper, we propose to speed-up loading CSV input using GPUs. We devise a new parsing approach that streamlines the control flow while correctly handling context-sensitive CSV features such as quotes. By offloading I/O and parsing to the GPU, our approach enables databases to load CSVs at high throughput from main memory with NVLink 2.0, as well as directly from the network with RDMA. In our evaluation, we show that GPUs parse real-world datasets at up to 60 GB/s, thereby saturating high-bandwidth I/O devices.
Link to original publication Download Bibtex entry

Zusatzinformationen / Extras

Quick Access:

Schnellnavigation zur Seite über Nummerneingabe

Auxiliary Functions