direkt zum Inhalt springen

direkt zum Hauptnavigationsmenü

Sie sind hier

TU Berlin

Inhalt des Dokuments


Fast CSV Loading Using GPUs and RDMA for In-Memory Data Processing
Zitatschlüssel KumaigorodskiLM21
Autor Alexander Kumaigorodski, Clemens Lutz, Volker Markl
Jahr 2021
Journal BTW
Notiz Reproducibility Badge
Zusammenfassung Comma-separated values (CSV) is a widely-used format for data exchange. Due to the format’s prevalence, virtually all industrial-strength database systems and stream processing frameworks support importing CSV input. However, loading CSV input close to the speed of I/O hardware is challenging. Modern I/O devices such as InfiniBand NICs and NVMe SSDs are capable of sustaining high transfer rates of 100 Gbit/s and higher. At the same time, CSV parsing performance is limited by the complex control flows that its semi-structured and text-based layout incurs. In this paper, we propose to speed-up loading CSV input using GPUs. We devise a new parsing approach that streamlines the control flow while correctly handling context-sensitive CSV features such as quotes. By offloading I/O and parsing to the GPU, our approach enables databases to load CSVs at high throughput from main memory with NVLink 2.0, as well as directly from the network with RDMA. In our evaluation, we show that GPUs parse real-world datasets at up to 60 GB/s, thereby saturating high-bandwidth I/O devices.
Link zur Originalpublikation Download Bibtex Eintrag

Zusatzinformationen / Extras


Schnellnavigation zur Seite über Nummerneingabe