TU Berlin

Fachgebiet Datenbanksysteme und InformationsmanagementPublikationen

Logo FG DIMA-new  65px

Inhalt

zur Navigation

Publikationen

Fast CSV Loading Using GPUs and RDMA for In-Memory Data Processing
Zitatschlüssel KumaigorodskiLM21
Autor Kumaigorodski, Alexander and Lutz, Clemens and Markl, Volker
Jahr 2021
Journal BTW
Notiz Reproducibility Badge
Zusammenfassung Comma-separated values (CSV) is a widely-used format for data exchange. Due to the format’s prevalence, virtually all industrial-strength database systems and stream processing frameworks support importing CSV input. However, loading CSV input close to the speed of I/O hardware is challenging. Modern I/O devices such as InfiniBand NICs and NVMe SSDs are capable of sustaining high transfer rates of 100 Gbit/s and higher. At the same time, CSV parsing performance is limited by the complex control flows that its semi-structured and text-based layout incurs. In this paper, we propose to speed-up loading CSV input using GPUs. We devise a new parsing approach that streamlines the control flow while correctly handling context-sensitive CSV features such as quotes. By offloading I/O and parsing to the GPU, our approach enables databases to load CSVs at high throughput from main memory with NVLink 2.0, as well as directly from the network with RDMA. In our evaluation, we show that GPUs parse real-world datasets at up to 60 GB/s, thereby saturating high-bandwidth I/O devices.
Link zur Originalpublikation Download Bibtex Eintrag

Navigation

Direktzugang

Schnellnavigation zur Seite über Nummerneingabe