TU Berlin

Towards Unsupervised Data Quality Validation on Dynamic Data
Citation key RedyukMS20
Authors Sergey Redyuk, Volker Markl, Sebastian Schelter
Year 2020
Journal presented at the International Workshop on Explainability for Trustworthy ML Pipelines (ETMLP)
Note A recording of the presentation is available here: https://www.youtube.com/watch?v=Xhq8X64RA1Q

Presentation slides are available here: https://www.redaktion.tu-berlin.de/fileadmin/fg131/Conferences/Presentations/Redyuk_ETMLP-2020.pdf
Abstract Validating the quality of data is crucial for establishing the trustworthiness of data pipelines. State-of-the-art solutions for data validation and error detection require explicit domain expertise (e.g., in the form of rules or patterns) or manually labeled examples. In real-world applications, domain knowledge is often incomplete and data changes over time, which limits the applicability of existing solutions. We propose an unsupervised approach for detecting data quality degradation early and automatically. We will present the approach, its key assumptions, and preliminary results on public data to demonstrate how data quality can be monitored without manually curated rules and constraints.