TU Berlin

Database Systems and Information Management Group, WS20/21


Talks DIMA Research Seminar

Talks  WS20/21
Talk/Location
Lecturer/Subject

Mo 12.10.2020, 4:00 - 4:45 pm, tu-berlin.zoom.us/j/63307405395
Carsten Binnig (TU Darmstadt): "DeepDB - Learn from Data, not from Queries!"

Mo 19.10.2020, 4:00 - 4:45 pm, tu-berlin.zoom.us/j/62575161702
Themis Palpanas (French University Institute): "Scalable Machine Learning on Large Sequence Collections"

Mo 26.10.2020, 5:30 - 6:15 pm, tu-berlin.zoom.us/j/69632757863
Alekh Jindal (Gray Systems Lab, Microsoft): "Optimizing Cloud Query Engines at Microsoft"

Mo 02.11.2020, 4:00 - 4:45 pm, tu-berlin.zoom.us/j/67805032796
Peter Boncz (CWI / VU University Amsterdam): "Fast Random Access String Compression"

Alekh Jindal, Gray Systems Lab (GSL), Microsoft

Title:   Optimizing Cloud Query Engines at Microsoft

Abstract: Cloud query engines have become increasingly complex, making the job of a query optimizer incredibly difficult. This is due to more complicated decision making, more complex query plans, and more tedious objective functions in cloud workloads. As a result, production cloud query optimizers are often far from optimal. In this talk, we describe a learning platform for optimizing cloud query workloads at Microsoft. We present a micromodel approach that handles the scale and complexity of cloud workloads by dividing them into smaller subsets and learning a large number of specialized models over them. The micromodel approach can scale to very large training inputs and yields smaller, lightweight models that can be scored efficiently within the query optimizer. Using learned cardinality as a concrete example, we describe our journey towards productization, report performance over very large production workloads, and illustrate the various challenges involved in deployment.
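As a rough illustration of the micromodel idea described in the abstract, the following Python sketch trains one tiny model per workload subset instead of a single global model. The abstract does not say how the workload is split or which models are used, so the grouping key (`template`), the single numeric feature, and the least-squares line per group are all assumptions made for this sketch, not the method presented in the talk.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least squares for y = a * x + b over (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        # All inputs identical: fall back to predicting the mean.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = cov / var_x
    return a, mean_y - a * mean_x


def train_micromodels(observations):
    """Group (template, input_rows, output_rows) tuples by template
    (a hypothetical subset key) and fit one small model per group."""
    groups = defaultdict(list)
    for template, input_rows, output_rows in observations:
        groups[template].append((input_rows, output_rows))
    return {template: fit_line(pts) for template, pts in groups.items()}


def predict(models, template, input_rows, default):
    """Score the micromodel for this template, or return the default
    estimate when no specialized model exists."""
    if template not in models:
        return default
    a, b = models[template]
    return a * input_rows + b


observations = [
    ("scan_filter", 100, 10), ("scan_filter", 200, 20),
    ("join", 100, 500), ("join", 200, 1000),
]
models = train_micromodels(observations)
```

Each model here is just two floats, which is what makes scoring cheap; a subset without its own model falls back to the caller's default estimate.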

Bio: Alekh Jindal is a Principal Scientist at Gray Systems Lab (GSL), Microsoft, and manages the Redmond site of the lab. His research focuses on improving the performance of large-scale data-intensive systems. Earlier, he was a postdoctoral associate in the Database Group at MIT CSAIL. Alekh received his PhD from Saarland University, working on flexible and scalable data storage for traditional databases as well as for MapReduce. In the past 10 years, Alekh has served as a chair, PC member, and reviewer at top-tier conferences in the field, including SIGMOD, VLDB, ICDE, and SoCC. He received best paper awards at VLDB 2014 and CIDR 2011.

 

Join Zoom meeting

https://tu-berlin.zoom.us/j/69632757863?pwd=YnE2c3d5VG1oZHFzTm5XUFA1QmJ1Zz09

Meeting ID: 696 3275 7863

Passcode: 901600

One-tap mobile

+496950502596,,69632757863#,,,,,,0#,,901600# Germany
+496971049922,,69632757863#,,,,,,0#,,901600# Germany

Dial by your location

+49 695 050 2596 Germany
+49 69 7104 9922 Germany
+49 30 5679 5800 Germany

Find your local number: https://tu-berlin.zoom.us/u/cdMvW5Kr6h

Join by SIP

sip:69632757863.901600@fr.zmeu.us

Join by H.323

213.19.144.110 (Amsterdam, Netherlands)
213.244.140.110 (Germany)
162.255.37.11
162.255.36.11
221.122.88.195
115.114.131.7
115.114.115.7
103.122.166.55
209.9.211.110
64.211.144.160
69.174.57.160

Themis Palpanas (French University Institute)

Title:  Scalable Machine Learning on Large Sequence Collections

 

Abstract: Applications in diverse domains increasingly need techniques for analyzing very large collections of sequences, or data series. Examples come from scientific, manufacturing, and social domains, where in several cases machine learning techniques must be applied for knowledge extraction. It is not unusual for these applications to involve hundreds of millions to billions of data series, which are often not analyzed in full detail due to their sheer size. However, no existing data management solution (such as relational databases, column stores, array databases, and time series management systems) offers native support for sequences and the corresponding operators necessary for complex analytics.

In this talk, we argue for the need to study the theory and foundations of managing big data sequences, and to build corresponding systems that enable scalable management and analytics of very large sequence collections. We describe recent efforts in designing techniques for indexing and analyzing truly massive collections of data series that will enable scientists to run complex analytics on their data. Finally, we present open research directions in the area of big sequence management.

 

 

Bio: Themis Palpanas is a Senior Member of the French University Institute (IUF), a distinction that recognizes excellence across all academic disciplines, and professor of computer science at the University of Paris (France), where he is director of the Data Intelligence Institute of Paris (diiP) and director of the data management group, diNo. He received his BS degree from the National Technical University of Athens, Greece, and his MSc and PhD degrees from the University of Toronto, Canada. He has previously held positions at the University of California at Riverside, the University of Trento, and the IBM T.J. Watson Research Center, and has visited Microsoft Research and the IBM Almaden Research Center.

His interests include problems related to data science (big data analytics and machine learning applications). He is the author of nine US patents, three of which have been implemented in world-leading commercial data management products. He is the recipient of three Best Paper awards and the IBM Shared University Research (SUR) Award. He is currently serving on the VLDB Endowment Board of Trustees, as Editor in Chief of the BDR Journal, as Associate Editor for the TKDE and IDA journals, on the Editorial Advisory Board of the IS journal, and on the Editorial Board of the TLDKS Journal. He has served as General Chair for VLDB 2013, Associate Editor for VLDB 2019 and 2017, Research PC Vice Chair for ICDE 2020, Workshop Chair for EDBT 2016, ADBIS 2013, and ADBIS 2014, General Chair for the PDA@IOT International Workshop (in conjunction with VLDB 2014), and General Chair for the Event Processing Symposium 2009.

Join Zoom meeting

https://tu-berlin.zoom.us/j/62575161702?pwd=WmZjT284REZkVm1YM3hRWU5id2VGQT09

Meeting ID: 625 7516 1702

Passcode: 577174

One-tap mobile

+493056795800,,62575161702#,,,,,,0#,,577174# Germany
+496950502596,,62575161702#,,,,,,0#,,577174# Germany

Dial by your location

+49 30 5679 5800 Germany
+49 695 050 2596 Germany
+49 69 7104 9922 Germany

Find your local number: https://tu-berlin.zoom.us/u/cc7lfK878U

Carsten Binnig (TU Darmstadt)

Title:  DeepDB - Learn from Data, not from Queries!

 

Abstract: 

The typical approach for learned DBMS components is to capture their behavior by running a representative set of queries and using the observations to train a machine learning model. This workload-driven approach, however, has two major downsides. First, collecting the training data can be very expensive, since all queries need to be executed on potentially large databases. Second, training data has to be recollected when the workload or the data changes. To overcome these limitations, we take a different route: we propose to learn a purely data-driven model that can be used for different tasks such as query answering, cardinality estimation, or even as an index. This data-driven model also supports ad-hoc queries and updates of the data without the need for full retraining when the workload or data changes. The results of our empirical evaluation demonstrate that our data-driven approach not only provides better accuracy than state-of-the-art learned components but also generalizes better to unseen queries.

 

 

Bio:

Carsten Binnig is a Full Professor in the Computer Science department at TU Darmstadt and an Adjunct Associate Professor in the Computer Science department at Brown University. Carsten received his PhD at the University of Heidelberg in 2008. Afterwards, he spent time as a postdoctoral researcher in the Systems Group at ETH Zurich and at SAP working on in-memory databases. Currently, his research focus is on the design of data management systems for modern hardware as well as modern workloads such as interactive data exploration and machine learning. His work has received a Google Faculty Award as well as multiple best paper and best demo awards.

 

Join Zoom meeting

https://tu-berlin.zoom.us/j/63307405395?pwd=b0N3aHhMQ1dqOUNHeEVhWmNVVzlndz09

Meeting ID: 633 0740 5395

Passcode: 997357

One-tap mobile

+493056795800,,63307405395#,,,,,,0#,,997357# Germany
+496950502596,,63307405395#,,,,,,0#,,997357# Germany

Dial by your location

+49 30 5679 5800 Germany
+49 695 050 2596 Germany
+49 69 7104 9922 Germany

Find your local number: https://tu-berlin.zoom.us/u/cbdRG68VVD

Join by SIP

sip:63307405395.997357@fr.zmeu.us

Join by H.323

213.19.144.110 (Amsterdam, Netherlands)
213.244.140.110 (Germany)
162.255.37.11
162.255.36.11
221.122.88.195
115.114.131.7
115.114.115.7
103.122.166.55
209.9.211.110
64.211.144.160
69.174.57.160

Peter Boncz (CWI / VU University Amsterdam)

Title: Fast Random Access String Compression

Abstract:  Strings are prevalent in real-world data sets. They often occupy a large fraction of the data and are slow to process. In this work, we present Fast Static Symbol Table (FSST), a lightweight compression scheme for strings. On text data, FSST offers decompression and compression speed similar to or better than the best speed-optimized compression methods, such as LZ4, yet offers significantly better compression factors. Moreover, its use of a static symbol table allows random access to individual, compressed strings, enabling lazy decompression and query processing on compressed data.

We believe these features will make FSST a valuable piece in the standard compression toolbox.
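To make the random-access property concrete, here is a minimal Python sketch of compression against a static symbol table. It is not the FSST implementation: the hand-picked table, the greedy longest-match encoder, and the escape code value are assumptions chosen for illustration, and the sketch ignores how a good table would be built from the data. What it shows is that every string is encoded independently against the same fixed table, so any single string can be decompressed without touching the others.

```python
ESCAPE = 255  # assumed code reserved for bytes not covered by any symbol


def compress(table, data: bytes) -> bytes:
    """Greedily replace the longest matching symbol with its one-byte code.

    `table` is a list of at most 255 byte strings; the list index is the code.
    Bytes that match no symbol are emitted as ESCAPE followed by the literal.
    """
    assert len(table) <= 255
    longest_first = sorted(range(len(table)), key=lambda c: -len(table[c]))
    out = bytearray()
    i = 0
    while i < len(data):
        for code in longest_first:
            symbol = table[code]
            if data.startswith(symbol, i):
                out.append(code)
                i += len(symbol)
                break
        else:
            out += bytes([ESCAPE, data[i]])
            i += 1
    return bytes(out)


def decompress(table, blob: bytes) -> bytes:
    """Expand each code via a table lookup; no other state is needed."""
    out = bytearray()
    i = 0
    while i < len(blob):
        code = blob[i]
        if code == ESCAPE:
            out.append(blob[i + 1])
            i += 2
        else:
            out += table[code]
            i += 1
    return bytes(out)


# Hypothetical symbol table and data, for illustration only.
table = [b"http://", b"www.", b".com", b"e"]
strings = [b"http://www.example.com", b"http://www.test.com", b"plain"]
compressed = [compress(table, s) for s in strings]

# Random access: decode only the second string.
second = decompress(table, compressed[1])
```

Because decompression needs only the shared table and the bytes of one string, a query engine could defer decoding until a particular value is actually needed.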

Bio: Peter Boncz holds appointments as tenured researcher at CWI and professor at VU University Amsterdam. His academic background is in core database architecture, with MonetDB as the system outcome of his PhD; MonetDB much later won the 2016 ACM SIGMOD Systems Award. He has a track record in bridging the gap between academia and commercial application, receiving the Dutch ICT Regie Award 2006 for his role in the CWI spin-off company Data Distilleries. In 2008 he co-founded Vectorwise around the analytical database system of the same name, which pioneered vectorized query execution. He is co-recipient of the 2009 VLDB 10 Years Best Paper Award, received the Humboldt Research Award in 2013 for his research on database architecture, and is a fellow at TU Munich. He also works on graph data management, founding in 2013 the Linked Data Benchmark Council (LDBC), a benchmarking organization for graph database systems.

 

Join Zoom meeting

https://tu-berlin.zoom.us/j/67805032796?pwd=NTZsME1vRnBlTWE1OGg4eTI4YTBtZz09

Meeting ID: 678 0503 2796

Passcode: 791137

One-tap mobile

+496950502596,,67805032796#,,,,,,0#,,791137# Germany
+496971049922,,67805032796#,,,,,,0#,,791137# Germany

Dial by your location

+49 695 050 2596 Germany
+49 69 7104 9922 Germany
+49 30 5679 5800 Germany

Find your local number: https://tu-berlin.zoom.us/u/cdQEnjynoR

 

 

 

 
