Document Contents
DIMA Colloquium Talk Schedule
Prof. Stefano Ceri, Politecnico di Milano
Genomic Data Management as Research Enabler
Abstract and Bio:
Stefano Ceri is professor of Database Systems at the Dipartimento
di Elettronica, Informazione e Bioingegneria (DEIB) of Politecnico di
Milano. He was visiting professor at the Computer Science Department
of Stanford University (1983-1990), chairman of the Computer Science
Section of DEI (1992-2004), chairman of LaureaOnLine in Computer
Engineering (2004-2008), director of Alta Scuola Politecnica (ASP) of
Politecnico di Milano and Politecnico di Torino (2010-2013),
co-founder (2001) and shareholder (2001-present) of WebRatio.
He is the recipient of the ACM SIGMOD "Edgar F. Codd Innovations Award" (New York, June 26, 2013), an ACM Fellow, and a member of the Academia Europaea. With an H-index of 57, he is listed among the Top Italian Scientists. He has written over 250 journal and conference articles. He is co-editor-in-chief of the book series "Data-Centric Systems and Applications" (Springer-Verlag). He is the author of three US patents and ten books in English, including core textbooks on computer science and data management.
Everybody is cordially welcome! Please, forward this invitation to interested colleagues.
Prof. Wil van der Aalst, Technische Universiteit Eindhoven
"Process Mining: Data Science in Action"
Data science is the profession of the future, because organizations that are unable to use (big) data in a smart way will not survive. It is not sufficient to focus on data storage and data analysis; the data scientist also needs to relate data to process analysis. Process mining bridges the gap between traditional model-based process analysis (e.g., simulation and other business process management techniques) and data-centric analysis techniques such as machine learning and data mining. Process mining confronts event data (i.e., observed behavior) with process models (hand-made or discovered automatically).

This technology has become available only recently, but it can be applied to any type of operational process (in organizations and systems). Example applications include analyzing treatment processes in hospitals, improving customer service processes in a multinational, understanding the browsing behavior of customers using a booking site, analyzing failures of a baggage handling system, and improving the user interface of an X-ray machine. All of these applications have in common that dynamic behavior needs to be related to process models. Hence, Prof. Wil van der Aalst refers to this in his talk as "data science in action".

Process mining not only provides a bridge between data mining and business process management; it also helps to address the classical divide between "business" and "IT". Evidence-based business process management based on process mining helps to create a common ground for business process improvement and information systems development. In his talk, Prof. van der Aalst also discusses approaches to deal with "big event data". In particular, he will show that interesting process mining tasks can be distributed in unexpected ways.
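To make the idea of confronting event data with process models concrete, here is a minimal sketch (not one of van der Aalst's actual algorithms, and with an invented event log) of extracting a directly-follows relation from an event log, the starting point of many process-discovery techniques:

```python
from collections import Counter

# Hypothetical event log: one trace (list of activities) per case.
event_log = [
    ["register", "check", "decide", "pay"],
    ["register", "decide", "check", "pay"],
    ["register", "check", "decide", "reject"],
]

def directly_follows(log):
    """Count how often activity a is directly followed by activity b."""
    df = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            df[(a, b)] += 1
    return df

df = directly_follows(event_log)
# ("register", "check") appears in the first and third trace -> count 2.
```

A discovery algorithm would turn these counts into a process model (e.g., a Petri net), which can then be compared against the observed behavior.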
Prof.dr.ir. Wil van der Aalst is a full professor of Information Systems at the Technische Universiteit Eindhoven (TU/e). He is also the Academic Supervisor of the International Laboratory of Process-Aware Information Systems of the National Research University, Higher School of Economics in Moscow. Moreover, since 2003 he has held a part-time appointment at Queensland University of Technology (QUT). At TU/e he is the scientific director of the Data Science Center Eindhoven (DSC/e). His personal research interests include workflow management, process mining, Petri nets, business process management, process modeling, and process analysis.

Wil van der Aalst has published more than 170 journal papers, 17 books (as author or editor), 370 refereed conference/workshop publications, and 60 book chapters. Many of his papers are highly cited (he is one of the most cited computer scientists in the world, with an H-index of 112 according to Google Scholar), and his ideas have influenced researchers, software developers, and standardization committees working on process support. He has been a co-chair of many conferences, including the Business Process Management conference, the International Conference on Cooperative Information Systems, the International Conference on the Application and Theory of Petri Nets, and the IEEE International Conference on Services Computing. He is also an editor or editorial board member of several journals, including Computing, Distributed and Parallel Databases, Software and Systems Modeling, the International Journal of Business Process Integration and Management, the International Journal on Enterprise Modelling and Information Systems Architectures, Computers in Industry, Business & Information Systems Engineering, IEEE Transactions on Services Computing, Lecture Notes in Business Information Processing, and Transactions on Petri Nets and Other Models of Concurrency. In 2012, he received the degree of doctor honoris causa from Hasselt University.
In 2013, he was appointed as Distinguished University Professor of TU/e and was awarded an honorary guest professorship at Tsinghua University. He is also a member of the Royal Netherlands Academy of Arts and Sciences (Koninklijke Nederlandse Akademie van Wetenschappen), Royal Holland Society of Sciences and Humanities (Koninklijke Hollandsche Maatschappij der Wetenschappen) and the Academy of Europe (Academia Europaea).
Holger Pirk, CWI
Title: Database Cracking: Fancy Scan, not Poor Man's Sort!
Abstract: Database cracking is an appealing approach to adaptive indexing: on every range-selection query, the data is partitioned using the supplied predicates as pivots. The core of database cracking is, thus, pivoted partitioning. While pivoted partitioning, like scanning, requires only a single pass through the data, it tends to have much higher costs due to lower CPU efficiency. In this work, we conduct an in-depth study of the reasons for the low CPU efficiency of pivoted partitioning. Based on the findings, we develop an optimized version with significantly higher (single-threaded) CPU efficiency. We also develop a number of multi-threaded implementations that are effectively bound by memory bandwidth. Combining all of these optimizations, we achieve an implementation whose costs are close to, or better than, those of an ordinary scan on a variety of systems, ranging from low-end desktop machines (under $300) to high-end servers (above $60,000).
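The pivoted partitioning at the heart of cracking can be sketched in a few lines. This is only an illustration of the basic idea with invented data, not the optimized implementation the talk is about: each range query reorders the column in place around the query's predicate, so later queries scan ever-smaller pieces.

```python
def crack_in_two(col, lo, hi, pivot):
    """Partition col[lo:hi] in place around pivot; return the split point."""
    i, j = lo, hi - 1
    while i <= j:
        if col[i] < pivot:
            i += 1
        else:
            # Move the too-large element to the back and shrink the window.
            col[i], col[j] = col[j], col[i]
            j -= 1
    return i  # col[lo:i] < pivot <= col[i:hi]

column = [13, 4, 55, 9, 2, 42, 7, 21]
split = crack_in_two(column, 0, len(column), 10)
# All values below 10 are now contiguous at the front of the column,
# so a later query for values < 10 only needs to scan column[:split].
```

Like a scan, this touches each element once; the talk explains why its CPU efficiency is nevertheless much lower, and how to close that gap.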
Holger Pirk just finished his PhD thesis in the Database Architectures group at CWI in Amsterdam and expects to defend it in early 2015. He received his master's degree (Diplom) in computer science from Humboldt-Universität zu Berlin in 2010. His research interests lie in analytical query processing on memory-resident data. In particular, Holger studies storage schemes and processing models for modern hardware. He is currently visiting HU Berlin, but will shortly join the Big Data Group at MIT CSAIL.
Sebastian Breß, TU Dortmund
"Efficient Query Optimization in Co-Processor-accelerated Databases"
Over the last decade, the database community has explored how to
efficiently implement query processing algorithms on graphics
processing units and other co-processors. However, while these
algorithms often outperform their CPU counterparts, it is still highly
challenging to accelerate a complete database engine. There are three
major challenges: 1) The DBMS needs to handle multiple
heterogeneous processors in a uniform way. 2) The query optimizer
needs to efficiently place database operators across the available
processors. 3) The system needs to minimize the communication overhead
between the processors.
In this talk, I will discuss how we approached these challenges in HyPE, our hardware-oblivious query optimizer. I will present details of the complete query optimization pipeline, starting from cost estimation without detailed information about the hardware, to query optimization heuristics for heterogeneous co-processor environments. Furthermore, I will discuss concurrent query processing on co-processors and how we avoid resource contention. Finally, I will point out open problems and possible future directions.
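The second challenge, operator placement, can be illustrated with a toy heuristic. This is a hedged sketch in the spirit of a learning-based, hardware-oblivious optimizer like HyPE, but the class, names, and numbers are invented for illustration: estimate each processor's cost per operator from observed runtimes and place the operator on the cheapest one.

```python
from collections import defaultdict

class PlacementHeuristic:
    """Toy operator-placement heuristic based on observed runtimes."""

    def __init__(self, processors):
        self.observations = {p: defaultdict(list) for p in processors}

    def record(self, processor, operator, runtime_ms):
        """Feed back a measured runtime for an operator on a processor."""
        self.observations[processor][operator].append(runtime_ms)

    def estimate(self, processor, operator):
        runs = self.observations[processor][operator]
        # No observation yet: optimistic cost 0.0 forces exploration.
        return sum(runs) / len(runs) if runs else 0.0

    def place(self, operator):
        """Pick the processor with the lowest estimated cost."""
        return min(self.observations, key=lambda p: self.estimate(p, operator))

opt = PlacementHeuristic(["CPU", "GPU"])
opt.record("CPU", "hash_join", 40.0)
opt.record("GPU", "hash_join", 15.0)
opt.record("GPU", "hash_join", 25.0)
# The GPU's average (20 ms) beats the CPU's (40 ms), so the join is
# placed on the GPU.
```

A real optimizer like HyPE must additionally account for data transfer costs between processors and for contention from concurrent queries, which is exactly what the talk covers.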
In 2010, Sebastian obtained his Bachelor's degree in Computer Science from the University of Magdeburg in Germany, followed by his Master's degree in 2012. At the moment, he is a PhD student at the Dortmund University of Technology, working on query optimization for heterogeneous processor environments. In particular, he is focusing on efficient co-processor utilization and hardware-oblivious query optimization. In the course of his research, he developed two systems: 1) the hardware-oblivious query optimization framework HyPE, and 2) CoGaDB, a GPU-accelerated column store that targets OLAP workloads. In 2013, he helped to organize the German database conference "BTW" in Magdeburg. Furthermore, he is also co-organizing the "German community meetup for GPUs in databases" in Dortmund as well as the "International Workshop on Data (Co-)Processing on Heterogeneous Hardware" (DAPHNE), a co-located workshop at EDBT/ICDT 2015.
Part 1 + 2, Zbigniew Jerzak, SAP Innovation Center Potsdam
TITLE: Cloud-based Data Stream Processing
In this talk we present the results of recent research on the cloud enablement of data streaming systems. We illustrate, based on both industrial and academic prototypes, newly emerging use cases and research trends. Specifically, we focus on novel approaches for (1) scalability and (2) fault tolerance in large-scale distributed streaming systems. In general, new fault-tolerance mechanisms strive to be more robust while introducing less overhead. Novel load balancing approaches focus on elastic scaling over hundreds of instances based on the data and query workload. Finally, we present open challenges for the next generation of cloud-based data stream processing engines.
Zbigniew Jerzak is a senior researcher at the SAP Innovation Center in Potsdam, Germany. Zbigniew's current professional and research interests cover the whole spectrum of high-velocity big data analysis. Zbigniew has been responsible for a number of projects focusing on horizontal and vertical scalability as well as fault tolerance for complex event processing systems; query optimization across heterogeneous data management engines; and real-time visualization and analytics of large data volumes.
Zbigniew received his diploma in computer science from the Silesian University of Technology, Gliwice, Poland (2003). Shortly thereafter, Zbigniew joined the graduate program at TU Dresden, where he received his PhD in 2009 for his dissertation on distributed publish/subscribe systems. Since 2009, Zbigniew has been a member of the scientific staff at SAP.
Zbigniew is a current and past member of the following scientific program committees: DEBS 2012, DEBS 2013, DEBS 2014, DEBS 2015, CloudDP 2013, CloudDP 2014, CloudDP 2015, and VLDB 2015. Zbigniew is also a co-chair of the DEBS Grand Challenge series and a reviewer for the Elsevier Journal of Parallel and Distributed Computing and Elsevier Information Systems. Zbigniew has published over 20 papers at international conferences. He is a recipient of the VLDB 2014 best paper award for the paper "M4: A Visualization-Oriented Time Series Data Aggregation".
Mihnea Andrei, SAP
Title: Darwinian evolution: 3 implementations of snapshot isolation in SAP HANA
The talk briefly presents the HANA column store, then focuses on three historical versions of its snapshot isolation implementation, describing for each what worked well and why we evolved to the next one.
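As background for the talk, the core of any snapshot isolation scheme is a multi-version visibility check. The following is a generic textbook-style sketch with invented names and timestamps, not HANA's actual design: a transaction sees a row version only if it was committed before the transaction's snapshot was taken and not deleted before that point.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RowVersion:
    value: str
    created_ts: int            # commit timestamp of the writing transaction
    deleted_ts: Optional[int]  # commit timestamp of the deleter, if any

def visible(version, snapshot_ts):
    """A version is visible if committed before the snapshot and not yet deleted."""
    if version.created_ts > snapshot_ts:
        return False
    return version.deleted_ts is None or version.deleted_ts > snapshot_ts

# Two versions of the same logical row: v1 was replaced by v2 at timestamp 12.
versions = [
    RowVersion("v1", created_ts=5, deleted_ts=12),
    RowVersion("v2", created_ts=12, deleted_ts=None),
]
# A transaction with snapshot_ts=10 sees v1; one with snapshot_ts=15 sees v2.
```

How the version chains are stored, garbage-collected, and traversed efficiently in a column store is where the three HANA implementations differ.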
MS in computer science in 1988; the Bucharest Polytechnic Institute, Automatic Control and Computers engineering school; Prof. Cristian Giumale
DEA in Machine Learning in 1990; Université Paris 6; Prof. Jean-Gabriel Ganascia
Joined Sybase in 1993; currently working at SAP, which acquired Sybase in 2010.
Worked on the core engines of several RDBMSs (Sybase ASE and IQ; SAP HANA): query optimization, Abstract Plans (optimizer hints), query compilation and execution, eager-lazy aggregation, shared-disk and shared-nothing scale-out, database store, and transaction processing.
Marcel Kornacker, tech lead at Cloudera
Title: "Impala: A Modern, Open-Source SQL Engine for Hadoop"
The Cloudera Impala project is pioneering the next generation of
Hadoop capabilities: the convergence of fast SQL queries with the
capacity, scalability, and flexibility of a Hadoop cluster. With
Impala, the Hadoop community now has an open-sourced codebase that
helps users query data stored in HDFS and Apache HBase in real time,
using familiar SQL syntax. In contrast with other SQL-on-Hadoop
initiatives, Impala's queries are fast enough to run interactively
on native Hadoop data rather than in long-running batch jobs. Now you
have the freedom to discover relationships and explore what-if
scenarios on big data sets. By taking advantage of Hadoop's
infrastructure, Impala lets you avoid traditional data warehouse
obstacles like rigid schema design and the cost of expensive ETL jobs.
This talk starts out with an overview of Impala from the user's
perspective, followed by a presentation of Impala's architecture and
implementation. It concludes with a summary of Impala's benefits when
compared with the available SQL-on-Hadoop alternatives.
Marcel Kornacker is a tech lead at Cloudera for new products
development and creator of the Cloudera Impala project. Following his
graduation in 2000 with a PhD in databases from UC Berkeley, he held
engineering positions at several database-related start-up companies.
Marcel joined Google in 2003 where he worked on several ads serving
and storage infrastructure projects, then became tech lead for the
distributed query engine component of Google's F1 project.
Mohamed Khafagy, Ph.D., Fayoum University, Egypt
"An Optimized Tool to Increase the Effectiveness and Potential of Writing SQL in Hadoop"
With the rapid growth of data in large systems and the increasing need to analyze this data, an optimized tool that supports advanced SQL queries and improves SQL performance has become essential. We therefore built a tool that supports advanced SQL constructs (subqueries, INTERSECT, UNION, MINUS, and more), improves the performance of join operations, builds indexes to speed up star-schema queries, and reuses the intermediate data of previous SQL statements across sessions. Our tool accelerates any SQL query without requiring changes to Hive or Hadoop. We evaluated the tool using the TPC benchmark and show that it improves the performance of running SQL queries on Hadoop.
Mohamed Khafagy is a staff member of the Computer Science Department in the Faculty of Computers and Information at Fayoum University, Egypt. Mohamed received his Ph.D. in computer science in 2009 and became an Oracle Certified Master in Database Administration in 2003. He has also worked at Oracle Egypt as a consultant and trainer, and serves as a scientific consultant to Fayoum University. Mohamed has managed several projects and heads Fayoum University's E-Learning Center. In 2012, he worked as a postdoctoral researcher in the DIMA group at Technische Universität Berlin, and in 2013 he established the first big data research group in Egypt. He has many publications in the areas of cloud computing and databases; his research interests include cloud computing, databases, parallel programming, big data analysis, and MapReduce.
Home page: www.mkhafagy.com