TU Berlin

General Notes, Dr. Ralf-Detlef Kutsche, Academic Director

First, in my position as "Academic Director" at DIMA, I am responsible, on behalf of Prof. Markl, for coordinating all Master's and Bachelor's theses in the DIMA research group. This means:

  • The formal application and registration for a thesis (with a proposal coming from you, the candidate, and your potential advisor) must cross my desk. Comments, additional advice, etc. will come back to you in due time.
  • In case of industrial collaboration, there must be a meeting with me in order to clarify the rules and conditions we impose at DIMA and at the DFKI/IAM department of Prof. Markl.
  • The final defense of each thesis takes place in our DIMA MSc/BSc colloquium (typically one Friday afternoon per month) under my direction.
  • In all these cases, please contact me directly. In general, this will be the case after you have reached an agreement with your chosen advisor from the DIMA / DFKI groups. If you are completely lost in "idea space", you can ask for an appointment for general guidance!


  • by email (ralf-detlef.kutsche@tu-berlin.de), or
  • in my office hours (EN-726, Tuesday, 12-13, by appointment).

Secondly, from time to time I also act as the advisor of theses deeply related to my research areas. Candidates should preferably have been students in my INFMOD class (Advanced Information Modeling; late Bachelor's or early Master's, depending on your course of study!) or in the very advanced Master's class AIM-1 / HDIS (Advanced Information Management 1 – Heterogeneous and Distributed Information Systems).

Researchers and Thesis Opportunities

Dr. Kaustubh Beedkar


Research Area: "Geo-Distributed Data Analysis"

Topic Area: Constraint-aware Query Processing for Geo-Distributed Data Analysis

Many large organizations today have a global footprint and operate data centers that produce large amounts of data at different locations around the globe. Analyzing such geographically distributed data as a whole is essential to derive valuable insights. Typically, geo-distributed data analysis is carried out either by first communicating all data to a central location where the analytics is performed, or by a distributed execution strategy that minimizes data communication. However, legal constraints arising from regulations pertaining to data sovereignty and data movement (e.g., prohibition of the transfer of certain data across national borders) pose serious limitations to existing approaches. In this context, our research explores

  1. various possibilities for declaratively specifying legal constraints and
  2. methods and algorithms to automatically derive distributed execution strategies under such constraints.
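The first point above can be made concrete with a small sketch: a declaratively specified data-movement constraint and a check of whether a candidate execution plan respects it. All names here (the constraint class, the plan format) are illustrative assumptions, not an actual DIMA API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ShipmentConstraint:
    """Prohibits shipping a dataset from one region to another."""
    dataset: str
    from_region: str
    to_region: str

def plan_is_legal(plan, constraints):
    """A plan is a list of (dataset, from_region, to_region) shipments."""
    forbidden = {(c.dataset, c.from_region, c.to_region) for c in constraints}
    return all(step not in forbidden for step in plan)

constraints = [ShipmentConstraint("patients", "EU", "US")]

# Centralizing all raw data violates the constraint; a push-down plan
# that ships only aggregates does not.
central_plan = [("patients", "EU", "US"), ("logs", "US", "US")]
push_down_plan = [("patients_aggregate", "EU", "US")]

print(plan_is_legal(central_plan, constraints))    # False
print(plan_is_legal(push_down_plan, constraints))  # True
```

Deriving a legal, communication-minimal plan automatically (the second point) is then a search over such plans subject to this check.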

Please arrange a meeting with me to discuss concrete thesis opportunities. Students are also encouraged to propose their own topics within the scope of the above research problem.

Prerequisites: Strong programming skills (preferably in Java); knowledge of query planning and execution in DBMSs; (nice to have) completed IDB-PRA, DBT, or other database lab courses and seminars.

Christoph Boden


Research Area: "Benchmarking Data Processing Systems for Machine Learning Workloads"

Topic Area: Evaluating Deep Learning Frameworks for Machine Learning Workloads

In light of the tremendous successes achieved by applying deep artificial neural networks [1], novel machine learning systems such as TensorFlow [2] and MXNet [3] that efficiently support the training of such networks have been proposed and developed. On the other hand, the distributed systems and database communities have developed systems for the processing and analysis of massive data sets (commonly referred to as "Big Data Analytics" systems), including Apache Flink [4], Apache Spark [5], and Apache SystemML [6], which are also popular choices for executing scalable machine learning algorithms in practice. The goal of this work is to develop, implement, and perform experimental evaluations to benchmark these systems on various popular machine learning algorithms in order to assess their suitability for efficiently executing machine learning workloads.

[1] Lecun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444. DOI: 10.1038/nature14539
[2] https://www.tensorflow.org/
[3] https://mxnet.incubator.apache.org/
[4] https://flink.apache.org/
[5] https://spark.apache.org/
[6] https://systemml.apache.org/
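The core of such an experimental evaluation is a harness that runs the same workload on several systems and records runtimes. A minimal sketch follows; the "systems" here are stand-in Python callables, whereas in the thesis they would be TensorFlow, MXNet, Flink, Spark, or SystemML jobs.

```python
import time

def benchmark(systems, workload, repetitions=3):
    """Return the median wall-clock runtime in seconds per system."""
    results = {}
    for name, run in systems.items():
        times = []
        for _ in range(repetitions):
            start = time.perf_counter()
            run(workload)
            times.append(time.perf_counter() - start)
        results[name] = sorted(times)[len(times) // 2]
    return results

# Stand-in "systems": two implementations of the same aggregation workload.
systems = {
    "loop": lambda data: sum(data),
    "builtin": lambda data: sum(data),
}
print(benchmark(systems, list(range(10000))))
```

Real benchmarks would additionally control for warm-up, data loading, and cluster effects; the median over repetitions is a simple guard against timing noise.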

Dr. Alexander Borusan


Research Area: „Data Streams Management in Embedded Information Systems“

Topic Area: Data Stream Modeling and Processing

Typical applications of embedded information systems (automotive, avionics, manufacturing control) involve two main tasks: monitoring and controlling. Many sources in such applications produce data continuously as a stream. A data stream is an ordered sequence of data that can be read only once and should be processed in real time. From the point of view of data stream management, several tasks need to be solved: modeling (data stream models), processing (data structuring and data reduction), querying (types of queries), scheduling, and storage. Additionally, data stream analysis has become one of the important tasks of the last decade.
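Because a stream can be read only once, any statistic must be maintained incrementally in a single pass. The following minimal sketch computes a running mean over a (simulated) sensor stream; the stream itself is just an illustrative iterator.

```python
def running_mean(stream):
    """One-pass running mean: the stream is consumed exactly once."""
    count, total = 0, 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count

readings = iter([10.0, 20.0, 30.0])   # can be consumed only once
means = list(running_mean(readings))
print(means)  # [10.0, 15.0, 20.0]
```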

Data Streams Modeling:

Architectures and models of real time streaming in embedded information systems (automotive, avionics)

Data Streams Processing:

Taxonomy and comparison of the data reduction techniques for the data streaming in the automotive applications

Dr. Sebastian Breß


Research Area: „Data Management on Modern Hardware“

Topic Area: Data Management on Modern Hardware

Modern hardware influences data management in virtually every aspect. With main memory sizes growing to the terabyte scale, we can keep databases in memory. This shifts the performance bottleneck from disk I/O to main memory access and computation, which has a number of consequences: database operators need to be tuned for efficient caching and must be multi-threaded to make efficient use of modern processors, including new processor architectures such as GPUs or MICs. Query execution needs to be made CPU- and cache-efficient. Transaction processing can be optimized using hardware transactions and non-volatile memory.

We also encourage students to propose their own topics in the field of data management on modern hardware.

Prerequisites: Strong programming skills in C/C++, deep knowledge of database implementation techniques.
Nice to have: Knowledge of LLVM, CUDA, or OpenCL.


Detailed Topics proposals:

Collaborative Query Processing on Heterogeneous Processors

Here, you would extend CoGaDB and evaluate the performance impact of collaborative query processing. The most important related work is the Morsel paper (the paper proposes a strategy to make a database NUMA-aware, but the same idea can be applied to heterogeneous processors with dedicated memory).


Prototype a High Performance Stream Processing System

Hardcode four to five streaming queries in C and compare their performance to Apache Flink and Apache Storm. The goal is to find out whether it is beneficial to write a specialized code generator for streaming systems. If so, prototype a simple code generator that supports some simple streaming queries.
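One such hardcoded streaming query could look like the sketch below (written in Python rather than C for brevity; the query and schema are illustrative): a tumbling-window count per key as a single tight loop, which is exactly the kind of code a specialized generator would emit instead of a general operator graph.

```python
def tumbling_window_count(events, window_size):
    """events: iterable of (timestamp, key); returns {(window, key): count}."""
    counts = {}
    for ts, key in events:
        window = ts // window_size          # assign event to its window
        counts[(window, key)] = counts.get((window, key), 0) + 1
    return counts

events = [(0, "a"), (1, "a"), (5, "b"), (6, "a")]
print(tumbling_window_count(events, window_size=5))
# {(0, 'a'): 2, (1, 'b'): 1, (1, 'a'): 1}
```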

Bonaventura Del Monte


Research Area: „State Management for Distributed Data Stream Processing“

Topic Area: Improvement of the End-to-End Management of Large State in Distributed Stream Processing Engines

I offer Master's theses related to my research area, which seeks to enhance the end-to-end management of large state in distributed stream processing engines. Resource elasticity, fault tolerance, load balancing, robust execution of stateful queries, and query plan optimization are first-class citizens of my research agenda.

If you are interested in a topic in this area, please provide me with some information about yourself, your interests, and your programming skills, along with your CV.


Detailed available topics:

End-to-End Management of Large Scale Streaming Experiments.

Executing distributed experiments is a tedious, error-prone process, especially when stream processing engines (SPEs) are involved. When benchmarking the behavior of an SPE, researchers are typically after a number of metrics, which take the form of time series. In a distributed setup, the number of metrics grows linearly with the number of nodes. Furthermore, faulty behavior might show up at some point during an experiment. The goal of this thesis is to design and develop a framework that allows researchers to define and run large-scale experiments involving several systems (e.g., SPEs, data generators, and a number of third-party systems), gathers all the metrics the users are interested in from those systems, and provides a GUI that helps them analyze the results.

Prerequisites: strong programming skills in Java (and C++), a good understanding of the JVM and its memory model, and good knowledge of the Apache Flink APIs

Nice to have: good knowledge in one (or more) of the following topics: Apache Flink internals, network programming, and distributed systems

Behrouz Derakshan


Research Area: „Optimization of Machine Learning Workloads“

Topic Areas: Deployment and End-to-End Optimization of Machine Learning Pipelines

The life cycle of machine learning applications does not end with training a model. After the training, both the pipeline and the model are used to answer prediction queries (typically in real-time).
Maintaining the quality of the deployed model requires further training.
Current methods involve offline retraining of a new model, which is time-consuming and adds extra overhead when redeploying the model.
A promising approach is to continuously train the deployed model, using a combination of online and batch training methods.
Continuous training can maintain model quality while reducing the overhead of retraining and redeployment.
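The online half of continuous training can be sketched in a few lines: instead of retraining offline, the deployed model is updated per arriving example with a single SGD step. A one-dimensional linear model keeps the example self-contained; the learning rate and data are illustrative.

```python
def sgd_step(weight, x, y, lr=0.1):
    """One online update for the model y_hat = weight * x (squared loss)."""
    gradient = 2 * (weight * x - y) * x
    return weight - lr * gradient

weight = 0.0
# A stream of prediction feedback; the true relationship here is y = 2x.
for x, y in [(1.0, 2.0)] * 50:
    weight = sgd_step(weight, x, y)
print(round(weight, 3))  # converges toward 2.0
```

In a real deployment, such online steps would be interleaved with periodic batch updates over buffered data, which is the combination the text describes.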

Topic Area: End-to-End Optimization of Machine Learning Pipelines

Designing machine learning pipelines and models is an iterative process. First, a user designs a pipeline with a set of parameters (called hyperparameters). Based on the evaluation result of the model, the user forms a hypothesis on how to improve the quality of the model. Based on this hypothesis, the user modifies the pipeline. This process continues until the evaluation result of the model is satisfactory. Even for a single user working on a single data set, this process is time-consuming. In real-world use cases, multiple users (data scientists) work on a collection of data sets to design and train pipelines that result in high-quality models.
My goal is to combine the existing database optimization techniques, such as materialized view selection and multi-query optimization with new and novel optimization techniques to speed up the design and execution of the machine learning pipelines.
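The materialization idea can be illustrated with a toy pipeline runner that caches intermediate results keyed by the prefix of steps that produced them, so iterating on later steps does not re-run earlier ones. The step names, cache, and call counters below are illustrative assumptions, not the actual system.

```python
calls = {"load": 0, "clean": 0}
cache = {}

def run_pipeline(steps, data):
    """Run (name, fn) steps, reusing cached results for repeated prefixes."""
    result, prefix = data, ()
    for name, fn in steps:
        prefix += (name,)
        if prefix in cache:
            result = cache[prefix]     # materialized intermediate: skip work
        else:
            result = fn(result)
            cache[prefix] = result
    return result

def load(_):  calls["load"] += 1;  return [1, 2, 3, -1]
def clean(d): calls["clean"] += 1; return [x for x in d if x >= 0]

run_pipeline([("load", load), ("clean", clean)], None)
run_pipeline([("load", load), ("clean", clean)], None)  # second run: fully cached
print(calls)  # {'load': 1, 'clean': 1}
```

Multi-query optimization generalizes this: shared prefixes across different users' pipelines are computed once.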

Detailed Topic Proposals:

Hyperparameter Optimization for Large-Scale Machine Learning

In this thesis, you will implement advanced hyperparameter search methods on scalable data processing platforms such as Apache Spark. One drawback of advanced hyperparameter search methods is their long execution time. The goal of the thesis is to utilize efficient sampling and parallelization to speed up the process.
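As a baseline for the search methods mentioned above, randomized search samples configurations instead of exhaustively enumerating a grid, and each sample is independently evaluable (hence trivially parallelizable). The quadratic objective below is an illustrative stand-in for model validation error.

```python
import random

def random_search(objective, space, n_samples, seed=0):
    """Sample configurations uniformly from `space` and keep the best."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_samples):
        cfg = {k: rng.uniform(lo, hi) for k, (lo, hi) in space.items()}
        score = objective(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

space = {"lr": (0.0, 1.0)}
objective = lambda cfg: (cfg["lr"] - 0.3) ** 2  # minimum at lr = 0.3
best_cfg, best_score = random_search(objective, space, n_samples=200)
print(best_cfg["lr"], best_score)
```

On Spark, the per-sample evaluations would run as parallel tasks; the advanced methods the thesis targets additionally adapt where to sample next.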

Gabor Gévay


Research Area: „Embedded Domain-Specific Languages for Data Analytics“

Topic Area: Compiling SystemML Programs to a Single Dataflow


Distributed dataflow frameworks have become a mainstream approach for large-scale data analytics in recent years, as the successes of Apache Spark [1] and Apache Flink [2] show. The dataflow programming model allows users to express their programs as a directed graph, where the data flows on edges, and nodes perform computations on this data. A common architecture involves a client program, which runs on a single computer and submits dataflow jobs to the system, which executes these dataflow jobs on a cluster of machines.

While many data analytics algorithms involve iterations or other control flow, incorporating efficient and easy-to-use control flow constructs into the dataflow model has proven to be challenging. Older dataflow systems do not support control flow inside their dataflows. In these systems, control flow must be executed in the client program, which submits new dataflow jobs after every control flow decision (e.g., in every step of a loop).

Newer systems, such as Apache Flink [3], Naiad [4], or Tensorflow [5], employ a different approach to control flow. They allow for building cyclic dataflows, and incorporate control flow inside these cyclic dataflows. This has various performance advantages over launching new dataflow jobs after every control flow decision: 1. the overhead of launching new dataflows is eliminated, 2. various optimizations are made possible, such as loop invariant hoisting and loop pipelining.
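The second advantage, loop-invariant hoisting, can be illustrated with a small stand-alone sketch (pure-Python stand-ins; the call counters show how often the invariant step runs under each execution model):

```python
calls = {"naive": 0, "hoisted": 0}

def expensive_invariant(counter_key):
    calls[counter_key] += 1
    return sum(range(100))   # stand-in for e.g. preprocessing a static dataset

def client_driven_loop(steps):
    """Control flow in the client: the invariant is recomputed per 'job'."""
    x = 0
    for _ in range(steps):
        x += expensive_invariant("naive")
    return x

def single_dataflow_loop(steps):
    """Control flow inside one dataflow: the invariant is hoisted out."""
    inv = expensive_invariant("hoisted")
    x = 0
    for _ in range(steps):
        x += inv
    return x

assert client_driven_loop(10) == single_dataflow_loop(10)
print(calls)  # {'naive': 10, 'hoisted': 1}
```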

However, most of these newer dataflow systems have inconvenient APIs for building these cyclic dataflows that involve control flow. In a recent paper [6], we address this problem by introducing a system called Labyrinth, which allows the user to express control flow by easy-to-use imperative control flow constructs, and compiles programs to a single cyclic dataflow job, thereby allowing also for efficient execution.

SystemML [7] is a well-known system for machine learning. It features an easy-to-use R-like language with imperative control flow constructs and compiles to MapReduce or Spark. An MSc thesis could be written about compiling SystemML programs to Labyrinth, and exploring the performance advantages of Labyrinth for machine learning workloads.


[1] Zaharia, M. et al., Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation (2012), USENIX Association.

[2] Carbone, Paris, et al. "Apache flink: Stream and batch processing in a single engine." Bulletin of the IEEE Computer Society Technical Committee on Data Engineering 36.4 (2015).

[3] Ewen, Stephan, et al. "Spinning fast iterative data flows." Proceedings of the VLDB Endowment 5.11 (2012): 1268-1279.

[4] Murray, Derek G., et al. "Naiad: a timely dataflow system." Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 2013.

[5] Yu, Yuan, et al. "Dynamic control flow in large-scale machine learning." Proceedings of the Thirteenth EuroSys Conference. ACM, 2018.

[6] Gévay, Gábor E., et al. "Labyrinth: Compiling Imperative Control Flow to Parallel Dataflows." arXiv preprint arXiv:1809.06845 (2018).

[7] Boehm, Matthias, et al. "SystemML: Declarative machine learning on spark." Proceedings of the VLDB Endowment 9.13 (2016): 1425-1436.

Dr. Holmer Hemsen


Research Area: "Scalable Signal Processing / Industrie 4.0"

Topic Areas: Data Analytics of Massive Time Series, Intelligent and Scalable Resource Management for Industrie 4.0

Data Analytics of Massive Time Series
A time series is a set of observations, each recorded at a specific time. Examples of time series are manifold, e.g., electrocardiography curves, stock market data, seismic measurements, and network load. Time series analysis comprises a wide range of methods, such as anomaly and outlier detection, forecasting, and pattern recognition. The focus of this topic area is on research into methods for the analysis of massive and/or multi-dimensional time series data.
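One of the simplest methods from this family, outlier detection via a z-score over a trailing window, can be sketched as follows (window size and threshold are illustrative choices):

```python
def zscore_outliers(series, window=5, threshold=3.0):
    """Flag indices whose value deviates from the trailing-window mean
    by more than `threshold` standard deviations."""
    outliers = []
    for i in range(window, len(series)):
        hist = series[i - window:i]
        mean = sum(hist) / window
        var = sum((x - mean) ** 2 for x in hist) / window
        std = var ** 0.5
        if std > 0 and abs(series[i] - mean) / std > threshold:
            outliers.append(i)
    return outliers

series = [1.0, 1.1, 0.9, 1.0, 1.05, 1.0, 9.0, 1.0]
print(zscore_outliers(series))  # [6]
```

Research on massive time series is about making such methods scale: incremental window statistics, partitioned processing, and multi-dimensional variants.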

Intelligent and Scalable Resource Management for Industrie 4.0
The goal of Industrie 4.0 is to digitalize, automate, and optimize industrial production systems. In many cases, this involves upgrading conventional production systems into cyber-physical systems, often by utilizing Internet of Things (IoT) technology. The focus of this research topic is on methods for scalable optimization of production lines and intelligent forecasting of consumable resources to calculate optimal dynamic maintenance strategies.

Prerequisites: Strong programming skills in Java, Scala or Python; Good writing skills; Preferable knowledge of Apache Flink



Jeyhun Karimov


Research Area: „Benchmarking & Concurrent Query Processing“

Topic Areas: Benchmarking Data Processing Systems, Concurrent Query Processing

Benchmarking Data Processing Systems

With the development of big data systems in recent years, a variety of benchmarks have been proposed both from industry and academia. 

The goal is to develop novel benchmarking methodologies to evaluate and compare workloads on a set of systems, which eventually leads to technology improvements.

I offer topics on benchmarking data processing systems, such as graph processing and stream processing systems.


Concurrent Query Processing

In the last decade, many distributed data processing engines have been developed to perform continuous queries on massive online data. The central design principle behind these engines is to handle queries with a query-at-a-time model, optimizing each query separately. With the adoption of multi-tenant clouds, it is essential to enable new optimization frameworks that share data and computation among the running queries.

I offer topics related to concurrent query optimization (single- or multi-objective).

Martin Kiefer


Research Area: „Approximate Data Analysis Using Modern Hardware“

Topic Areas: Data Stream Summarization Using Custom Hardware (FPGAs), Improving Query Optimization Using Modern Hardware

My research investigates the combination of data approximation techniques and modern hardware architectures to increase the efficiency of data analysis.


Data Stream Summarization Using Custom Hardware (FPGAs)

Power-efficient data analysis is an increasingly important problem in the era of big data: the amount of available data continues to grow exponentially, and for economic and environmental reasons we need to ensure that the energy demand required to analyze the data does not grow exponentially as well. I approach this problem for data stream analysis by combining the potential of stream summarization techniques and custom hardware on FPGAs.

Improving Query Optimization Using Modern Hardware

The query optimizer is at the heart of state-of-the-art relational database systems. It derives different execution plans for a given query and selects the cheapest one based on statistics and a cost model. However, this has to be done in a very tight time budget, since query optimization delays query execution. I am investigating how modern hardware can help with this task. In particular, I’m improving the statistics available to the query optimizer by using bandwidth-optimized kernel density models as learning selectivity estimators on GPUs.
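The kernel-density idea can be sketched in a few lines: estimate the selectivity of a range predicate from a sample using Gaussian kernels. The bandwidth and sample below are illustrative; the actual research uses bandwidth-optimized models executed on GPUs.

```python
import math

def kde_selectivity(sample, lo, hi, bandwidth=1.0):
    """Estimated P(lo <= x <= hi) under a Gaussian KDE fit on `sample`."""
    def cdf(x):  # standard normal CDF
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return sum(cdf((hi - s) / bandwidth) - cdf((lo - s) / bandwidth)
               for s in sample) / len(sample)

sample = [1.0, 2.0, 3.0, 4.0, 5.0]
est = kde_selectivity(sample, 0.0, 6.0)
print(round(est, 2))  # close to 1: the range covers most of the sample's mass
```

Each sample point contributes independently, which is what makes the estimator amenable to bandwidth-bound parallel evaluation.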


I offer thesis topics based on current research questions, student interests, and student skills. Students with an interest in modern hardware are preferred, but I may also provide thesis topics without a hardware focus. Skills in C/C++, OpenCL, VHDL, or Python programming might be useful.

If you are interested in my research topics, we can arrange a meeting for a discussion. Please include a CV in your request.

Andreas M. Kunft


Research Area: „Mixed Linear and Relational Algebra Pipelines“

Topic Area: Deeply-Embedded DSL with Abstract Data Types (ADT) for (Distributed) Collections and Matrices

Today's data analysis pipelines go beyond pure linear algebra and often include data generation and transformation steps (ETL) that are best defined using relational algebra operators.
In contrast, current systems either provide each domain as a separate library, limiting optimizations to each library's domain in isolation (e.g., Python scikit-learn), or they map operations of the foreign domain on top of their own domain (e.g., Spark, TensorFlow).

I conduct research based on a deeply-embedded DSL with abstract data types (ADTs) for (distributed) collections and matrices. As a result, data analysts can express complete pipelines in a single language. Both ADTs are explicitly reflected in a common intermediate representation (IR), including control flow. Based on this IR, I experiment with new ways of performing holistic optimizations over both ADTs.

Offered Theses:

I offer theses in the described area, based on my current topics and the student's interests.

For further information, please contact me via email including your CV, programming skills, and interests.

Dr. Ralf-Detlef Kutsche


Research Area: "Model Based Software and Data Integration", focusing on "Semantic Concepts / Semantic Interoperability"

Topic Area: Model Based Methods for the Development of Heterogeneous and Distributed Information Systems in Large Scale

Since the 1980s, there has been a huge discussion about the quality of software and information systems. Today, this is absolutely a top issue for our modern data world, as we can see from statements like "data is the gold of modern times" in the "big data", "data analytics", and "data science" fields. Prof. Markl and his groups play a fundamental role in this world with the BBDC (Berlin Big Data Center), with the university spin-off Data Artisans (Apache Flink), with several international Master's programmes and tracks such as Erasmus Mundus IT4BI and BDMA, the EIT Digital ICT-Innovation Master track "Data Science", and the local TUB Master track "Data Analytics", and with many other activities.

Unfortunately, two Gartner studies, from 1985 and 2005, show the same disastrous result: only approximately 25% of the software projects started come to a successful end on time and within budget, and another 25% finish after some delay, with budget overruns, and possibly even with reduced functionality and performance. The remaining 50% die along the way, or are never even started properly after some initial planning!

Model-based methods, in earlier times following the MDA (Model Driven Architecture) ideas of the OMG (Object Management Group, an international standardisation and management consortium of almost all large active companies in the world), have promised for many years both to improve the quality and to reduce the cost of software dramatically (by up to 70%), by applying models to all (!) phases of the whole software process. In our case, this means the development of (potentially, and in most cases actually) Heterogeneous and Distributed Information Systems (HDIS) in large scale!

Applying models (e.g., UML models in simple cases, or better, 'domain-specific modeling languages') and semantic concepts (e.g., ontologies, formal logic and semantics, metadata standards, and (meta-)thesauri) can support these methods significantly, as the results of two very large industrial R&D projects (among many others) under my scientific guidance show: BiZYCLE (2007-2010) and BIZWARE (2010-2013), both funded by the German ministry of research (BMBF).

Candidates interested in these topics should have an excellent background in databases and information systems, in software engineering and software architecture, in formal methods and mathematics or theoretical computer science (particularly logic formalisms and languages), and, of course, in modeling with classical modeling languages from the UML family, with E/R (known from every DBS course), and with BPMN or any other process/workflow modeling language. They should also be interested in application domains such as health care (my main application area for 30 years), the automotive industry, business intelligence, and, very relevant for the future, the energy sector!

If you fulfill these requirements, and you have participated in my classes or can prove knowledge gained in these fields at other universities, please apply for a thesis in my office hours (Tue, 12-13, during semester time, by appointment).

Clemens Lutz


Research Area: „Iterative Algorithms on Modern Hardware“

Topic Area: Iterative Machine Learning Algorithms on Modern Hardware

Many machine learning algorithms are structured as iterative algorithms. However, algorithms frequently do not use the full processing capabilities of modern hardware, such as GPUs and CPUs with vector instructions.

As iterative algorithms repeatedly access data, these algorithms offer unique opportunities for optimization. In some cases, more efficient use of hardware achieves order-of-magnitude speedups.

My research goal is to adapt algorithms and software frameworks to achieve modern hardware’s potential of short data to knowledge times. This involves tuning software for cache efficiency, data transfer efficiency, and vectorization. Ideally, software would support the user by automating these tasks.


Prerequisites: Strong programming skills in C/C++, interest in low-level software optimization. Nice to have: knowledge of GPU programming; databases, compilers, or operating systems research.


I offer thesis topics based on current research problems in the field of data analysis on modern hardware. Also, I encourage students to propose their own thesis topic.

Please contact me via E-mail to discuss your ideas for a thesis. Include a short text about your skills and research interests, and attach your CV.



Detailed topic proposals:

Dynamic Data Restructuring for Vectorized Processing

Machine learning algorithms process data stored as multi-dimensional matrices. For cache efficiency, data are accessed in blocks, e.g. as in block matrix multiplication. This works well if we run only a single operator. However, block-wise data access typically uses the entire cache. Thus, if we run multiple
operators at the same time (i.e. operator fusion), block-wise access prevents other algorithms from caching data.

A potential solution is to restructure data at runtime into a format ideal for both the processor and the algorithm. The cost of the restructuring process would be amortized over multiple algorithm iterations. The goal of the thesis is to model the costs and validate the model using a working implementation.
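The amortization argument can be written down as a tiny cost model: restructuring pays off once its one-time cost is spread over enough iterations. All costs below are illustrative units, not measurements.

```python
def total_cost(iterations, per_iter_cost, restructure_cost=0.0):
    """One-time restructuring cost plus the per-iteration processing cost."""
    return restructure_cost + iterations * per_iter_cost

def naive(n):
    return total_cost(n, per_iter_cost=10.0)

def restructured(n):
    # Restructuring costs 20 units once, but makes each iteration cheaper.
    return total_cost(n, per_iter_cost=6.0, restructure_cost=20.0)

# Break-even at 20 / (10 - 6) = 5 iterations.
print(naive(4) < restructured(4))    # True: too few iterations to amortize
print(naive(10) > restructured(10))  # True: restructuring has paid off
```

The thesis would replace these constants with measured cache and memory costs and validate the break-even point experimentally.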


Sampling-Based Algorithms on GPUs

Machine learning algorithms such as stochastic gradient descent or out-of-core k-means process data samples instead of the entire data set. However, when naively implemented, such sampling-based approaches are inherently limited by memory latency. On GPUs, they are additionally limited by
the latency of the PCI-e bus.

In contrast to PCI-e, modern GPU interconnects such as NVLink provide high throughput. The goal of the thesis is to model the cost of interconnect latency for a GPU sampling algorithm. The model shall then be validated using a working implementation. The implementation can incorporate techniques
developed by the student to optimize access latency from the GPU.

Dr. Alireza Mahdiraji


Research Area: „Approximate Query Processing on Data Streams“

Topic Area: Distributed Summarization Data Structures

Many real-world applications (e.g., traffic monitoring, cluster health monitoring, web log analysis, online services) generate data streams at unprecedented rate and volume. Traditional query processing over such massive amounts of streaming data often results in high latency and increased computational cost. This overhead is even more pronounced for query processing over distributed data streams. On the other hand, Approximate Query Processing (AQP) provides approximate answers to queries at a fraction of the cost of the original queries and is a means to achieve interactive response times (sub-second latencies) when faced with voluminous data. Interactive query response times (at the cost of accuracy) are useful for many tasks, such as exploratory analytics, big-data visualization, or trend analysis. In particular, AQP techniques utilize data synopses (or summaries), much smaller representations of the data, to quickly answer queries at the cost of accuracy. Examples of such synopses are samples, histograms, wavelets, and sketches.
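One classic sketch synopsis, the Count-Min sketch, answers frequency queries over a stream with bounded memory and can serve as a small runnable illustration. The width, depth, and hash construction below are illustrative choices; by design, estimates can only overcount.

```python
import hashlib

class CountMinSketch:
    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One hash function per row, derived from a salted SHA-256 digest.
        for row in range(self.depth):
            h = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(h[:8], "big") % self.width

    def add(self, item):
        for row, col in self._buckets(item):
            self.table[row][col] += 1

    def estimate(self, item):
        # Minimum over rows bounds the overcount from hash collisions.
        return min(self.table[row][col] for row, col in self._buckets(item))

cms = CountMinSketch()
for item in ["a"] * 100 + ["b"] * 5:
    cms.add(item)
print(cms.estimate("a"))  # >= 100; with this few items, typically exactly 100
```

Constructing and merging such synopses efficiently across distributed stream sources is the focus of the research described below.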

Our research focuses on developing methods for efficient construction and maintenance of data synopses for large amounts of streaming data that is generated in a distributed fashion.

Alexander Renz-Wieland


Research Area: „Large-Scale Machine Learning“

Topic Area: Large-Scale Machine Learning

Training machine learning (ML) models on a cluster instead of a single machine increases the amount of available compute and memory, but requires communication among cluster nodes for synchronizing model parameters. For some ML models, this synchronization can become the dominating part of the training process, such that using more computers does not result in the intended speed-up.

To avoid much of this communication, researchers developed algorithms that create and exploit parameter locality. That is, at a given point in time each of the workers updates only a subset of the model parameters. These subsets typically change throughout the training process, i.e., workers update different subsets throughout training. Such algorithms exist for multiple types of ML models. The locality can stem from the training algorithm, the ML model, or the training data.

ML developers typically need to implement such locality-exploiting algorithms from scratch, i.e., they have to know about low-level details of distributed computing. We are developing a system that allows researchers and practitioners to implement such algorithms without detailed knowledge of distributed computing. Our approach is to make the state-of-the-art architecture for distributed ML, so-called parameter servers, usable and efficient for locality-exploiting algorithms.  
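The core idea can be illustrated with a toy in-process parameter server: workers pull and update only the parameter subset they currently need, rather than synchronizing the full model. The pull/push API and the keys below are illustrative assumptions, not the actual system under development.

```python
class ParameterServer:
    def __init__(self, params):
        self.params = dict(params)

    def pull(self, keys):
        """Fetch only the subset of parameters a worker currently needs."""
        return {k: self.params[k] for k in keys}

    def push(self, updates):
        """Apply additive updates to individual parameters."""
        for k, delta in updates.items():
            self.params[k] += delta

ps = ParameterServer({"w0": 0.0, "w1": 0.0, "w2": 0.0})

# Worker 1 touches {w0, w1}; worker 2 touches {w2}. The subsets are
# disjoint, so neither worker needs to synchronize the full model.
local = ps.pull(["w0", "w1"])
ps.push({k: v + 1.0 for k, v in local.items()})
ps.push({"w2": -1.0})
print(ps.params)  # {'w0': 1.0, 'w1': 1.0, 'w2': -1.0}
```

In a distributed setting, exploiting such locality means parameters can live near the workers that update them, cutting synchronization traffic.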

I offer multiple thesis topics related to this line of work. For example, theses can work on aspects of the system or apply the system to specific ML models.

If you are interested, don't hesitate to contact me to arrange a meeting. Please provide me with some information about you, your interests, your prior experience, your programming skills, and your CV.

Viktor Rosenfeld


Research Area: „Adapting Data Processing Code to Different Processor Types Without Manual Tuning“

Topic Area: Data Processing on Heterogeneous Processors

In the last decade, processors have become increasingly diverse, parallelized, and specialized for specific tasks. For example, in addition to multi-core CPUs, there are GPUs, Intel Xeon Phis, and FPGAs. Often, developers have to write program code that is specific to a particular processor to fully exploit its resources.

In my research, I study how a database can adapt its operator code automatically to the processor that it is currently running on. In essence, my goal is to write a database system that learns how to rewrite itself until it runs as fast as possible on any given processor. To this end, I work a lot with OpenCL, a programming standard that enables users to run the same program on different types of processors such as CPUs, GPUs, etc.

Prerequisites: Strong programming skills in C/C++, interest in low-level programming, interest in processor architecture.
Nice to have: Knowledge in GPU programming (e.g., OpenCL and/or CUDA); interest in automatic tuning.

I offer to mentor both bachelor and master thesis in the context of data processing on heterogeneous processors. I encourage students to develop their own ideas. The proposals below can be used as a starting point.

Please contact me via email to discuss your ideas for a thesis. Be sure to include a short text about your skills and interests, and attach your CV.


Detailed topic proposals:

Evaluation of Hash-Based Grouped Aggregation Algorithms on GPUs

Hash-based grouped aggregation has been studied extensively on multi-core CPUs. In general, one of three algorithms works best, depending on the group cardinality. However, as the cardinality is not always known in advance, there are also algorithms that do not assume prior knowledge and degrade gracefully to large cardinalities.

The goal of this work is to port and adapt these algorithms, which are written for multi-core CPUs, to GPUs using OpenCL as a target language. The algorithms should be integrated into an existing test suite. A thorough evaluation of these algorithms and a comparison with existing algorithms is also part of this thesis.


Implicit Vectorization on Intel CPUs

Modern CPUs use so-called SIMD instructions to apply the same instruction (such as addition) to multiple data items in a single cycle. Without exploiting SIMD capabilities, the resources of modern CPUs go to waste. However, SIMD instructions are generally not portable from one processor generation to the next.

The Intel OpenCL compiler tries to use SIMD instructions automatically through a process called auto-vectorization. The goal of this work is to take existing vectorized formulations of data processing operators and reformulate them in a way that allows the Intel OpenCL compiler to vectorize them efficiently. A thorough evaluation of these operators and a comparison with existing native SIMD versions is also part of this thesis.

Juan Soto


Research Area: "Data Analysis / Data Analytics"

Topic Area: Exploratory Data Analysis, Numerics in Data Analytics

Exploratory Data Analysis

An Analysis of Current Approaches/Solutions for Big Data Problems and Devising Novel Techniques

Numerics in Data Analytics

A Closer Look at Software Quality in Existing Big Data Analytics Libraries: Challenges and Pitfalls.

Jonas Traub


Research Area: "On-Demand Data Stream Processing"

Topic Area: On-Demand Data Gathering in the Internet of Things

Real-time sensor data enables diverse applications such as smart metering, traffic monitoring, and sports analytics. In the Internet of Things, billions of sensor nodes form a sensor cloud and offer data streams to analysis systems. However, it is impossible to transfer all available data at maximal frequencies to all applications. Instead, data streams must be produced and processed in a way that is tailored to the data demand of applications. My research goal is to optimize communication costs in the IoT while maintaining the desired accuracy and latency of stream processing jobs.
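As a toy illustration of demand-driven data gathering, assume several applications request readings from the same sensor at different intervals. Reading once at the smallest requested interval and sharing the resulting stream already avoids redundant sensor reads and transfers; the function name and interface below are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Given the sampling intervals (in ms) requested by all applications
// subscribed to one sensor, the sensor only needs to read at the
// smallest requested interval; every application can then be served
// from that single shared stream (e.g., by downsampling).
// Precondition: at least one application has subscribed.
int shared_read_interval_ms(const std::vector<int>& requested_intervals_ms) {
    return *std::min_element(requested_intervals_ms.begin(),
                             requested_intervals_ms.end());
}
```

Real on-demand gathering goes well beyond this sketch, e.g., adapting read frequencies to the accuracy and latency requirements of each stream processing job.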


I offer thesis topics based on current research questions, student interests, and student skills.

Please contact me to arrange a meeting to discuss concrete thesis opportunities. Please include some information about yourself, your interests, and your programming skills, along with your CV.

Dr. Steffen Zeuch


Research Area: "Query Optimization and Execution on Modern CPUs"

Topic Area: Query Optimization and Execution on Modern CPUs

Over the last decades, database systems have migrated from disk-based to main-memory architectures built on RAM, Flash, or NVRAM. Research has shown that this migration fundamentally shifts the performance bottleneck upwards in the memory hierarchy. Whereas disk-based database systems were largely dominated by disk bandwidth and latency, in-memory database systems mainly depend on the efficiency of faster memory components, e.g., RAM, caches, and registers.

With respect to hardware, the clock speed per core has reached a plateau due to physical limitations. This limit caused hardware architects to devote the growing number of available on-chip transistors to more cores and larger caches. However, memory access latency has improved much more slowly than memory bandwidth. Nowadays, CPUs can process data much faster than it can be transferred from main memory into the caches. This trend creates the so-called Memory Wall, which is the main challenge for modern main-memory database systems.

To address these challenges and unlock the full processing power of modern CPUs for database systems, we propose theses that reduce the impact of the Memory Wall.

We also encourage students to propose their own topics in the field of query optimization and processing on modern CPUs.

Requirements: Strong programming skills in C/C++, deep knowledge of database implementation techniques, good understanding of computer architecture
Optional: Knowledge of LLVM, VTune, MPI, OpenMP
