
DIMA Colloquium Dates

Dates WS 19/20

11:00, MA 004
BBDC Talk: "Agency + Automation", Jeffrey Heer (University of Washington/Trifacta)

10:00, DFKI Projektbüro Berlin, 4th floor, room Weizenbaum, Alt-Moabit 91c, 10559 Berlin
Nantia Makrynioti, Athens University of Economics and Business
"Declarative specification and automatic compilation of machine learning algorithms"

16:00, EN 719
Fabio Porto, The National Laboratory of Scientific Computing (LNCC), Rio de Janeiro, Brazil
"Managing and Analysing Simulation Data"

16:00, EN 719
Katarzyna Juraszek, TU Berlin
"Extended Kalman Filter for Large Scale Vessels Trajectory Tracking in Distributed Stream Processing Systems"

16:00, EN 719
Ilin Tolovski
"System for cloud execution, semantic annotation, storage and querying of data mining experiments"

16:00, EN 719
Alireza Zarei, Karlsruhe Institute of Technology
"A distributed marketplace of data and storage for the next-generation Internet"

16:00
Flavio Clesio, MyHammer AG
"Job Recommendations in a Marketplace with Multiple Stakeholders"

16:00
Zongxiong Chen
"Hysteretic neural networks, stream processing system on modern hardware, and research agenda"

12:00
Chen Xu, East China Normal University
"Exploiting Incremental Evaluation for Efficient Distributed Matrix Computation"

Nantia Makrynioti, Athens University of Economics and Business

Title: Declarative specification and automatic compilation of machine learning algorithms

Declarative programming is usually summarised in the phrase "describing what needs to be done, instead of telling the program how to do it". As the adoption of data science grows rapidly, a need has emerged for democratising data analysis tasks by making their development more approachable and less tedious through high-level languages. Inspired by the success of the declarative paradigm in relational database systems, researchers have recently started exploring whether the use of declarative languages in the machine learning (ML) domain can provide a productivity leap for developers.

In this talk I will give a brief overview of efforts in the area of declarative data analytics and machine learning, and describe the design of sql4ml, a system that aims at democratising ML tasks for database users. It allows the user to express ML models in SQL following the "model + solver" approach, where there is a description of the objective function (a.k.a. loss or cost function) of an ML model and a solver that provides the optimal solution for it. Sql4ml translates the SQL code defining the model into an appropriate representation for training inside an ML framework. After training, the computed solution is stored back in the database, which allows for more robust model management and the generation of future predictions inside the database.
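The abstract does not show sql4ml's actual syntax. As a rough illustration of the "model + solver" split, the sketch below declares a least-squares objective in hypothetical SQL over invented tables `features(row_id, x)` and `labels(row_id, y)`, and pairs it with a tiny gradient-descent solver in plain Python; none of these names come from the system itself.

```python
# Hypothetical SQL a user might write to declare the objective of a
# one-parameter linear model (invented syntax and schema, for illustration):
OBJECTIVE_SQL = """
SELECT AVG((l.y - f.x * w.value) * (l.y - f.x * w.value)) AS loss
FROM features f JOIN labels l ON f.row_id = l.row_id
CROSS JOIN weights w;
"""

def mse(w, xs, ys):
    """The same objective, as a solver sees it after translation."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def solve(xs, ys, lr=0.1, steps=200):
    """A tiny gradient-descent 'solver' for the declared objective."""
    w = 0.0
    for _ in range(steps):
        grad = sum(-2 * x * (y - w * x) for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # data generated by y = 2x
w = solve(xs, ys)
```

In the "model + solver" approach, only the declaration above is the user's job; translating it into the solver's representation and writing the fitted `w` back into the database is the system's.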


Short Bio:

Nantia Makrynioti is a PhD student in Computer Science at the Athens University of Economics and Business, supervised by Professor Vasilis Vassalos. Her research focuses on integrating machine learning functionality with relational databases, which aligns with her interest in declarative machine learning (applying a paradigm well known from databases to the area of machine learning). In the context of this effort, she has also worked with the LogicBlox team on expressing and optimising machine learning problems using the company's relational platform.

In the past, she did research on sentiment analysis, which resulted in the development of a related component for a commercialised platform in Greece.

She holds a BSc in Computer Science from the University of Ioannina and an MSc in Information Systems from her current university.


Jeffrey Heer (University of Washington/Trifacta)

Location: lecture hall MA 004, Straße des 17. Juni 136, 10623 Berlin

Agency + Automation

Much contemporary rhetoric regards the prospects and pitfalls of using artificial intelligence techniques to automate an increasing range of tasks, especially those once considered the purview of people alone. These accounts are often wildly optimistic, understating outstanding challenges while turning a blind eye to the human labor that undergirds and sustains ostensibly “automated” services. This long-standing focus on purely automated methods unnecessarily cedes a promising design space: one in which computational assistance augments and enriches, rather than replaces, people’s intellectual work. This tension between agency and automation poses vital challenges for design and engineering. In this talk we will consider the design of interactive systems that enable rich, adaptive collaboration among people and computational agents. We seek to balance the often complementary strengths and weaknesses of each, while promoting human control and skillful action. We will review case studies in three arenas—data wrangling, exploratory visualization, and natural language translation—that integrate proactive computational support into interactive systems. To improve outcomes and support learning by both people and machines, I will describe the use of shared representations of tasks augmented with predictive models of human capabilities and actions.

Jeffrey Heer is the Jerre D. Noe Endowed Professor of Computer Science & Engineering at the University of Washington, where he directs the Interactive Data Lab and conducts research on data visualization, human-computer interaction, and social computing. The visualization tools developed by Jeff and his collaborators (Vega, D3.js, Protovis, Prefuse) are used by researchers, companies, and thousands of data enthusiasts around the world. Jeff's research papers have received awards at the premier venues in Human-Computer Interaction and Visualization (ACM CHI, ACM UIST, IEEE InfoVis, IEEE VAST, EuroVis). Other honors include MIT Technology Review's TR35 (2009), a Sloan Fellowship (2012), an Allen Distinguished Investigator Award (2014), a Moore Foundation Data-Driven Discovery Investigator Award (2014), and the ACM Grace Murray Hopper Award (2016). Jeff holds B.S., M.S., and Ph.D. degrees in Computer Science from UC Berkeley, whom he then "betrayed" to join the Stanford faculty (2009–2013). He is also a co-founder of Trifacta, a provider of interactive tools for scalable data transformation.



Fabio Porto, The National Laboratory of Scientific Computing (LNCC) Rio de Janeiro, Brazil

TU Berlin, EN building, seminar room EN 719 (7th floor), Einsteinufer 17, 10587 Berlin

Managing and Analysing Simulation Data

The increasing processing power of HPC systems has enabled the development of realistic simulations of phenomena in different areas, such as oil and gas, engineering, medicine, and meteorology. As simulation quality improves and HPC systems approach exaflop performance, scientists' use of simulation output evolves toward complex data analytics tasks. Unfortunately, data management systems have completely neglected the domain of numerical simulations, leading scientists to express complex analyses using ad-hoc programs on top of proprietary file formats or libraries, such as NetCDF and HDF5. In this talk, we present the work we have been developing on data management to support numerical simulations. We will first discuss a technique to answer spatial queries about the uncertainty in simulation results. Next, we will present the SAVIME (Simulation & Visualization In-Memory) system, a multidimensional array DBMS designed with the following principles: to incur minimum data ingestion overhead; to support complex data structures, such as meshes, data geometry, and simulation metadata; to support data visualization; and to offer users a declarative query interface and query optimization.
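SAVIME's data model and query language are not described in the abstract. The following sketch only illustrates, on invented data, what a spatial uncertainty query can compute: the per-cell spread across an ensemble of runs of the same simulation. The grid layout and the `uncertainty_in_region` helper are hypothetical.

```python
# A minimal sketch of spatial uncertainty queries over simulation output:
# given several runs of the same simulation on a grid, report a per-cell
# spread for a queried region. Data layout is hypothetical.
from statistics import pstdev

# Three simulation runs over a 2x2 grid, stored as {(i, j): value}.
runs = [
    {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0},
    {(0, 0): 1.2, (0, 1): 2.1, (1, 0): 2.8, (1, 1): 4.3},
    {(0, 0): 0.8, (0, 1): 1.9, (1, 0): 3.2, (1, 1): 3.7},
]

def uncertainty_in_region(runs, cells):
    """Population std. deviation across runs, per grid cell in the region."""
    return {c: pstdev(r[c] for r in runs) for c in cells}

u = uncertainty_in_region(runs, [(0, 0), (1, 1)])
```

A declarative array DBMS would answer such a query without the user writing the loop, which is precisely the gap the talk argues ad-hoc NetCDF/HDF5 scripts leave open.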
Fabio Porto is a Senior Researcher at the Brazilian National Laboratory of Scientific Computing (LNCC). He is the founder of the Data Extreme Lab (DEXL) and co-director of the National Institute of Science and Technology on Data Science. He conducted doctoral studies at PUC-Rio, including a doctoral research stay at INRIA, and earned his PhD in Informatics from PUC-Rio in 2001. Between 2004 and 2007, he was a postdoc at EPFL. His main research interests involve Big Data analytical algorithms, dataflow optimization, and the confluence of machine learning and databases. He has more than 80 research papers published in international conferences and scientific journals, including PVLDB, SIGMOD, SSDBM, and ICDE. He was the General Chair of both VLDB 2018 and SBBD 2015, the Brazilian Symposium on Databases. Since 2018 he has been a member of the SBBD Steering Committee, and he is a member of both SBC (the Brazilian Computer Society) and ACM.

Katarzyna Juraszek, TU Berlin

Title: Extended Kalman Filter for Large Scale Vessels Trajectory Tracking in Distributed Stream Processing Systems

Abstract: The growing volume of vehicle data constantly reported by a variety of remote sensors, such as Automatic Identification Systems (AIS), requires new data analytics methods that can operate at high data rates and are highly scalable. Based on a real-life data set from maritime transport, we propose a large-scale vessel trajectory tracking application implemented in the distributed stream processing system Apache Flink. By implementing a state-space model (SSM), the Extended Kalman Filter (EKF), we first demonstrate that an implementation of SSMs is feasible in modern distributed data flow systems, and second we show that we can reach high performance by leveraging the inherent parallelization of the distributed system. In our experiments we show that the distributed tracking system is able to handle a throughput of several hundred vessels per millisecond. Moreover, we show that the latency to predict the position of a vessel is well below 500 ms on average, allowing for real-time applications.
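The paper's EKF handles a nonlinear vessel motion model; as a simplified illustration of the same predict/update loop, here is a linear 1-D constant-velocity Kalman filter in plain Python (the EKF additionally linearizes the model around the current estimate at each step). The noise parameters and measurements below are made up.

```python
# A minimal 1-D constant-velocity Kalman filter illustrating the
# predict/update loop that the talk applies, in its extended nonlinear
# form, to vessel positions. q and r are made-up tuning values.

def kalman_track(measurements, dt=1.0, q=1e-3, r=0.25):
    # State: [position, velocity]; covariance P as a 2x2 nested list.
    x, v = measurements[0], 0.0
    P = [[1.0, 0.0], [0.0, 1.0]]
    estimates = []
    for z in measurements[1:]:
        # Predict: constant-velocity motion model F = [[1, dt], [0, 1]].
        x, v = x + dt * v, v
        P = [[P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q,
              P[0][1] + dt * P[1][1]],
             [P[1][0] + dt * P[1][1], P[1][1] + q]]
        # Update: we observe position only (H = [1, 0]).
        s = P[0][0] + r                     # innovation covariance
        k0, k1 = P[0][0] / s, P[1][0] / s   # Kalman gain
        y = z - x                           # innovation
        x, v = x + k0 * y, v + k1 * y
        P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
             [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        estimates.append(x)
    return estimates

# Noisy observations of a vessel moving one unit per time step.
zs = [0.0, 1.1, 1.9, 3.2, 3.8, 5.1, 5.9, 7.05, 8.0, 9.1]
est = kalman_track(zs)
```

In the streaming setting of the paper, Flink keys the stream by vessel ID so each key runs its own filter state, which is what makes the tracking embarrassingly parallel.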

Bio: Katarzyna Juraszek has recently graduated with a B.S. in Computer Science from Technische Universität Berlin. Her final thesis focused on implementing the Extended Kalman Filter algorithm within the stream-processing framework Apache Flink. Together with co-authors from DFKI, she developed this work into a paper accepted at the ECML PKDD conference in Würzburg, Germany, this year, where she presented it as a poster. She also holds a Master's degree in Network and Information Economics as well as Business Intelligence from Maastricht University in the Netherlands. For the past four years she has worked in data analytics and data engineering at Zalando in Berlin, tackling Big Data problems in a commercial setting.

Ilin Tolovski

System for cloud execution, semantic annotation, storage and querying of data mining experiments

Living in a data-driven society makes conducting different types of computational experiments a widespread practice in many organisations (e.g., industry, academia). This results in the production of computational models and experimental results at a higher volume than ever. From this arises the challenge of properly representing and storing experimental setups and results. Not handling this task adequately can significantly strain the time, computational, and financial resources of an organisation. Having the complete workflow of a computational experiment represented by a semantic resource and stored accordingly allows quick access to the experimental results, their verification and reproduction, as well as the reuse of their outputs (e.g., produced models).

We address these issues by creating FAIR (Findable, Accessible, Interoperable, Reusable) repositories of semantically annotated experiments in the domains of process-based modelling of dynamical systems and predictive data mining. To this end, we developed SemanticHub, a system that allows remote execution and automatic annotation of computational experiments. The system integrates two machine learning software packages, ProBMoT and CLUS, which users can utilise to run experiments that will be semantically annotated and stored in our FAIR repositories. The annotations are stored on our servers, where users can query completed experiments and explore their results through a dedicated UI. SemanticHub provides a structured view of and open access to a repository of completed experiments and their results, allowing users to verify and reproduce experimental results, as well as reuse the produced models.

Ilin Tolovski was born on 17 December 1994 in Skopje, North Macedonia, where he finished his primary and secondary education. In July 2017, he obtained his BSc degree in Computer Technologies and Engineering at the Faculty of Electrical Engineering and Information Technologies at the Ss. Cyril and Methodius University in Skopje, North Macedonia. During his undergraduate studies, he completed internships in software development at InPay S.A., Warsaw, Poland, and at the Faculty of Computer Science at WH Zwickau, Germany. In September 2019, he obtained his MSc degree in Information and Communication Technologies at the Jozef Stefan International Postgraduate School in Ljubljana, Slovenia. During his stay at the Jozef Stefan Institute, he worked mostly on the development of a system for remote execution, semantic annotation, storage, and querying of machine learning experiments and models. He also worked on knowledge representation in domains such as neurodegenerative diseases, process-based modelling of dynamical systems, and predictive data mining. His research has been published at several conferences in the areas of knowledge discovery and intelligent systems.

Alireza Zarei, Karlsruhe Institute of Technology

Title: A distributed marketplace of data and storage for the next-generation Internet

Abstract: Our existing infrastructures pose several obstacles to the accessibility, usability, and discovery of knowledge. This derives partly from restrictions of these infrastructures, but also from the outsourcing of resources and functions to third parties. The Internet is an infrastructure with low accessibility, weak usability, and a high centralization of data and storage. This is partly because of the restrictions of its TCP/IP architecture, but also because every Internet user relies on the roles of Internet Service Providers (ISPs). Approaches like Information Centric Networking (ICN) support a decentralized communication architecture for the Internet; however, they still rely on the roles of centralized ISPs.

In order to achieve real distribution and to distribute the roles of ISPs among Internet users, we need to build mechanisms that ensure trust among them. In this talk, we introduce Information Centric Incentivised Networking (ICIN), a solution based on Information Centric Networking (ICN) which incentivizes data and storage with the help of smart contracts to ensure the required trust among users. We present the characteristics of data and storage and how users exchange them with each other in a fair marketplace. We will describe how to make transactions for small data chunks with minimal processing overhead and how our approach can provide a framework for the exchange of other resources and services.
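The abstract does not specify ICIN's contract logic. The following toy escrow sketch only illustrates the general pattern of trust through verifiable exchange: payment for a data chunk is released only if the delivered bytes hash to the value the consumer requested. The `Escrow` class and all balances are hypothetical.

```python
# Toy escrow for a data-chunk marketplace: the "contract" releases payment
# only when the delivered chunk matches the requested content hash, so
# neither party has to trust the other. Purely illustrative.
import hashlib

class Escrow:
    def __init__(self, balances):
        self.balances = dict(balances)

    def settle(self, consumer, provider, price, wanted_hash, chunk):
        """Transfer `price` only if `chunk` hashes to `wanted_hash`."""
        if hashlib.sha256(chunk).hexdigest() != wanted_hash:
            return False                      # no payment for bad data
        self.balances[consumer] -= price
        self.balances[provider] += price
        return True

chunk = b"sensor readings"
ledger = Escrow({"alice": 10, "bob": 0})
ok = ledger.settle("alice", "bob", 3,
                   hashlib.sha256(chunk).hexdigest(), chunk)
```

Content-addressed naming, as in ICN, makes this check natural: the name of a chunk already commits to its hash, so verification needs no third party.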

Short bio:
Alireza Zarei is a research assistant at the Karlsruhe Institute of Technology and a member of the GHOST IoT project. He works on understanding human-computer interaction in the context of the Internet of Things to provide a smart and usable framework with acceptable privacy and security. He graduated from the University of Göttingen with a Master's degree in Computer Science and was a member of the ICN2020 project, developing future applications of information-centric networking, computing, and storage in the area of the Internet of Things. His research interest lies in understanding smart systems and how different agents and users interact with each other. He is interested in discovering system behavior and evolution, predicting the future behavior of a system in relation to real events, dynamic enhancement of the prediction mechanism, and evolutionary learning.

Flavio Clesio, MyHammer AG

Title: Job Recommendations in a Marketplace with Multiple Stakeholders

Abstract: The focus of this work is on building a framework for recommendation and matching in the context of an online job marketplace. Beyond the marketplace characteristics given by Banerjee et al. (2017), we consider that job recommendations have the following characteristics: (i) a job seeker may explicitly intend to perform matches, or non-optimal matches outside their preference; (ii) the job poster should have the best possible pool of candidates from which to make a decision; (iii) a job is indivisible, i.e., matching is a zero-sum game in which only a single candidate can get the job; (iv) most jobs have an expiration date, so the platform must not only produce the best match between job seeker and job poster, but produce it in a timely way for all jobs; (v) job platforms cannot fully track whether the job seeker was accepted by the job poster; the main objective is therefore to ensure that every posted job receives job seekers, and that every seeker receives job listings in a finite, preference-ranked list; (vi) the platform must consider not only the best matching between job seekers and job posters but also its own economic interests, since its business model may be based on facilitating liquidity or on imposing a cost on other stakeholders.
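The talk's matching algorithm is not specified in the abstract. As background for the characteristics of indivisible jobs and finite preference-ranked lists, here is the classic deferred-acceptance (Gale-Shapley) sketch of two-sided matching; it is not the author's method, and all names and preferences are invented.

```python
# Classic deferred-acceptance matching: seekers propose down their ranked
# lists; each (indivisible) job holds its best proposer so far. Invented data.

def match(seeker_prefs, job_prefs):
    """Both arguments map a name to that side's ranked preference list."""
    rank = {j: {s: i for i, s in enumerate(p)} for j, p in job_prefs.items()}
    free = list(seeker_prefs)         # seekers not yet placed
    nxt = {s: 0 for s in seeker_prefs}  # next preference each seeker tries
    held = {}                         # job -> seeker it currently holds
    while free:
        s = free.pop()
        j = seeker_prefs[s][nxt[s]]   # propose to next-preferred job
        nxt[s] += 1
        if j not in held:
            held[j] = s
        elif rank[j][s] < rank[j][held[j]]:
            free.append(held[j])      # job trades up; old seeker is free again
            held[j] = s
        else:
            free.append(s)            # rejected; try the next preference
    return held

seekers = {"ana": ["tiling", "painting"], "ben": ["tiling", "painting"]}
jobs = {"tiling": ["ben", "ana"], "painting": ["ana", "ben"]}
m = match(seekers, jobs)
```

A multi-stakeholder platform, as the abstract argues, would additionally fold its own economic objective into these preference orders rather than matching on the two sides' rankings alone.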

Bio: Flavio Clesio is a Machine Learning Engineer and Data Scientist at MyHammer AG in Berlin. He obtained his master's degree in Applied Computational Intelligence, working on exotic credit derivatives such as non-performing loans. His current research focuses on recommendations in job marketplaces with multiple stakeholders, Natural Language Processing/text classification for the German language, computer vision for German Handwerkskarte and Gewerbeanmeldung recognition, and security and countermeasures in machine learning development. In addition, he has worked in several distinct industries: financial markets, revenue assurance in telecommunications, analysis and experimentation on user behavior on mobile platforms, and real-time data pipelining for a global food delivery platform serving more than 42 countries. Nowadays he works on scalable machine learning production systems for job matching and recommendation in the German market, and on applied deep learning for document recognition. Flavio has taught a number of university courses on subjects such as Big Data platforms (Cassandra, Spark, and Spark MLlib), data warehousing and scalable ETL systems, multidimensional data warehouse modelling, and strategic information management. Some of his recent industry work has been presented at top industry conferences, including Strata Data in Singapore, the Spark conference in Dublin, Papis.io (a real-world applied ML conference), and Redis Summit, as well as at several local meetups for Google Developer Groups, Facebook Developer Circles, and the Data Council chapter in Berlin.

Base References (in order of importance for this research)

Mehrotra, Rishabh. "Recommendations in a Marketplace." RecSys 2019 Tutorials.

Abdollahpouri, Himan, Gediminas Adomavicius, Robin Burke, Ido Guy, Dietmar Jannach, Toshihiro Kamishima, Jan Krasnodebski, and Luiz Pizzato. "Beyond Personalization: Research Directions in Multistakeholder Recommendation." arXiv preprint arXiv:1905.01986 (2019).

Mehrotra, Rishabh, et al. "Towards a fair marketplace: Counterfactual evaluation of the trade-off between relevance, fairness & satisfaction in recommendation systems." In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 2243-2251. ACM, 2018.

Burke, Robin D., Himan Abdollahpouri, Bamshad Mobasher, and Trinadh Gupta. "Towards Multi-Stakeholder Utility Evaluation of Recommender Systems." In UMAP (Extended Proceedings), 2016.

Banerjee, S., S. Gollapudi, K. Kollias, and K. Munagala. "Segmenting two-sided markets." In Proceedings of the 26th International Conference on World Wide Web, pp. 63-72, 2017.

Zongxiong Chen


Hysteretic neural networks, stream processing system on modern hardware, and research agenda


This presentation consists of three parts.

First, I will present my master's thesis work and introduce hysteretic neural networks (HNNs). HNNs are a new recurrent neural network (RNN) architecture designed to approximate dynamic systems that exhibit hysteresis. In my work, I showed that HNNs model hysteretic systems better than state-of-the-art approaches such as LSTMs. Furthermore, HNNs can capture the micro-loops inside hysteretic behaviors, whereas LSTMs fail to do so.
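The abstract does not define the HNN internals. As background on what "hysteretic" means, here is a play (backlash) operator, a basic hysteresis building block whose output depends on the input's history rather than only its current value, which is exactly the behavior such networks are designed to approximate.

```python
# A "play" (backlash) operator: the output follows the input but lags by a
# dead band of +/- width, so the same input value can yield different
# outputs depending on the path taken. Parameters are illustrative.

def play_operator(inputs, width=0.5, y0=0.0):
    """Clamp the previous output into [u - width, u + width] at each step."""
    y, out = y0, []
    for u in inputs:
        y = min(max(y, u - width), u + width)
        out.append(y)
    return out

up = [i / 10 for i in range(11)]            # ramp 0.0 -> 1.0
down = [i / 10 for i in range(10, -1, -1)]  # ramp 1.0 -> 0.0
loop = play_operator(up + down)
```

Note that the input is 0.0 both at the start and at the end, yet the outputs differ (0.0 versus 0.5): a memoryless model cannot reproduce this loop, which is why recurrent state is needed.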

In the second part of my presentation, I will present my work on benchmarking stream processing systems (SPSs) on modern hardware. SPSs such as StreamBox and SABER achieve high performance by exploiting the parallelism and memory hierarchy of modern multicore hardware. I will compare the architectures of both systems and present our evaluation results.

Finally, I will give an overview of my future research agenda.

Chen Xu, East China Normal University

Title: Exploiting Incremental Evaluation for Efficient Distributed Matrix Computation

Abstract: Distributed matrix computation is common in large-scale data processing and machine learning applications. Iterative-convergent algorithms involving matrix computation share a common property: parameters converge non-uniformly. This property can be exploited to eliminate computational redundancy. Unfortunately, existing systems built on distributed matrix computation, like SystemML, do not fully do so. In this presentation, I will talk about IMAC, an incremental matrix computation prototype, which combines full matrix evaluation with incremental evaluation to leverage non-uniform convergence. IMAC builds and improves upon SystemML, and our experiments show that IMAC outperforms SystemML by an order of magnitude.
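IMAC's actual mechanics are not given in the abstract. The toy sketch below shows the general idea behind incremental evaluation: when only a few parameters change between iterations (non-uniform convergence), a matrix-vector product can be updated from the previous result instead of recomputed in full. All matrices and helper names here are invented.

```python
# Toy incremental matrix-vector product: touch only the columns whose
# parameter entries changed since the last iteration. Invented data.

def full_matvec(A, x):
    """Full evaluation of y = A @ x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def incremental_matvec(A, y_old, x_old, x_new):
    """Adjust the previous y using only the changed entries of x."""
    y = list(y_old)
    for j, (old, new) in enumerate(zip(x_old, x_new)):
        if old != new:
            delta = new - old
            for i, row in enumerate(A):
                y[i] += row[j] * delta
    return y

A = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [4.0, 0.0, 1.0]]
x0 = [1.0, 1.0, 1.0]
y0 = full_matvec(A, x0)
x1 = [1.0, 2.0, 1.0]   # only one "parameter" moved this iteration
y1 = incremental_matvec(A, y0, x0, x1)
```

When most parameters have already converged, the incremental path does work proportional to the changed entries only, which is the redundancy-elimination opportunity the abstract describes.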

Bio: Chen Xu is currently an associate professor at School of Data Science and Engineering, East China Normal University (ECNU), Shanghai. From 2014 to 2018, Chen conducted postdoctoral research at Database Systems and Information Management (DIMA) Group, Technische Universität Berlin. Chen got his PhD degree in 2014 from ECNU. During his study, he served as a research intern at Data & Knowledge Engineering (DKE) Group, The University of Queensland, in 2011. His research interest is large-scale data management.


Thursday, February 13th from 16:30 – 17:30 in room EN 719
