Talks DIMA Research Seminar
Date/Time/Location | Lecturer/Subject
---|---
17.09.2019, 11 am, MA 004 | BBDC Talk: "Agency + Automation", Jeffrey Heer (University of Washington/Trifacta)
14.10.2019, 10 am, DFKI Projektbüro Berlin, 4th floor, room Weizenbaum, Alt-Moabit 91c, 10559 Berlin | Nantia Makrynioti, Athens University of Economics and Business: "Declarative specification and automatic compilation of machine learning algorithms"
18.11.2019, 4 pm, EN 719 | Fabio Porto, The National Laboratory of Scientific Computing (LNCC), Rio de Janeiro, Brazil: "Managing and Analysing Simulation Data"
25.11.2019, 4 pm, EN 719 | Katarzyna Juraszek, TU Berlin: "Extended Kalman Filter for Large Scale Vessels Trajectory Tracking in Distributed Stream Processing Systems"
04.12.2019, 4 pm, EN 719 | Ilin Tolovski: "System for cloud execution, semantic annotation, storage and querying of data mining experiments"
16.12.2019, 4 pm, EN 719 | Alireza Zarei, Karlsruhe Institute of Technology: "A distributed marketplace of data and storage for the next-generation Internet"
20.01.2020, 4 pm, EN 719 | Flavio Clesio, MyHammer AG: "Job Recommendations in a Marketplace with Multiple Stakeholders"
03.02.2020, 4 pm, EN 719 | Zongxiong Chen: "Hysteretic neural networks, stream processing system on modern hardware, and research agenda"
11.02.2020, 12 am, EN 719 | Chen Xu, East China Normal University: "Exploiting Incremental Evaluation for Efficient Distributed Matrix Computation"
Nantia Makrynioti, Athens University of Economics and Business
Title: Declarative specification and automatic compilation of machine learning algorithms
Abstract:
Declarative programming is usually
summarised in the phrase "describing what needs to be done,
instead of telling the program how to do it". As the adoption of
data science grows rapidly, a need has emerged for democratising data
analysis tasks by making their development more approachable and less
tedious through high-level languages. Inspired by the success of the
declarative paradigm in relational database systems, researchers have
recently started exploring whether the use of declarative languages in
the machine learning (ML) domain can provide a productivity leap for
developers.
In this talk I will give a brief overview of
efforts in the area of declarative data analytics and machine learning
and describe the design of sql4ml, a system that aims at
democratising ML tasks for database users. It allows the user to
express ML models in SQL following the "model + solver"
approach, where there is a description of the objective function
(a.k.a. loss or cost function) of an ML model and a solver that
provides the optimal solution for it. sql4ml translates the SQL code
defining the model into an appropriate representation for training
inside an ML framework. After training, the computed solution is
stored back in the database, which allows for more robust model
management and generation of future predictions inside the
database.
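To make the "model + solver" approach concrete, here is a minimal sketch of a linear-regression objective written as plain SQL, runnable through Python's sqlite3. The table layout and query are illustrative assumptions; the actual sql4ml syntax and schema conventions may differ.

```python
# Hedged illustration of the "model + solver" idea: the mean-squared-error
# objective of linear regression written declaratively in SQL. A system like
# sql4ml would translate such a query into an ML framework's computation
# graph for training; here we only evaluate the loss.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE features (row_id INTEGER, col_id INTEGER, value REAL);
    CREATE TABLE labels   (row_id INTEGER, value REAL);
    CREATE TABLE weights  (col_id INTEGER, value REAL);
""")

# Predictions are a join-and-sum; the loss is an aggregate over errors.
MSE_QUERY = """
    WITH predictions AS (
        SELECT f.row_id, SUM(f.value * w.value) AS yhat
        FROM features f JOIN weights w ON f.col_id = w.col_id
        GROUP BY f.row_id
    )
    SELECT AVG((p.yhat - l.value) * (p.yhat - l.value)) AS mse
    FROM predictions p JOIN labels l ON p.row_id = l.row_id
"""

# Toy data: y = 2*x; with weight 1.5 the loss is non-zero.
conn.executemany("INSERT INTO features VALUES (?, ?, ?)",
                 [(1, 1, 1.0), (2, 1, 2.0)])
conn.executemany("INSERT INTO labels VALUES (?, ?)", [(1, 2.0), (2, 4.0)])
conn.execute("INSERT INTO weights VALUES (1, 1.5)")
print(conn.execute(MSE_QUERY).fetchone()[0])  # 0.625
```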
Short Bio:
Nantia Makrynioti is a PhD student in Computer Science at the
Athens University of Economics and Business supervised by Professor
Vasilis Vassalos. Her research focuses on integrating machine learning
functionality with relational databases, which aligns with her
interests in declarative machine learning (a paradigm well known from
databases, applied here to machine learning). In the context of
this effort, she has also worked with the LogicBlox team on expressing
and optimising machine learning problems using the company's
relational platform.
In the past, she did research on
sentiment analysis, which resulted in the
development of a related component for a commercialised platform in
Greece.
She holds a BSc in Computer Science from the
University of Ioannina and an MSc in Information Systems from her
current university.
Jeffrey Heer (University of Washington/Trifacta)
Location: Hörsaal: MA 004, Straße des 17. Juni 136, 10623 Berlin
Title: Agency + Automation
Abstract:
Much contemporary rhetoric regards
the prospects and pitfalls of using artificial intelligence techniques
to automate an increasing range of tasks, especially those once
considered the purview of people alone. These accounts are often
wildly optimistic, understating outstanding challenges while turning a
blind eye to the human labor that undergirds and sustains ostensibly
“automated” services. This long-standing focus on purely automated
methods unnecessarily cedes a promising design space: one in which
computational assistance augments and enriches, rather than replaces,
people’s intellectual work. This tension between agency and
automation poses vital challenges for design and engineering. In this
talk we will consider the design of interactive systems that enable
rich, adaptive collaboration among people and computational agents. We
seek to balance the often complementary strengths and weaknesses of
each, while promoting human control and skillful action. We will
review case studies in three arenas—data wrangling, exploratory
visualization, and natural language translation—that integrate
proactive computational support into interactive systems. To improve
outcomes and support learning by both people and machines, I will
describe the use of shared representations of tasks augmented with
predictive models of human capabilities and actions.
Bio:
Jeffrey Heer is the Jerre D. Noe Endowed
Professor of Computer Science & Engineering at the University of
Washington, where he directs the Interactive Data Lab and conducts
research on data visualization, human-computer interaction, and social
computing. The visualization tools developed by Jeff and his
collaborators (Vega, D3.js, Protovis, Prefuse) are used by
researchers, companies, and thousands of data enthusiasts around the
world. Jeff's research papers have received awards at the premier
venues in Human-Computer Interaction and Visualization (ACM CHI, ACM
UIST, IEEE InfoVis, IEEE VAST, EuroVis). Other honors include MIT
Technology Review's TR35 (2009), a Sloan Fellowship (2012), an Allen
Distinguished Investigator Award (2014), a Moore Foundation
Data-Driven Discovery Investigator Award (2014), and the ACM Grace
Murray Hopper Award (2016). Jeff holds B.S., M.S., and Ph.D. degrees
in Computer Science from UC Berkeley, which he then
"betrayed" to join the Stanford faculty (2009–2013). He is
also a co-founder of Trifacta, a provider of interactive tools for
scalable data transformation.
Fabio Porto, The National Laboratory of Scientific Computing (LNCC) Rio de Janeiro, Brazil
Location: TU Berlin, EN building, seminar room EN 719 (7th floor), Einsteinufer 17, 10587 Berlin
Title: Managing and Analysing Simulation Data
Abstract:
The increasing processing power of HPC systems has enabled the
development of realistic simulations of phenomena in different areas,
such as oil and gas, engineering, medicine, and meteorology. As
simulation quality improves and HPC systems approach exaflop
performance, scientists' use of simulation output evolves into complex
data analytics tasks. Unfortunately, data management systems have
completely neglected the domain of numerical simulations, leading
scientists to express complex analyses using ad-hoc programs on top of
proprietary file formats or libraries, such as NetCDF and HDF5. In
this talk, we present the work we have been developing on data
management to support numerical simulations. We will first discuss a
technique to answer spatial queries about the uncertainty in
simulation results. Next, we will present the SAVIME (Simulation &
Visualization in-memory) system, a multidimensional array DBMS
designed with the following principles: to incur minimal data
ingestion overhead; to support complex data structures, such as
meshes, data geometry and simulation metadata; to support data
visualization; and to offer users a declarative query interface
and query optimization.
Bio:
Fabio Porto is a Senior Researcher at the Brazilian National
Laboratory of Scientific Computing (LNCC). He is the founder of the
Data Extreme Lab (DEXL) and Co-director of the National Institute of
Science and Technology on Data Science. He conducted doctoral studies
at PUC-Rio and a doctoral research stay abroad at INRIA and went on to
earn his PhD in Informatics from PUC-Rio in 2001. From 2004 to 2007,
he was a Postdoc at EPFL. His main research interests involve Big Data
analytical algorithms; dataflow optimization and the confluence of
Machine Learning and databases. He has more than 80 research papers
published in international conferences and scientific journals,
including PVLDB, SIGMOD, SSDBM, and ICDE. He was the General Chair of
both VLDB 2018 and SBBD 2015, the Brazilian Symposium on Databases.
Since 2018 he has been a member of the SBBD Steering Committee and a
member of both SBC (the Brazilian Computer Society) and ACM.
Katarzyna Juraszek, TU Berlin
Title: Extended Kalman Filter for Large Scale Vessels Trajectory Tracking in Distributed Stream Processing Systems
Abstract: The growing amount of vehicle data constantly reported by a variety of remote sensors, such as Automatic Identification Systems (AIS), requires new data analytics methods that can operate at high data rates and are highly scalable. Based on a real-life data set from maritime transport, we propose a large-scale vessel trajectory tracking application implemented in the distributed stream processing system Apache Flink. By implementing a state-space model (SSM) – the Extended Kalman Filter (EKF) – we firstly demonstrate that an implementation of SSMs is feasible in modern distributed data flow systems, and secondly we show that we can reach high performance by leveraging the inherent parallelization of the distributed system. In our experiments we show that the distributed tracking system is able to handle a throughput of several hundred vessels per ms. Moreover, we show that the latency to predict the position of a vessel is well below 500 ms on average, allowing for real-time applications.
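For readers unfamiliar with the EKF, here is a minimal numpy sketch of a single predict/update step for a 2-D constant-velocity vessel model with position-only measurements (as AIS reports would provide). The matrices and noise values are illustrative assumptions, not the configuration of the Flink implementation described above.

```python
# One Kalman predict/update step. State x = [px, py, vx, vy]; we observe
# position only. The motion model here is linear, so the Jacobian equals F;
# a real nonlinear vessel model would linearize its transition instead.
import numpy as np

dt = 1.0
F = np.array([[1, 0, dt, 0],   # state transition: position += velocity * dt
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1, 0, 0, 0],    # measurement model: position only
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 0.01           # process noise (assumed)
R = np.eye(2) * 0.5            # measurement noise (assumed)

def ekf_step(x, P, z):
    # Predict.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update with the new position report z.
    y = z - H @ x_pred                    # innovation
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new

x, P = np.zeros(4), np.eye(4)
x, P = ekf_step(x, P, z=np.array([10.2, 5.1]))
```

In a streaming setting, each vessel's (x, P) pair is the keyed state, so the filter parallelizes naturally across vessels.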
Bio: Katarzyna Juraszek recently graduated with a B.Sc. in Computer Science from Technische Universität Berlin. Her final thesis focused on implementing the Extended Kalman Filter algorithm and putting it to use within the stream processing framework Apache Flink. After further work on this subject with co-authors from DFKI, the work was accepted at the ECML PKDD conference in Würzburg, Germany this year, giving her the opportunity to present it there as a poster. She also holds a Master's degree in Network and Information Economics as well as Business Intelligence from Maastricht University in the Netherlands. For the past four years she has worked in data analytics and data engineering at Zalando in Berlin, where she tackled Big Data problems in a more commercial setting.
Ilin Tolovski
Title: System for cloud execution, semantic annotation, storage and querying of data mining experiments
Abstract:
Living in a data-driven society
makes conducting different types of computational experiments a
widespread practice in many organisations (e.g., industry, academia).
This results in the production of computational models and experimental
results at a higher volume than ever. Here arises the challenge of
properly representing and storing experimental setups and results.
Not handling this task adequately can significantly strain the time,
computational and financial resources of an organisation. Having the
complete workflow of a computational experiment represented by a
semantic resource and stored accordingly can allow quick access to the
experimental results, their verification and reproduction, as well as
reusability of their outputs (e.g., produced models).
We address these issues by creating FAIR (Findable, Accessible, Interoperable, Reusable) repositories of semantically annotated experiments in the domains of process-based modelling of dynamical systems and predictive data mining. To this end, we developed SemanticHub, a system that allows remote execution and automatic annotation of computational experiments. The system has two integrated machine learning software packages, ProBMoT and CLUS, which users can utilise to run experiments that will be semantically annotated and stored in our FAIR repositories. The annotations are stored on our servers, where users can query completed experiments and explore their results through a dedicated UI. SemanticHub provides a structured view and open access to a repository of completed experiments and their results, allowing users to verify and reproduce experimental results, as well as reuse the produced models.
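To give a flavour of what a semantically annotated experiment can look like, here is a small Python sketch using rdflib. The namespace, property names, and query are hypothetical placeholders, not SemanticHub's actual vocabulary.

```python
# Illustrative sketch only: a completed experiment annotated as RDF triples
# and retrieved with SPARQL. All vocabulary below is hypothetical.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/experiments#")  # hypothetical namespace

g = Graph()
run = EX["run-42"]
g.add((run, RDF.type, EX.Experiment))
g.add((run, EX.usesSoftware, Literal("CLUS")))
g.add((run, EX.task, Literal("predictive data mining")))
g.add((run, EX.accuracy, Literal(0.87)))

# Query all experiments run with CLUS together with their accuracy.
results = g.query("""
    PREFIX ex: <http://example.org/experiments#>
    SELECT ?run ?acc WHERE {
        ?run a ex:Experiment ;
             ex:usesSoftware "CLUS" ;
             ex:accuracy ?acc .
    }
""")
for row in results:
    print(row.run, row.acc)
```

Because the annotations are ordinary triples, verification and reuse reduce to running further queries against the repository.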
Bio:
Ilin Tolovski was born on 17th of
December, 1994 in Skopje, North Macedonia, where he finished his
primary and secondary education. In July 2017, he obtained his BSc
degree in Computer Technologies and Engineering at the Faculty of
Electrical Engineering and Information Technologies at the Ss. Cyril
and Methodius University in Skopje, North Macedonia. During his
undergraduate studies, he completed internships in software
development at InPay S.A., Warsaw, Poland and the Faculty of Computer
Science at WH Zwickau, Germany. In September 2019, he obtained his MSc
degree in Information and Communication Technologies at the Jozef
Stefan International Postgraduate School in Ljubljana, Slovenia.
During his stay at Jozef Stefan Institute, he worked mostly on the
development of a system for remote execution, semantic annotation,
storage and querying of machine learning experiments and models. He
also worked on knowledge representation in domains such as
neurodegenerative diseases, process-based modelling of dynamical
systems, and predictive data mining. His research has been published
at several conferences in the areas of knowledge discovery and
intelligent systems.
Alireza Zarei, Karlsruhe Institute of Technology
Title: A distributed marketplace of data and storage for the next-generation Internet
Abstract:
Our existing infrastructures pose several obstacles to the
accessibility, usability, and discovery of knowledge. This derives
partly from restrictions of these infrastructures, but also from the
outsourcing of resources and functions to third parties. The Internet
is an infrastructure with low accessibility, weak usability, and high
centralization of data and storage. This is partly because of the
restrictions of its TCP/IP architecture, but also because of the
reliance of each Internet user on Internet Service Providers (ISPs).
Approaches like Information Centric Networking (ICN) support a
decentralized communication architecture for the Internet; however,
they still rely on centralized ISPs.
In order to achieve real distribution and distribute the roles of ISPs among Internet users, we need to build mechanisms which ensure trust among them. In this talk, we introduce Information Centric Incentivised Networking (ICIN), a solution based on Information Centric Networking (ICN) which incentivizes data and storage with the help of smart contracts to ensure the required trust among users. We present the characteristics of data and storage and how users exchange them with each other in a fair marketplace. We will describe how to make transactions for small data chunks with minimal processing overhead and how our approach can provide a framework for the exchange of other resources and services.
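To illustrate how a smart contract can stand in for trust between strangers, here is a toy Python sketch of an escrowed chunk exchange: the buyer locks payment, which is released only if the delivered chunk matches the advertised hash. This is an illustrative assumption about the general mechanism, not ICIN's actual protocol; all class and method names are hypothetical.

```python
# Toy escrow for a single data chunk. Both parties agree on the chunk's
# hash up front, so neither has to trust the other: the contract arbitrates.
import hashlib
from dataclasses import dataclass

@dataclass
class ChunkContract:
    chunk_hash: str          # hash advertised by the storage provider
    price: int               # agreed price in some token unit
    escrow: int = 0          # tokens locked by the buyer
    settled: bool = False

    def deposit(self, amount: int) -> None:
        """Buyer locks the payment before the chunk is transferred."""
        assert amount >= self.price, "insufficient escrow"
        self.escrow = amount

    def deliver(self, chunk: bytes) -> bool:
        """Provider submits the chunk; payment is released on a hash match."""
        if self.settled or self.escrow < self.price:
            return False
        if hashlib.sha256(chunk).hexdigest() != self.chunk_hash:
            return False  # wrong data: escrow stays locked / refundable
        self.settled = True
        return True

# Usage: agree on the hash, escrow the payment, deliver, settle.
data = b"sensor readings 2019-12-16"
contract = ChunkContract(chunk_hash=hashlib.sha256(data).hexdigest(), price=5)
contract.deposit(5)
assert contract.deliver(data)
```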
Short bio:
Alireza Zarei is a research assistant at the Karlsruhe Institute of
Technology and a member of the GHOST IoT project. He works on
understanding human-computer interaction in the context of the
Internet of Things, with the goal of providing a smart and usable
framework with acceptable privacy and security. He graduated from the
University of Göttingen with a Master's degree in Computer Science and
was a member of the ICN2020 project, which developed future
applications of information-centric networking, computing, and storage
in the area of the Internet of Things. His research interest lies in
understanding smart systems and how different agents and users
interact with each other. He is interested in discovering system
behavior and evolution, predicting the future behavior of a system in
relation to real events, and the dynamic enhancement of the prediction
mechanism and evolutionary learning.
Flavio Clesio, MyHammer AG
Title: Job Recommendations in a Marketplace with Multiple Stakeholders
Abstract: This work focuses on building a framework for recommendation and matching algorithms in the context of an online job marketplace. Building on the marketplace characteristics given by Banerjee et al. (2017), we argue that job recommendation has the following additional characteristics: (i) a job seeker's explicit intention to accept matches, including non-optimal matches outside their stated preferences; (ii) the job poster's need for the best possible pool of candidates from which to make a decision; (iii) the indivisibility of a job, i.e., a zero-sum game in which only a single candidate can win the job; (iv) an expiration date on most jobs, so the platform must not only produce the best match between job seeker and job poster but also do so in a timely way for all jobs; (v) incomplete feedback, since job platforms usually cannot track whether a job seeker was ultimately accepted by the job poster; given that, the main objective is to ensure that every posted job receives job seekers and that every job seeker receives a finite, preference-ranked list of job listings; and (vi) the platform's own economic interests, which it must balance against the best matching between job seekers and job posters; its business model may be based on facilitating liquidity or on charging other stakeholders.
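As a concrete illustration of balancing stakeholders, here is a small Python sketch that ranks candidates by a weighted sum of seeker, poster, and platform utilities, in the spirit of the multistakeholder recommendation literature cited below. It is purely hypothetical, not MyHammer's production system; all names, weights, and utility values are assumptions.

```python
# Rank job candidates by blending the three stakeholders' utilities into
# one score. Real systems would learn the utilities and tune the weights.
from dataclasses import dataclass

@dataclass
class Candidate:
    seeker_utility: float    # how well the job fits the seeker's preferences
    poster_utility: float    # how well the seeker fits the job requirements
    platform_utility: float  # expected platform revenue / liquidity value

def multistakeholder_score(c: Candidate,
                           w_seeker: float = 0.4,
                           w_poster: float = 0.4,
                           w_platform: float = 0.2) -> float:
    """Weighted combination of the stakeholders' utilities."""
    return (w_seeker * c.seeker_utility
            + w_poster * c.poster_utility
            + w_platform * c.platform_utility)

candidates = [Candidate(0.9, 0.6, 0.3), Candidate(0.7, 0.8, 0.5)]
ranked = sorted(candidates, key=multistakeholder_score, reverse=True)
```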
Bio: Flavio Clesio is a Machine Learning Engineer and Data Scientist at MyHammer AG in Berlin. He obtained his master's degree in Applied Computational Intelligence, working on exotic credit derivatives such as non-performing loans. His current
research focuses on recommendations in a job marketplace with multiple stakeholders, natural language processing and text classification for German, computer vision for recognition of German Handwerkskarte and Gewerbeanmeldung documents, and security and countermeasures in machine learning development. In addition, he has worked in several industries, including financial markets, revenue assurance in telecommunications, analysis and experimentation on user behavior on mobile platforms, and real-time data pipelining for a global food-delivery platform serving more than 42 countries. Nowadays he is working on scalable machine learning production systems for job matching and recommendation in the German market and on applied deep learning for document recognition. Flavio has taught a number of university courses on subjects such as big data platforms (Cassandra, Spark, and Spark MLlib), data warehousing and scalable ETL systems, multidimensional data warehouse modelling, and strategic information management. Some of his recent industry work has been presented at top industry conferences, including Strata Data in Singapore, the Spark conference in Dublin, Papis.io (a real-world applied ML conference), Redis Summit, and several local meetups for Google Developer Groups, Facebook Developer Circles, and the Data Council chapter in Berlin.
Base References (in order of importance for this research)
Mehrotra, Rishabh. "Recommendations in a Marketplace." Tutorial at RecSys 2019.
Abdollahpouri, Himan, Gediminas Adomavicius, Robin Burke, Ido Guy, Dietmar Jannach, Toshihiro Kamishima, Jan Krasnodebski, and Luiz Pizzato. "Beyond Personalization: Research Directions in Multistakeholder Recommendation." arXiv preprint arXiv:1905.01986 (2019).
Mehrotra, Rishabh, et al. "Towards a Fair Marketplace: Counterfactual Evaluation of the Trade-off between Relevance, Fairness & Satisfaction in Recommendation Systems." In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 2243-2251. ACM, 2018.
Burke, Robin D., Himan Abdollahpouri, Bamshad Mobasher, and Trinadh Gupta. "Towards Multi-Stakeholder Utility Evaluation of Recommender Systems." In UMAP (Extended Proceedings), 2016.
Banerjee, S., S. Gollapudi, K. Kollias, and K. Munagala. "Segmenting Two-Sided Markets." In Proceedings of the 26th International Conference on World Wide Web, pp. 63-72, 2017.
Zongxiong Chen
Title: Hysteretic neural networks, stream processing system on modern hardware, and research agenda
Abstract:
This presentation consists of three parts.
First, I will present my master's thesis work and introduce hysteretic neural networks (HNNs).
HNNs are a new recurrent neural network (RNN) architecture designed to approximate dynamical systems that exhibit hysteresis.
In my work, I showed that HNNs model hysteretic systems better than state-of-the-art approaches such as LSTMs. Furthermore, HNNs can capture the micro-loops inside hysteretic behavior, whereas LSTMs fail to do so.
In the second part of my presentation, I will present my work on benchmarking stream processing systems (SPSs) for modern hardware. SPSs such as Streambox and Saber achieve high performance by exploiting the parallelism and memory hierarchy of modern multicore hardware.
In my presentation, I will compare the architecture of both systems and present our evaluation results.
Finally, I will give an overview of my future research agenda.
Chen Xu, East China Normal University
Title: Exploiting Incremental Evaluation for Efficient Distributed Matrix Computation
Abstract: Distributed matrix computation is common in large-scale data processing and machine learning applications. Iterative-convergent algorithms involving matrix computation share a common property: parameters converge non-uniformly. This property can be exploited to eliminate computational redundancy. Unfortunately, existing systems built on distributed matrix computation, like SystemML, do not fully do so. In this presentation, I will talk about IMAC, an incremental matrix computation prototype which incorporates both full matrix evaluation and incremental evaluation to leverage non-uniform convergence. IMAC builds and improves upon SystemML, and our experiments show that IMAC outperforms SystemML by an order of magnitude.
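The core idea of incremental evaluation is easy to see in a few lines of numpy: when only a handful of parameters changed since the last iteration, a matrix-vector product can be patched instead of recomputed in full. This is an illustrative sketch of the general technique, not IMAC's implementation.

```python
# Full vs. incremental evaluation of A @ w under non-uniform convergence:
# only the columns whose parameters changed contribute to the update.
import numpy as np

def full_eval(A: np.ndarray, w: np.ndarray) -> np.ndarray:
    return A @ w  # recomputes everything, O(n * m)

def incremental_eval(A: np.ndarray, prev_result: np.ndarray,
                     delta_w: np.ndarray, changed: np.ndarray) -> np.ndarray:
    # Cost is proportional to the number of still-changing parameters.
    return prev_result + A[:, changed] @ delta_w[changed]

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 500))
w = rng.standard_normal(500)
y = full_eval(A, w)

# Simulate non-uniform convergence: only 3% of parameters still change.
delta = np.zeros(500)
changed = rng.choice(500, size=15, replace=False)
delta[changed] = rng.standard_normal(15) * 1e-3

y_inc = incremental_eval(A, y, delta, changed)
assert np.allclose(y_inc, full_eval(A, w + delta))
```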
Bio: Chen Xu is currently an associate professor at the School of Data Science and Engineering, East China Normal University (ECNU), Shanghai. From 2014 to 2018, Chen conducted postdoctoral research in the Database Systems and Information Management (DIMA) Group at Technische Universität Berlin. Chen received his PhD in 2014 from ECNU. During his PhD studies, he was a research intern in the Data & Knowledge Engineering (DKE) Group at The University of Queensland in 2011. His research interest is large-scale data management.
Time/Location: Thursday, February 13th, 16:30 – 17:30, room EN 719