DIMA Colloquium Dates
Date/Venue | Speaker/Topic
---|---
11.04.2018, 11:30, EN 719 | Prof. Themis Palpanas, Senior Member of the French University Institute (IUF), France: "End-to-End Entity Resolution for Structured and Semi-Structured Data"
06.02.2018, 17:00, Smart Data Forum, Salzufer 6 (entrance Otto-Dibelius-Strasse), 10587 Berlin | Prof. Rudolf Bayer, Ph.D., TU München: "C-chain: the simple, scalable, transparent blockchain"
12.12.2017, 16:00, EN 719 | Dominik Moritz, University of Washington: "Vega-Lite: A Grammar of Interactive Graphics"
05.12.2017, 14:15, Volkswagen-Universitätsbibliothek, Room BIB 014, Fasanenstr. 88, 10623 Berlin | Prof. Dr. Francesca Bugiotti, CentraleSupelec, Paris-Saclay University & Moditha Hewasinghage, UPC, Barcelona: "Modeling Methodology for a uniform access to NoSQL systems" (short intro for the students) & "Modeling Strategies for Storing Data in Distributed Heterogeneous NoSQL databases" (long presentation about the details of his Master's thesis)
04.12.2017, 16:00, EN 719 | Prof. Dr. Francesca Bugiotti, CentraleSupelec, Paris-Saclay University & Moditha Hewasinghage, UPC, Barcelona: "Database Design for NoSQL Systems" (long research presentation) & "Modeling Strategies for Storing Data in Distributed Heterogeneous NoSQL databases" (short presentation about his IT4BI Master's thesis)
27.11.2017, 16:00, EN 719 | Dr. Kaiwen Zhang, TU München: "Deconstructing Blockchains: Concepts, Applications, and Systems"
22.11.2017, 16:00, EN 719 | Dr. Britta Meixner, CWI Amsterdam: "Enhance, Enjoy, Engage: Improving the Video Playback Experience"
02.11.2017, 15:00, EN 719 | Prof. Guillaume Pierre, University of Rennes 1: "From data centers to fog computing: the evaporating cloud"
Prof. Themis Palpanas, Senior Member of the French University Institute (IUF), France
Title: End-to-End Entity Resolution for Structured and Semi-Structured Data
Abstract:
Entity Resolution (ER) lies at the core of data integration, with a large body of research focusing on both its effectiveness and its time efficiency. Initially, most relevant works were crafted for structured (relational) data described by a schema of well-known quality and meaning. With the advent of Big Data, though, these early schema-based approaches became inapplicable, as the scope of ER moved to semi-structured data collections, which abound in noisy, voluminous, and highly heterogeneous information.
In this talk, we take a close look at the entire ER workflow (from schema matching to entity clustering), covering both the schema-based and schema-agnostic cases. We will highlight recent works that significantly boost the efficiency of the overall workflow, especially meta-blocking, which cuts down on the computational cost by discarding comparisons that are repeated or lack sufficient evidence for producing duplicates. We will conclude with a brief demonstration of JedAI, our open-source reference toolbox for ER, which incorporates most of the state-of-the-art techniques in the area.
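For readers unfamiliar with blocking, the following sketch illustrates the general idea behind schema-agnostic token blocking and a heavily simplified meta-blocking-style pruning step that discards candidate pairs with too little co-occurrence evidence. It is an illustration only, written in TypeScript with hypothetical names; it is not taken from JedAI or from the works presented in the talk.

```typescript
// Minimal sketch of schema-agnostic token blocking with a simplified
// meta-blocking-style pruning step. Illustrative only; not the JedAI
// implementation. Entity ids are assumed not to contain '|'.

type Entity = { id: string; attributes: Record<string, string> };

// Token blocking: every token appearing in any attribute value defines a block.
function buildBlocks(entities: Entity[]): Map<string, Set<string>> {
  const blocks = new Map<string, Set<string>>();
  for (const e of entities) {
    for (const value of Object.values(e.attributes)) {
      for (const token of value.toLowerCase().split(/\W+/).filter(Boolean)) {
        if (!blocks.has(token)) blocks.set(token, new Set());
        blocks.get(token)!.add(e.id);
      }
    }
  }
  return blocks;
}

// Pruning: count how many blocks each candidate pair shares, keep each pair
// only once (no repeated comparisons), and drop pairs with weak evidence.
function candidatePairs(
  blocks: Map<string, Set<string>>,
  minSharedBlocks = 2
): [string, string][] {
  const shared = new Map<string, number>();
  for (const ids of blocks.values()) {
    const members = [...ids].sort();
    for (let i = 0; i < members.length; i++) {
      for (let j = i + 1; j < members.length; j++) {
        const key = `${members[i]}|${members[j]}`;
        shared.set(key, (shared.get(key) ?? 0) + 1);
      }
    }
  }
  return [...shared.entries()]
    .filter(([, count]) => count >= minSharedBlocks)
    .map(([key]) => key.split("|") as [string, string]);
}
```

Only the surviving pairs would then be passed to the (expensive) matching and clustering stages of the workflow.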
Short bio:
Themis Palpanas is
Senior Member of the Institut Universitaire de France (IUF), a
distinction that recognizes excellence across all academic
disciplines, and professor of computer science at the Paris Descartes
University (France), where he is director of diNo, the data management
group. He received the BS degree from the National Technical
University of Athens, Greece, and the MSc and PhD degrees from the
University of Toronto, Canada. He has previously held positions at the
University of Trento, and at IBM T.J. Watson Research Center, and
visited Microsoft Research, and the IBM Almaden Research Center.
His interests include problems related to data science (big data
analytics and machine learning applications). He is the author of nine
US patents, three of which have been implemented in world-leading
commercial data management products. He is the recipient of three Best
Paper awards, and the IBM Shared University Research (SUR) Award.
He is currently serving on the VLDB Endowment Board of Trustees, as an Editor-in-Chief of the BDR Journal, as Associate Editor for VLDB 2019, as Associate Editor of the TKDE and IDA journals, and on the Editorial Advisory Board of the IS journal and the Editorial Board of the TLDKS Journal. He has served as General Chair for VLDB 2013, Associate Editor for VLDB 2017, Workshop Chair for EDBT 2016, ADBIS 2013, and ADBIS 2014, General Chair for the PDA@IOT International Workshop (in conjunction with VLDB 2014), and General Chair for the Event Processing Symposium 2009.
http://www.mi.parisdescartes.fr/~themisp/
Prof. Rudolf Bayer, Ph.D., TU München
Title: C-chain: the simple, scalable, transparent blockchain
Abstract:
The talk questions some of the fundamental design decisions of blockchain, namely:
1. proof of work
2. the blockchain data structure
3. miners
4. consensus
and argues that they must be replaced by other design decisions, resulting in the C-chain method, which avoids all disadvantages of blockchain. In particular:
1. C-chain is easy to understand and to use
2. C-chain scales perfectly
3. C-chain guarantees immediate final settlement
4. C-chain has very low transaction costs
At the end of the talk there will be a hands-on experiment with the C-chain demonstrator; please bring an Android device.
Short bio:
Rudolf Bayer is professor emeritus of computer science at TU München. He studied mathematics in Munich and received his PhD from the University of Illinois in 1966. After positions at the Boeing Research Lab in Seattle and as Associate Professor at Purdue University, he founded the Chair of Database Systems at TU München in 1972. He is best known for the invention and further development of the B-tree and the UB-tree. He has led several research groups and many projects.
He received the Bundesverdienstkreuz (Federal Cross of Merit) in 1999 and the ACM SIGMOD Innovations Award in 2001.
Dominik Moritz, University of Washington
Title: "Vega-Lite: A Grammar of Interactive Graphics"
Abstract:
Vega-Lite is a declarative format for rapidly creating interactive visualizations. The simplest form of a Vega-Lite specification describes a single view: a mapping between data values and the visual properties of a single mark type. These single views can be composed into more complex layered and multi-view displays, or made interactive through a novel grammar of interaction. With Vega-Lite, a diverse range of interactive visualizations, from brushing and linking in a scatterplot matrix to cross-filtering and interactive index charts, can be built with only a few dozen lines of JSON. In these concise specifications, users can omit low-level details such as scale, axis, and legend properties as well as event handling logic, letting the Vega-Lite compiler infer sensible defaults. Under the hood, Vega-Lite leverages Vega's high-performance dataflow architecture and cross-platform renderers for both SVG and Canvas.
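As an illustration of how concise such specifications are, the sketch below embeds an interactive scatterplot with an interval brush using vega-embed. It uses current Vega-Lite (v5) syntax, which postdates the talk; the element id and dataset URL are placeholders.

```typescript
// Minimal sketch: an interactive scatterplot with an interval brush,
// rendered via vega-embed. Element id and data URL are placeholders.
import vegaEmbed from "vega-embed";

const spec = {
  $schema: "https://vega.github.io/schema/vega-lite/v5.json",
  data: { url: "data/cars.json" }, // placeholder dataset
  params: [{ name: "brush", select: "interval" }],
  mark: "point",
  encoding: {
    x: { field: "Horsepower", type: "quantitative" },
    y: { field: "Miles_per_Gallon", type: "quantitative" },
    color: {
      condition: { param: "brush", field: "Origin", type: "nominal" },
      value: "lightgray" // points outside the brush
    }
  }
};

// Compile the Vega-Lite spec to Vega and render it into the #vis element.
vegaEmbed("#vis", spec as any); // `as any` keeps the sketch independent of vega-lite's TS types
```

Points inside the brush keep their color encoding while everything else is grayed out, which is the brushing-and-linking pattern mentioned in the abstract.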
Bio:
Dominik is a PhD student in Computer Science at the University of
Washington. He is advised by Bill Howe from the eScience Institute and
the Database Group and Jeffrey Heer from the Interactive Data Lab.
Before coming to the US, Dominik completed his undergraduate
studies at the Hasso Plattner Institute in Germany. In his research, he
combines large-scale systems for data analysis with interactive data
visualization to enable novel insights into large multi-dimensional
data.
Dominik is a co-author of various libraries
and tools in the Vega stack, including Vega-Lite, Voyager, and
Polestar. He has worked for the Open Knowledge Foundation, Google, and
Microsoft Research and has been awarded fellowships by the German
National Academic Foundation and the Fulbright Committee.
When he is not working on research or coding, Dominik
likes to travel, sail, hike in the mountains around Seattle, or bake
bread.
Prof. Dr. Francesca Bugiotti, CentraleSupelec, Paris-Saclay University & Moditha Hewasinghage, UPC, Barcelona
Location: Volkswagen-Universitätsbibliothek, Room BIB 014, Fasanenstr. 88, 10623 Berlin
Title: Modeling Methodology for a uniform access to NoSQL systems (short intro for the students)
Abstract:
The absence of a schema in NoSQL databases can disorient traditional database specialists and can make the design activity in this context a leap of faith.
In this context, traditional notions related to data modeling are still useful: first, because data models provide a basis for the definition of generic approaches to logical and physical design; second, because a unified data modelling technique is the first step toward providing uniform access to multiple heterogeneous NoSQL systems.
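To make the idea of uniform access concrete, here is a hypothetical TypeScript sketch (not the methodology presented in the talk): a single interface over two toy in-memory backends that stand in for a document store and a key-value store with different physical layouts.

```typescript
// Hypothetical sketch of a uniform access layer over heterogeneous NoSQL
// systems. Real systems differ in data model and API; a shared interface
// hides those differences from the application.

interface DataStore {
  put(collection: string, key: string, value: Record<string, unknown>): Promise<void>;
  get(collection: string, key: string): Promise<Record<string, unknown> | undefined>;
}

// Toy "document-style" backend: stores whole objects per key.
class InMemoryDocumentStore implements DataStore {
  private data = new Map<string, Record<string, unknown>>();
  async put(collection: string, key: string, value: Record<string, unknown>) {
    this.data.set(`${collection}/${key}`, value);
  }
  async get(collection: string, key: string) {
    return this.data.get(`${collection}/${key}`);
  }
}

// Toy "key-value-style" backend: flattens each field into its own entry.
class InMemoryKeyValueStore implements DataStore {
  private data = new Map<string, unknown>();
  async put(collection: string, key: string, value: Record<string, unknown>) {
    for (const [field, v] of Object.entries(value)) {
      this.data.set(`${collection}/${key}/${field}`, v);
    }
    this.data.set(`${collection}/${key}/__fields`, Object.keys(value));
  }
  async get(collection: string, key: string) {
    const fields = this.data.get(`${collection}/${key}/__fields`) as string[] | undefined;
    if (!fields) return undefined;
    const result: Record<string, unknown> = {};
    for (const f of fields) result[f] = this.data.get(`${collection}/${key}/${f}`);
    return result;
  }
}
```

Because both backends implement the same interface, application code can be written once and bound to either system, which is the point of a unified modeling technique as described above.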
Bio:
Dr. Ing. Francesca Bugiotti is an assistant professor at CentraleSupélec in Paris. She received her "Dr. Ing." degree in Computer Engineering from Università Roma Tre (under the supervision of Prof. Paolo Atzeni) in 2012, with a thesis on heterogeneity in databases. She worked as an intern and as a post-doc at Inria Saclay, studying the problem of indexing RDF datasets in a cloud infrastructure and efficient data storage mechanisms for heterogeneous data in the cloud, supported by Inria in connection with the KIC EIT ICT Labs Europa activity on scalable cloud-based data management.
Her research activity focuses on heterogeneous data integration, conceptual models, NoSQL storage systems integration, NoSQL data model characteristics, and query expressive power.
Title: Modeling Strategies for Storing Data in Distributed Heterogeneous NoSQL databases (long presentation about the details of his Master's thesis; same abstract as Dec 04)
Abstract:
Data management has become an essential functionality of modern information systems. With the rise of digital environments, the volume of data generated and available has grown enormously, giving rise to the Big Data era. NoSQL systems have been introduced to handle this large volume of data while providing availability, scalability, and efficiency. There is considerable heterogeneity among the various NoSQL systems: different data models, different APIs, different implementations. Moreover, data modeling for NoSQL systems is not formalized, mainly due to the flexible, semi-structured nature of their models. Recent research results have shown how modeling decisions impact quality requirements such as scalability and performance.
In this work we propose HerM (Heterogeneous Distributed Model), a NoSQL data modeling approach that supports the use of multiple heterogeneous NoSQL systems in a distributed environment. We define the conceptual elements necessary for data modeling and identify optimized data distribution patterns. We also map HerM onto a physical model that improves performance for distributed joins.
We implemented a flexible framework in which we deployed our proposed modeling strategies. The framework provides a transparent interface for accessing the underlying heterogeneous systems efficiently and makes it easy to configure different use cases. We provide a detailed evaluation of our framework against a native MongoDB implementation in different scenarios on a large dataset, considering performance and stability.
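The following TypeScript fragment is a generic, hypothetical illustration of the kind of application-level join across heterogeneous stores that such a framework must support; it is not HerM's physical model, and all names are made up.

```typescript
// Hypothetical sketch of an application-level join across two heterogeneous
// stores, e.g. orders in one system and customers in another. Illustrative
// only; not HerM's actual physical model.

type Fetch = (key: string) => Promise<Record<string, unknown> | undefined>;

async function joinOrdersWithCustomers(
  orderIds: string[],
  getOrder: Fetch,    // backed by store A (e.g. a document store)
  getCustomer: Fetch  // backed by store B (e.g. a key-value store)
): Promise<Record<string, unknown>[]> {
  const results: Record<string, unknown>[] = [];
  for (const id of orderIds) {
    const order = await getOrder(id);
    if (!order) continue;
    // The join key (here: customerId) must be materialized in the order
    // record; where and how it is stored is exactly the kind of modeling
    // decision the talk discusses.
    const customer = await getCustomer(String(order.customerId));
    results.push({ ...order, customer });
  }
  return results;
}
```

How the data is distributed across the stores determines how many round trips such a join needs, which is why the choice of distribution pattern affects performance.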
Bio:
Moditha Hewasinghage, MSc, is a PhD student at Universitat Politècnica de Catalunya (UPC), Barcelona, and Université libre de Bruxelles (ULB) in the IT4BI-DC program, under the supervision of Prof. Alberto Abelló and Prof. Esteban Zimányi. Moditha received his Bachelor's degree in Computer Science from the University of Colombo School of Computing. He worked as a Senior Software Engineer for 99X Technology, Sri Lanka. He was part of the IT4BI program and completed his master's degree at CentraleSupélec in Paris in 2017. His master's thesis, "Modelling strategies for storing data in distributed heterogeneous NoSQL databases," was supervised by Assistant Professor Francesca Bugiotti and Prof. Nacéra Bennacer.
His research activity involves conceptual modelling and heterogeneous data integration.
Prof. Dr. Francesca Bugiotti, CentraleSupelec, Paris-Saclay University & Moditha Hewasinghage, UPC, Barcelona
Title: Database Design for NoSQL Systems (long research presentation)
Abstract:
The heterogeneity of NoSQL data models has led to little use of traditional modeling techniques, in contrast to what has happened with relational databases for decades. Although NoSQL databases are claimed to be flexible and free of a static schema, the design of the data organization requires important decisions about how to map data to the modeling elements (collections, documents, tables, columns, keys, key-value pairs) available in the target datastore. These decisions are significant because of their impact on major quality requirements such as scalability and performance.
An effective design methodology for NoSQL systems, supporting the quality requirements critical for next-generation Web applications, can indeed be devised. The presented approach is based on NoAM (NoSQL Abstract Model), a novel abstract data model for NoSQL databases, which is used to specify a system-independent representation of the application data and which exploits the commonalities of the various NoSQL datastores.
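As a simplified, hypothetical illustration of what mapping application data to datastore constructs means in practice (in the spirit of, but not identical to, NoAM), the sketch below shows the same aggregate stored either as one entry or as one entry per field.

```typescript
// Simplified, hypothetical illustration of mapping an application aggregate
// to different datastore constructs. Not the NoAM specification itself.

type Aggregate = { class: string; id: string; fields: Record<string, unknown> };

// Strategy 1: the whole aggregate becomes a single entry
// (natural for key-value or document stores).
function asSingleEntry(a: Aggregate): Record<string, string> {
  return { [`${a.class}:${a.id}`]: JSON.stringify(a.fields) };
}

// Strategy 2: each field becomes its own entry under a compound key
// (natural for wide-column stores; enables partial reads and updates).
function asEntryPerField(a: Aggregate): Record<string, string> {
  const entries: Record<string, string> = {};
  for (const [field, value] of Object.entries(a.fields)) {
    entries[`${a.class}:${a.id}:${field}`] = JSON.stringify(value);
  }
  return entries;
}

// Example aggregate (made up): a player with a score and game references.
const player: Aggregate = {
  class: "Player",
  id: "mary",
  fields: { username: "mary", score: 42, games: ["g1", "g7"] },
};

console.log(asSingleEntry(player));
console.log(asEntryPerField(player));
```

The first strategy favors whole-aggregate reads and atomic updates; the second enables partial access, which is exactly the kind of trade-off the design decisions above concern.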
Bio:
Dr. Ing. Francesca Bugiotti is an assistant professor at CentraleSupélec in Paris. She received her "Dr. Ing." degree in Computer Engineering from Università Roma Tre (under the supervision of Prof. Paolo Atzeni) in 2012, with a thesis on heterogeneity in databases. She worked as an intern and as a post-doc at Inria Saclay, studying the problem of indexing RDF datasets in a cloud infrastructure and efficient data storage mechanisms for heterogeneous data in the cloud, supported by Inria in connection with the KIC EIT ICT Labs Europa activity on scalable cloud-based data management.
Her research activity focuses on heterogeneous data integration, conceptual models, NoSQL storage systems integration, NoSQL data model characteristics, and query expressive power.
Title: Modeling Strategies for Storing Data in Distributed Heterogeneous NoSQL databases (short presentation about his IT4BI Master's thesis)
Abstract:
Data management has become an essential functionality of modern information systems. With the rise of digital environments, the volume of data generated and available has grown enormously, giving rise to the Big Data era. NoSQL systems have been introduced to handle this large volume of data while providing availability, scalability, and efficiency. There is considerable heterogeneity among the various NoSQL systems: different data models, different APIs, different implementations. Moreover, data modeling for NoSQL systems is not formalized, mainly due to the flexible, semi-structured nature of their models. Recent research results have shown how modeling decisions impact quality requirements such as scalability and performance.
In this work we propose HerM (Heterogeneous Distributed Model), a NoSQL data modeling approach that supports the use of multiple heterogeneous NoSQL systems in a distributed environment. We define the conceptual elements necessary for data modeling and identify optimized data distribution patterns. We also map HerM onto a physical model that improves performance for distributed joins.
We implemented a flexible framework in which we deployed our proposed modeling strategies. The framework provides a transparent interface for accessing the underlying heterogeneous systems efficiently and makes it easy to configure different use cases. We provide a detailed evaluation of our framework against a native MongoDB implementation in different scenarios on a large dataset, considering performance and stability.
Bio:
Moditha Hewasinghage, MSc, is a PhD student at Universitat Politècnica de Catalunya (UPC), Barcelona, and Université libre de Bruxelles (ULB) in the IT4BI-DC program, under the supervision of Prof. Alberto Abelló and Prof. Esteban Zimányi. Moditha received his Bachelor's degree in Computer Science from the University of Colombo School of Computing. He worked as a Senior Software Engineer for 99X Technology, Sri Lanka. He was part of the IT4BI program and completed his master's degree at CentraleSupélec in Paris in 2017. His master's thesis, "Modelling strategies for storing data in distributed heterogeneous NoSQL databases," was supervised by Assistant Professor Francesca Bugiotti and Prof. Nacéra Bennacer.
His research activity involves conceptual modelling and heterogeneous data integration.
Dr. Kaiwen Zhang, TU München
Title: Deconstructing Blockchains: Concepts, Applications, and Systems
Abstract:
Popularly known for powering cryptocurrencies such as Bitcoin and Ethereum, blockchains are seen as a disruptive technology capable of impacting a wide variety of domains, ranging from finance to governance, by offering superior security, reliability, and transparency in a decentralized manner. In this tutorial presentation, we first study the original Bitcoin design from an academic perspective. We then take a comprehensive look at all aspects related to blockchains by deconstructing the system into six layers: Application, Modeling, Contract, System, Data, and Network. We will review potential applications which can benefit from blockchains and describe the associated research challenges. Finally, we will conclude with a report on ongoing research on providing a decentralized messaging service using blockchains.
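As a small, generic illustration of the Data layer in this deconstruction, the sketch below builds a chain of hash-linked blocks and verifies it; it is not Bitcoin's or the speaker's design and deliberately omits proof of work, consensus, and networking.

```typescript
// Generic illustration of the data layer of a blockchain: blocks linked by
// cryptographic hashes, so tampering with any block breaks all later links.
// Omits proof of work, consensus, and networking.
import { createHash } from "crypto";

interface Block {
  index: number;
  prevHash: string;
  payload: string; // e.g. serialized transactions
  hash: string;
}

function hashOf(index: number, prevHash: string, payload: string): string {
  return createHash("sha256").update(`${index}|${prevHash}|${payload}`).digest("hex");
}

function appendBlock(chain: Block[], payload: string): Block {
  const prev = chain[chain.length - 1];
  const index = prev ? prev.index + 1 : 0;
  const prevHash = prev ? prev.hash : "0".repeat(64);
  const block: Block = { index, prevHash, payload, hash: hashOf(index, prevHash, payload) };
  chain.push(block);
  return block;
}

// Verification: recompute every hash and check each back-link.
function verify(chain: Block[]): boolean {
  return chain.every(
    (b, i) =>
      b.hash === hashOf(b.index, b.prevHash, b.payload) &&
      (i === 0 || b.prevHash === chain[i - 1].hash)
  );
}
```

The other layers of the deconstruction (contracts, consensus, networking) determine who may append such blocks and how replicas agree on a single chain.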
Dr. Britta Meixner, CWI Amsterdam
Title: Enhance, Enjoy, Engage: Improving the Video Playback Experience
Abstract:
Current web technologies make it simpler than ever to both stream videos and create complex constructs of interlinked videos with additional information or parallel presentations of content. We show typical use cases for these types of videos. While they offer a highly enjoyable presentation, the additional data may lead to excessive waiting times that interrupt playback. In this presentation, we show solutions for both traditional linear videos and hypervideos that reduce startup delays, stalling events, and quality switches during playback. We show how the HTML5 <video> tag, Media Source Extensions (MSE), and DASH can be used and improved to accomplish this goal and satisfy the user's expectations.
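To make the mentioned browser APIs concrete, here is a minimal TypeScript sketch of Media Source Extensions feeding an HTML5 <video> element, appending a first low-bitrate segment to shorten the startup delay. The segment URLs and codec string are placeholders, and real DASH-style players add buffering and adaptation logic on top of this.

```typescript
// Minimal MSE sketch: attach a MediaSource to an HTML5 <video> element and
// append an initial low-bitrate segment so playback can start quickly.
// Segment URLs and the codec string are placeholders.
const video = document.querySelector("video") as HTMLVideoElement;
const mimeCodec = 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"';

if ("MediaSource" in window && MediaSource.isTypeSupported(mimeCodec)) {
  const mediaSource = new MediaSource();
  video.src = URL.createObjectURL(mediaSource);

  mediaSource.addEventListener("sourceopen", async () => {
    const sourceBuffer = mediaSource.addSourceBuffer(mimeCodec);

    // Fetch a small, low-quality segment first; higher-quality segments can
    // follow once the buffer is healthy (what DASH-style players automate).
    const response = await fetch("/segments/init-low.mp4"); // placeholder URL
    const data = await response.arrayBuffer();

    sourceBuffer.addEventListener("updateend", () => {
      void video.play();
      // ...continue appending further segments, switching bitrate as needed.
    });
    sourceBuffer.appendBuffer(data);
  });
} else {
  // Fall back to plain progressive playback via the <video> src attribute.
  video.src = "/video/fallback.mp4"; // placeholder URL
}
```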
Short bio:
I am a researcher at Centrum Wiskunde & Informatica (CWI) in Amsterdam. I received my Ph.D. (German Dr. rer. nat.) degree (magna cum laude) from the University of Passau, Germany, in 2014. The title of my thesis is "Annotated Interactive Non-linear Video - Software Suite, Download and Cache Management." I am a winner of the 2015 award "Women + Media Technology," granted by Germany's public broadcasters ARD and ZDF (ARD/ZDF Förderpreis "Frauen + Medientechnologie" 2015). My Ph.D. thesis also received an Honorable Mention for the SIGMM Outstanding Ph.D. Thesis Award in 2015. The paper "Download and Cache Management for HTML5 Hypervideo Players" was awarded the Hypertext Ted Nelson Newcomer Award in 2016. My research interests are hypermedia and video streaming.
I am a reviewer for the Springer Multimedia Tools and Applications (MTAP) Journal, the ACM TOMM Journal, and other journals. I am/was an Associate Chair at ACM TVX (2015-2017), an Area Chair at ACM Multimedia 2017, and a member of the organization committee of TVX 2017-2019 and MMSys 2018, and I served as a PC member for several other conferences. From 2014 to 2016 I was a co-organizer of the "International Workshop on Interactive Content Consumption (WSICC)" at ACM TVX.
Prof. Guillaume Pierre, University of Rennes 1
Title: "From data centers to fog computing: the evaporating cloud"
Abstract:
Cloud computing data centers are composed of very powerful computing nodes connected by reliable backbone networks. However, these resources are concentrated in a small number of data centers. The latency between an end user and the closest available cloud data center is in the range of 20-150 ms. A number of latency-sensitive applications (e.g., augmented reality) require extremely low end-to-end latencies and therefore cannot make use of traditional cloud platforms. Fog computing therefore aims to complement traditional cloud infrastructures with additional resources located extremely close to the user, within a couple of network hops. This requires one to distribute machines across a very large number of geographical locations so that computation capacity is always available in the immediate proximity of any end user. In this presentation I will discuss the application scenarios where fog computing is or isn't useful, and the architectural challenges one needs to face when designing the next-generation fog computing architectures.
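To make the latency argument concrete, the hypothetical TypeScript sketch below probes a nearby fog node and a distant cloud endpoint and routes a latency-sensitive request to whichever answers faster; all URLs are placeholders and no real deployment is implied.

```typescript
// Hypothetical sketch: pick the lower-latency endpoint (nearby fog node vs.
// distant cloud data center) before issuing a latency-sensitive request.
// All URLs are placeholders; intended for a browser-like environment.

async function measureLatencyMs(url: string): Promise<number> {
  const start = performance.now();
  try {
    await fetch(url, { method: "HEAD" });
    return performance.now() - start;
  } catch {
    return Number.POSITIVE_INFINITY; // unreachable endpoints lose
  }
}

async function pickEndpoint(candidates: string[]): Promise<string> {
  const latencies = await Promise.all(candidates.map(measureLatencyMs));
  const best = latencies.indexOf(Math.min(...latencies));
  return candidates[best];
}

// Example: an augmented-reality client preferring a fog node a couple of
// network hops away over a cloud region tens of milliseconds further out.
pickEndpoint(["https://fog.local.example/ping", "https://cloud.example.com/ping"])
  .then((endpoint) => console.log("using", endpoint));
```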