Erietta Liarou's Talk at the DB Seminar

 
Title: MonetDB/DataCell: Online Analytics in a Streaming Column-store

Abstract
Numerous applications nowadays require online analytics over high-rate streaming data. For example, emerging applications over mobile data can exploit big mobile data streams for advertising and traffic control. In addition, the recent and continuously expanding massive cloud infrastructures require continuous monitoring to remain in a good state and to prevent fraud attacks. Similarly, scientific databases create data at massive rates daily or even hourly. The need to handle queries that remain active for a long time (continuous queries), to quickly analyze big data that arrives in a streaming mode, and to combine it with existing data brings a new processing paradigm that cannot be handled exclusively by existing database or data stream technology. Database systems do not qualify for continuous query processing, while data stream systems are not built to scale for big data analysis.
 
For this new problem we need to combine the best of both worlds. In DataCell, we design streaming functionalities in a modern relational database kernel that targets big data analytics. This includes exploiting both its storage/execution engine and its optimizer infrastructure. We investigate the opportunities and challenges that arise with such a direction and we show that it carries significant advantages. The major challenge then becomes the efficient support for specialized stream features, e.g., multi-query processing and incremental window-based processing, as well as the exploitation of standard DBMS functionalities, such as indexing, in a streaming environment. In this presentation we will discuss the main components in the design of DataCell.
 
Furthermore, we will briefly discuss the vision of approximate and interactive query processing of big streaming data, which aims to allow database operators and algorithms to autonomously decide when to ignore input data, when to answer a different question than the one the user asked, or when to reply with a question rather than with an answer.
 
Bio
Erietta Liarou is a doctoral candidate at the Dutch National Research Center for Mathematics and Computer Science (CWI) and at the University of Amsterdam. She works with Prof. Martin Kersten in the Database Architectures group of CWI. The main focus of her Ph.D. research is to extend database column-store architectures for efficient online analytics. She also works on distributed query processing, the semantic web and scientific databases. She obtained her Diploma and Master's in Computer Engineering from the Technical University of Crete, Greece, in 2004 and 2006 respectively. In 2010 she was a research intern with the System S group at IBM Research, Watson, USA. Erietta co-authored the paper that won the VLDB 2011 Challenges and Visions best paper award for approximate and interactive query processing over big scientific data.