DANIEL DEUTCH'S TALK AT THE DB SEMINAR

 

Title: Learning XML Generators
 
Abstract
We will consider the following problem: given a corpus of XML documents and its schema, find a probabilistic generative model that maximizes the likelihood of observing the corpus. As we will demonstrate, practical solutions to this problem are applicable to testing, explanation of the corpus, and even auto-completion of XML documents.
 
We will consider the problem first in the absence of constraints and then in the presence of integrity constraints, namely key, inclusion, and domain constraints. Focusing first on the structure of documents, we present an efficient algorithm for finding the best generative probabilistic model in the absence of constraints. For the constrained case, we study two different kinds of generators. First, we consider generators that perform tests of schema satisfiability while generating documents; these tests prevent the generation of a document that violates the constraints but, as we will see, they are computationally expensive. Second, we study a class of generators that may generate an invalid document and, when this is the case, restart and try again. Finally, we consider the injection of data values into the structure, so that the generators also produce these values.

Joint work with Serge Abiteboul, Yael Amsterdamer, Tova Milo, and Pierre Senellart.
 
Bio
Daniel Deutch is an Assistant Professor in the Computer Science Department of Ben Gurion University. He received his PhD in Computer Science from Tel Aviv University in 2010 and was a postdoc at the University of Pennsylvania (UPenn) and at the INRIA research institute. His research interests include, among other areas, web data management, data provenance, and inference in database systems. During his PhD studies, Daniel received a number of awards for his research, including the Israeli Ministry of Science Eshkol grant and the ICDT best student paper award. Daniel has been a member of the program committees of various international conferences and workshops (including WWW, ICDT, and PODS). Daniel's research is funded by grants from the US-Israel Binational Science Foundation and the Israeli Ministry of Science.