Abstract: Researchers in databases, AI, and machine learning have all proposed representations of probability distributions over relational databases (possible worlds). In a tuple-independent probabilistic database, the possible worlds all have distinct probabilities, because the tuple probabilities are distinct. In AI and machine learning, however, one typically learns highly symmetric distributions, where large numbers of symmetric databases are assigned identical probability. This symmetry helps with generalizing from data. In this talk I discuss what happens to the standard database notions of data complexity and combined complexity when considering AI-style symmetric probabilistic databases. The question proves to be fertile ground for database theory, with interesting connections to counting complexity and 0-1 laws.
Abstract: Data mining arose as a merger of several areas, such as databases, statistics, and artificial intelligence, and has been growing steadily over the last 20 years. Recently, the popularization of the concepts of "data science" and "big data" has accelerated this growth. In this seminar we try to answer the question of whether data mining is a cause or a consequence of these recent developments, through an integrated view of four key components of data mining research and development, namely models, algorithms, systems, and applications, and of how they are employed in scenarios such as the Internet and the Web. We will also discuss some trends related to knowledge and information discovery from massive data.
Abstract: Workflows centered around data have become pervasive in a wide variety of applications, including health-care management, e-commerce, business processes, scientific workflows, and e-government. Such workflows are often very complex and involve numerous interacting actors. They are prone to costly bugs, whence the need for static analysis in order to verify critical properties. Analysis tools are also needed to facilitate the integration, interoperation and evolution of workflows, and to provide runtime assistance to participating actors. This talk will present an overview of recent research carried out with collaborators at UC San Diego and INRIA on the analysis of data-centric workflows, an area of growing interest in both academia and industry.