26th British National Conference on Databases

BNCOD 2009

7th-9th July 2009
University of Birmingham

Dataspace: The Final Frontier

Tutorial 1: The iMeMex Dataspace Management System: Architecture, Concepts, and Lessons Learned

Speaker: Jens Dittrich (Saarland University, Germany)

Abstract

The iMeMex project was one of the first attempts to build a so-called dataspace management system. This tutorial presents the core concepts of iMeMex. We discuss system design concepts, dataspace modelling, dataspace indexing, dataspace query processing, and pay-as-you-go information integration. We will present some important lessons learned from this project and also discuss ongoing and open research challenges.

Biographical information

Jens Dittrich is an Associate Professor of Computer Science at Saarland University (Germany). He received his Diploma and PhD from the University of Marburg (Germany). He held positions at SAP AG (Germany) and ETH Zurich (Switzerland). His research interests are in the area of information systems and databases, in particular new system architectures for information management, indexing, data warehousing, and main memory databases.

Web-site: http://infosys.cs.uni-sb.de


Tutorial 2: Conditional Dependencies: A Principled Approach to Improving Data Quality

Speakers: Wenfei Fan, Floris Geerts, Xibei Jia (University of Edinburgh, UK)

Abstract

Real-life data is often dirty: inconsistent, inaccurate, incomplete and/or stale. Dirty data is estimated to cost US industry billions of dollars a year, and there is no reason to believe that the scale of the problem is any different in the UK. This talk presents a recent approach for detecting and repairing errors in real-life data. It is based on conditional dependencies, which extend traditional database dependencies by enforcing bindings of semantically related data values. Whereas traditional database dependencies were developed for improving the quality of schemas, conditional dependencies yield a theory for improving the quality of the data itself. Based on this theory, practical techniques have been developed for cleaning dirty data; they effectively reduce human effort and improve data quality.
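
As a rough illustration of the idea (not the speakers' own algorithms), the sketch below checks a single conditional dependency on a toy table. The records, the attribute names (cc, zip, street), the helper function violations, and the rule itself ("for rows with country code 44, the postcode determines the street") are all assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical customer records (illustrative only, not real data):
# country code (cc), postcode (zip) and street.
ROWS = [
    {"cc": "44", "zip": "EH8 9AB", "street": "Crichton St"},
    {"cc": "44", "zip": "EH8 9AB", "street": "Mayfield Rd"},  # conflicts with row 0
    {"cc": "01", "zip": "07974", "street": "Mountain Ave"},
    {"cc": "01", "zip": "07974", "street": "Main St"},  # fine: condition does not apply
]


def violations(rows, condition, lhs, rhs):
    """Return groups of row indices that violate a conditional dependency.

    The dependency lhs -> rhs is enforced only on rows that match
    `condition`, a dict mapping attribute names to required constants.
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        # Only rows whose values are bound to the given constants are checked.
        if all(row[attr] == value for attr, value in condition.items()):
            groups[tuple(row[attr] for attr in lhs)].append(i)
    bad = []
    for indices in groups.values():
        # Rows that agree on lhs must also agree on rhs.
        if len({tuple(rows[i][attr] for attr in rhs) for i in indices}) > 1:
            bad.append(indices)
    return bad


# Assumed rule: when cc == "44", (cc, zip) determines street.
conflicts = violations(ROWS, {"cc": "44"}, ["cc", "zip"], ["street"])
print(conflicts)  # [[0, 1]]
```

With an empty condition the same function behaves like an ordinary functional dependency check and would also flag the two rows with country code 01; the condition is what restricts the rule to the subset of data where it is meant to hold.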

Biographical information

Wenfei Fan is the Professor of Web Data Management in the School of Informatics, University of Edinburgh, and a Research Scientist at Bell Laboratories, Alcatel-Lucent. He received his PhD from the University of Pennsylvania, and his MS and BS from Peking University. He is a recipient of the Roger Needham Award in 2008, the Chang Jiang Scholar Award in 2007, the Outstanding Overseas Young Scholar Award in 2003, the Career Award in 2001, the ICDE Best Paper Award in 2007, and the Best Paper of the Year Award from Computer Networks in 2002. His current research interests include data quality, data integration, integrity constraints, distributed query processing, Web services and XML.

Floris Geerts is a Research Fellow in the School of Informatics, University of Edinburgh. He received his PhD from Hasselt University and was a postdoctoral researcher at the University of Helsinki. He is the recipient of a postdoctoral fellowship of the Fund of Scientific Research Flanders and received Best Paper Awards at ICDM 2001 and ICDE 2007. His current research interests include data quality, data provenance, and Web services.

Xibei Jia is a Research Fellow in the School of Informatics, University of Edinburgh. He received his PhD from the University of Edinburgh, and his MS from Peking University. Prior to his PhD, he worked for Sun Microsystems. He is a recipient of the Royal Society of Edinburgh Enterprise Fellowship in 2008 and the ICDE Best Paper Award in 2007. His current research interests include data quality, data integration and XML security.