26th British National Conference on Databases

BNCOD 2009

7th-9th July 2009
University of Birmingham

Dataspace: The Final Frontier

Accepted Papers

The lists are ordered alphabetically by first author surname.

Full Papers

  • Compacting XML Structures Using a Dynamic Labeling Scheme
    Ramez Alkhatib and Marc H. Scholl.

    Due to the growing popularity of XML as a data exchange and storage format, the need to develop efficient techniques for storing and querying XML documents has emerged. A common approach to achieve this is to use labeling techniques. However, their main problem is that they either do not support updating XML data dynamically or impose huge storage requirements. On the other hand, with the verbosity and redundancy problem of XML, which can lead to increased cost for processing XML documents, compaction of XML documents has become an increasingly important research issue. In this paper, we propose an approach called CXDLS combining the strengths of both labeling and compaction techniques. Our approach exploits repetitive consecutive subtrees and tags for compacting the structure of XML documents by taking advantage of the ORDPATH labeling scheme. In addition it stores the compacted structure and the data values separately. Using our proposed approach, it is possible to support efficient query and update processing on compacted XML documents and to reduce storage space dramatically. Results of a comprehensive performance study are provided to show the advantages of CXDLS.

  • A Data Privacy Taxonomy
    Ken Barker, Maryam Majedi, Kambiz Ghazinour, Mina Askari, Mishtu Banerjee, Brenan Mackas, Sampson Pun and Adepele Williams.

    Privacy has become increasingly important to the database community which is reflected by a noteworthy increase in research papers appearing in the literature. While researchers often assume that their definition of “privacy” is universally held by all readers, this is rarely the case; so many papers addressing key challenges in this domain have actually produced results that do not consider the same problem, even when using similar vocabularies. This paper provides an explicit definition of data privacy suitable for ongoing work in data repositories such as a DBMS or for data mining. The work contributes by briefly providing the larger context for the way privacy is defined legally and legislatively but primarily provides a taxonomy capable of thinking of data privacy technologically. We then demonstrate the taxonomy’s utility by illustrating how this perspective makes it possible to understand the important contribution made by researchers to the issue of privacy. The unavoidable conclusion of this paper is that privacy is indeed multifaceted so no single current research effort adequately addresses the true breadth of the issues necessary to fully understand the scope of this important issue.

  • The Use of the Binary-Relational Model in Industry, a Practical Approach
    Victor Gonzalez-Castro, Lachlan MacKinnon and Maria del Pilar Angeles.

    In recent years there has been a growing interest in the research community in the utilisation of alternative data models that abandon the relational record storage and manipulation structure. The authors have already reported experimental considerations of the behaviour of n-ary Relational, Binary-Relational, Associative and Transrelational models within the context of Data Warehousing [1], [2], [3] to address issues of storage efficiency and combinatorial explosion through data repetition. In this paper we present the results obtained during the industrial usage of Binary-Relational model based DBMS within a reference architectural configuration. These industrial results are similar to the ones obtained during the experimental stage of this research at the University laboratory [4] where improvements on query speed, data load and considerable reductions on disk space are achieved. These industrial tests considered a wide set of industries: Manufacturing, Government, Retail, Telecommunications and Finance.

  • An Alternative Data Warehouse Reference Architectural Configuration
    Victor Gonzalez-Castro, Lachlan Mhor MacKinnon and Maria del Pilar Angeles.

    In the last few years the amount of data that is keep online on computing systems is growing at accelerated rate. These data is frequently managed within data warehouses but the current data warehouse architectures based on n-ary-Relational DBMSs are becoming to their limits in order to efficiently manage such large amounts of data. Some DBMS are able to load huge amounts of data but when querying it, the response times become unacceptable for the business users. In this paper we describe an alternative data warehouse reference architectural configuration (ADW) which addresses many of the problems that organisations are facing. ADW includes the use of a Binary-Relational DBMS, and by doing this the data density is increased, it reduces data sparsity, and dramatically improves query response time and also improvements in data loading and backup and restore tasks are achieved.

  • A Prioritized Collective Selection Strategy for Schema Matching across Query Interfaces
    Zhongtian He, Jun Hong and David Bell.

    Schema matching is a crucial step in data integration. Many approaches to schema matching have been proposed. These approaches make use of different types of information about schemas, including structures, linguistic features and data types etc, to measure different types of similarity between the attributes of two schemas. They then combine different types of similarity and use combined similarity to select a collection of attribute correspondences for every source attribute. Thresholds are usually used for filtering out likely incorrect attribute correspondences, which have to be set manually and are matcher and domain dependent. A selection strategy is also used to resolve any conflicts between attribute correspondences of different source attributes. In this paper, we propose a new prioritized collective selection strategy that has two distinct characteristics. First, this strategy clusters a set of attribute correspondences into a number of clusters and collectively selects attribute correspondences from each of these clusters in a prioritized order. Second, it introduces use of a null correspondence for each source attribute, which represents the option that the source attribute has no attribute correspondence. By considering this option, our strategy does not need a threshold to filter out likely incorrect attribute correspondences. Our experimental results show that our approach is highly effective.

  • Dimensions of Dataspaces
    Cornelia Hedeler, Khalid Belhajjame, Alvaro A.A. Fernandes, Suzanne Embury and Norman W. Paton.

    The vision of dataspaces has been articulated as providing various of the benefits of classical data integration, but with reduced up-front costs, combined with opportunities for incremental refinement, enabling a “pay as you go” approach. However, results that seek to realise the vision exhibit considerable variety in their contexts, priorities and techniques, to the extent that the definitional characteristics of dataspaces are not necessarily becoming clearer over time. With a view to clarifying the key concepts in the area, encouraging the use of consistent terminology, and enabling systematic comparison of proposals, this paper defines a collection of dimensions that capture both the the components that a dataspace management system may contain and the lifecycle it may support, and uses these dimensions to characterise representative proposals.

  • An XML-Based Model for Supporting Context-Aware Query and Cache Management
    Essam Mansour and Hagen Hoepfner.

    Database systems (DBSs) can play an essential role in facilitating the query and cache management in context-aware mobile information systems (CAMIS). Two of the fundamental aspects of such management are update notifications and context-aware query processing. Unfortunately, DBSs does not provide a built-in update notification function and are not aware of the context of their usage. This paper proposes an XML model called XReAl (XML-based Relational Algebra) that assists DBSs in extending their capabilities to support context-aware queries and cache management for mobile environments.

  • Metamodel-Based Optimisation of XPath Queries
    Gerard Marks and Mark Roantree.

    To date, query performance in XML databases remains a difficult problem. XML documents are often very large making fast access to nodes within document trees cumbersome for query processors. Many research teams have addressed this issue with efficient algorithms and powerful indexes, but XML systems still cannot perform at the same level as relational databases. In this paper, we present a metamodel, which enables us to efficiently solve relationships between nodes in an XML database using standard SQL. By implementing the metamodel presented here, one can turn any off-the-shelf relational database into a high performance XPath processor. We will demonstrate the significant improvement achieved over three leading databases, and identify where each database is strongest in relation to XPath query performance.

  • Hyperset Approach to Semi-Structured Databases, and Computation of Bisimulation in the Distributed Case
    Richard Molyneux and Vladimir Sazonov.

    We will briefly describe the recently implemented hyperset approach to semi-structured or Web-like and possibly distributed databases with the query system available online at http://www.csc.liv.ac.uk/~molyneux/t/. As this approach is crucially based on the bisimulation relation, the main stress in this paper is on its computation in the distributed case by using a so called bisimulation engine and local approximations of the global bisimulation relation.

  • Answering Multiple-Item Queries in Data Broadcast Systems
    Adesola Omotayo, Ken Barker, Moustafa Hammad, Lisa Higham and Jalal Kawash.

    A lot of research has been done on answering single-item queries, only a few have looked at answering multiple-item queries in data broadcast systems. The few that did, have proposed approaches that are less responsive to changes in the query queue. It is not immediately clear how single-item scheduling algorithms will perform when used in answering pull-based multiple-item queries. This paper investigates the performance of existing single-item scheduling algorithms in answering multiple-item queries in pull-based data broadcast systems. We observed that Longest Wait First, a near-optimal single-item data scheduling algorithm, has been used in environments where users' data access pattern is skewed. This paper also investigates the performance of Longest Wait First under various user access patterns. We propose QLWF: an online data broadcast scheduling algorithm for answering multiple-item queries in pull-based data broadcast systems. For the purpose of comparison with QLWF, we adapted existing pull single-item algorithm, push single-item algorithm, and push multiple-item algorithm to answer multiple-item queries in pull environments. Results from extensive sets of experiments show that QLWF has a superior performance compared with the adapted algorithms.

  • Multi-Join Continuous Query Optimization: Covering the Spectrum of Linear, Acyclic and Cyclic Queries
    Venkatesh Raghavan, Yali Zhu, Elke Rundensteiner and Daniel Dougherty.

    Traditional optimization algorithms that guarantee optimal plans have exponential time complexity and are thus not viable in streaming contexts. Continuous query optimizers commonly adopt heuristic techniques such as Adaptive Greedy to attain polynomial-time execution. However, these techniques are known to produce optimal plans only for linear and star shaped join queries. Motivated by the prevalence of acyclic, cyclic and even complete query shapes in stream applications, we conduct an extensive experimental study of the behavior of the state-of-the-art algorithms. This study has revealed that heuristic-based techniques tend to generate sub-standard plans even for simple acyclic join queries. For general acyclic join queries we extend the classical IK approach to the streaming context to define an algorithm TreeOpt that is guaranteed to find an optimal plan in polynomial time. For the case of cyclic queries, for which finding optimal plans is known to be NP-complete, we present an algorithm FAB which improves other heuristic-based techniques by (i) increasing the likelihood of finding an optimal plan and (ii) improving the effectiveness of finding a near-optimal plan when an optimal plan cannot be found in polynomial time. To handle the entire spectrum of query shapes from acyclic to cyclic we propose a Q-Aware approach that selects the optimization algorithm used for generating the join order, based on the shape of the query.

  • A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form, and Minimization
    Yuqing Wu, Dirk Van Gucht, Marc Gyssens and Jan Paredaens.

    We study the expressiveness of a positive fragment of path queries, denoted Path+, on node-labeled trees documents. The expressiveness of Path+ is studied from two angles. First, we establish that Path+ is equivalent in expressive power to a particular sub-fragment as well as to the class of tree queries, a sub-class of the first-order conjunctive queries defined over label, parent-child, and child-parent predicates. The translation algorithm from tree queries to Path+ yields a normal form for Path+ queries. Using this normal form, we can decompose an Path+ query into sub-queries that can be expressed in a very small sub-fragment of Path+ for which efficient evaluation strategies are available. Second, we characterize the expressiveness of Path+ in terms of its ability to resolve nodes in a document. This result is used to show that each tree query can be translated to a unique, equivalent, and minimal tree query. The combination of these results yields an effective strategy to evaluated a large class of path queries on documents.

Short Papers

  • Towards Building a Knowledge Base for Research on Andean Weaving
    Denise Arnold, Sven Helmer and Rodolfo Velasquez Arando.

    We are working on a knowledge base to store 3D Andean textile patterns together with rich cultural and historic context information. This will allow ontological studies in museum collections as well as on ethnographic and archaeological fieldwork. We build on an existing ontology, extending it to incorporate more content and make it more accessible. This goes well beyond storing and retrieving textile patterns and enables us to capture the semantics and wider context of these patterns.

  • Evaluating a Peer-to-Peer Database Server based on BitTorrent
    John Colquhoun and Paul Watson.

    Database systems have traditionally used a Client-Server architecture. As the server becomes overloaded, clients experience an increase in query response time, and in the worst case the server may be unable to provide any service at all.

    In file-sharing, the problem of server overloading has been addressed by the use of Peer-to-Peer (P2P) techniques in which users (peers) supply files to each other, so sharing the load. This paper describes the Wigan P2P Database System, which was designed to investigate if P2P techniques for reducing server load, thus increasing system scalability, could be applied successfully in a database environment. It is based on the BitTorrent file-sharing approach.

    This paper introduces the Wigan system architecture, explaining how the BitTorrent approach must be modified for a P2P database server. It presents and analyses experimental results, including the TPC-H benchmark, which show that the approach can succeed in delivering scalability in particular cases.

Poster Papers

  • Ontology-Based Method for Schema Matching in a Peer-to-Peer Database System
    Raddad Al King, Abdelkader Hameurlain and Franck Morvan.

    In a P2P DBS, the databases are often developed independently so their schemas are highly heterogeneous. Creating matching rules (henceforth MR) between a given mediated schema and each peer schema at the design-time is not suitable for a volatile P2P environment; in which, a peer may participate in the system only once. For this reason, the MR must be done at the run-time. Schema designers are often the only persons knowing about the semantics of their schemas. At the run-time, one (or both) schema designer(s) could not be available; hence the user must be able to create the MR to support his/her changing requirements. Given that the semantics of a domain ontology is explicitly explained and in order to help the user to create the MR, we propose a schema matching method based on a domain ontology which plays a similar role as that played by a given mediated schema.

  • A Database System for Integrating Conflicting and Uncertain Information from Multiple Correspondents
    Richard Cooper and Laura Devenny.

    One of the more interesting examples of building a database is when it brings together a body of information held by an interested community. Each person in the community knows a fraction of the information and even then knowledge of a particularly fact may be held with a degree of uncertainty, may be imprecise or may be altogether wrong. We discuss here a database system which absorbs assertions about the data from a number of correspondents capturing also the uncertainty of the assertion and taking account of the potential unreliability of the correspondent.

    This leads to a database made out of quadruples — entity, attribute, value and confidence — and to the question as to what is the appropriate answer to any query. We present first a mechanism for capturing and storing such assertions and the confidence we have in them. Then we introduce the concept of an authority — i.e. some mechanism which introduces absolutely certain values into the database. From these certain values and from the resolution of conflicts between correspondents, we assign a reliability measure to each correspondent. The confidence we assign to each assertion is then affected both by the certainty of the correspondent and by the confidence we have in the correspondent. Finally, we present a querying system which returns the most likely values.

  • The Adaptation Model of a Runtime Adaptable DBMS
    Florian Irmert, Thomas Fischer, Frank Lauterwald and Klaus Meyer-Wegener.

    Nowadays maintenance of database management systems (DBMSs) often requires offline operations like enhancement of functionality or security updates. This hampers the availability of the provided services and can cause undesirable implications. Therefore it is essential to minimize the downtime of DBMSs. We present the CoBRA DB (Component Based Runtime Adaptable DataBase) project that allows the adaptation and extension of a modular DBMS at runtime. In this paper we focus on the definition of an adaptation model describing the semantics of adaptation processes and exemplify this model by the decomposition of a database storage system.

  • Semantic Exploitation of Engineering Models: an Application to Oilfield Models
    Laura Silveira Mastella, Yamine Ait-Ameur, Stephane Jean, Michel Perrin and Jean-François Rainaud.

    Engineering development activities rely on computer-based models, which enclose technical data issued from different sources. In this heterogeneous context, retrieving, re-using and merging information is a challenge.

    We propose to annotate engineering models with concepts of domain ontologies, which provide data with explicit semantics. The semantic annotation makes it possible to formulate queries using the semantic concepts that are significant to the domain of the engineers. This work is inspired from a petroleum engineering case study and we validate our approach by presenting an implementation of this case study.

  • Schema Merging based on Semantic Mappings
    Nikos Rizopoulos and Peter McBrien.

    In model management, a Merge operator takes as input a pair of schemas, together with a set of mappings between their objects, and produces as an output an integrated schema. In this paper we present a new approach to implementing the Merge operator based on semantic mappings between objects. Our approach improves upon previous work by (1) using formal low-level transformation rules that can be translated into higher-level rules and (2) examining a much wider range of semantic mappings between schema objects. Our precise mappings and rules enable us to automate Merge and provide a sound and complete framework where schemas are merged without any information loss or gain.