icde_2002_elementary.txt
<proceedings><paper><title>Message from the Program Co-Chairs</title><year>2002</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract></abstract></paper><paper><title>Program Committee</title><year>2002</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract></abstract></paper><paper><title>HP-Inventing the Future of Storage</title><author><AuthorName>Nora Denzel</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><year>2002</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract></abstract></paper><paper><title>DBXplorer: A System for Keyword-Based Search over Relational Databases</title><author><AuthorName>Sanjay Agrawal</AuthorName><institute><InstituteName>Microsoft Research</InstituteName><country></country></institute></author><author><AuthorName>Surajit Chaudhuri</AuthorName><institute><InstituteName>Microsoft Research</InstituteName><country></country></institute></author><author><AuthorName>Gautam Das</AuthorName><institute><InstituteName>Microsoft Research</InstituteName><country></country></institute></author><year>2002</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>Keyword-based search has been popularized by Internet search engines. While traditional database management systems offer powerful query languages, they do not allow keyword-based search. In this paper, we discuss DBXplorer, a system that enables keyword-based search in relational databases. DBXplorer has been implemented using a commercial relational database and web server and allows users to interact via a browser front-end. 
We outline the challenges and discuss the implementation of our system including the results of extensive experimental evaluation.</abstract></paper><paper><title>TAILOR: A Record Linkage Tool Box</title><author><AuthorName>Mohamed G. Elfeky</AuthorName><institute><InstituteName>Purdue University</InstituteName><country></country></institute></author><author><AuthorName>Ahmed K. Elmagarmid</AuthorName><institute><InstituteName>Purdue University</InstituteName><country></country></institute></author><author><AuthorName>Vassilios S. Verykios</AuthorName><institute><InstituteName>Drexel University</InstituteName><country></country></institute></author><year>2002</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>Data cleaning is a vital process that ensures the quality of data stored in real-world databases. Data cleaning problems are frequently encountered in many research areas, such as knowledge discovery in databases, data warehousing, system integration and e-services. The process of identifying the record pairs that represent the same entity (duplicate records), commonly known as record linkage, is one of the essential elements of data cleaning. In this paper, we address the record linkage problem by adopting a machine learning approach. Three models are proposed and are analyzed empirically. Since no existing model, including those proposed in this paper, has been proved to be superior, we have developed an interactive Record Linkage Toolbox named TAILOR. Users of TAILOR can build their own record linkage models by tuning system parameters and by plugging in in-house developed and public domain tools. The proposed toolbox serves as a framework for the record linkage process, and is designed in an extensible way to interface with existing and future record linkage models. We have conducted an extensive experimental study to evaluate our proposed models using not only synthetic but also real data. 
Results show that the proposed machine learning record linkage models outperform the existing ones both in accuracy and in performance.</abstract></paper><paper><title>Providing Database as a Service</title><author><AuthorName>Hakan Hacigumus</AuthorName><institute><InstituteName>University of California, Irvine</InstituteName><country></country></institute></author><author><AuthorName>Sharad Mehrotra</AuthorName><institute><InstituteName>University of California, Irvine</InstituteName><country></country></institute></author><author><AuthorName>Bala Iyer</AuthorName><institute><InstituteName>IBM Silicon Valley Lab</InstituteName><country></country></institute></author><year>2002</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>In this paper, we explore a new paradigm for data management in which a third party service provider hosts &quot;database as a service&quot; providing its customers seamless mechanisms to create, store, and access their databases at the host site. Such a model alleviates the need for organizations to purchase expensive hardware and software, deal with software upgrades, and hire professionals for administrative and maintenance tasks, which are taken over by the service provider. We have developed and deployed a database service on the Internet, called NetDB2, which is in constant use. In a sense, the data management model supported by NetDB2 provides an effective mechanism for organizations to purchase data management as a service, thereby freeing them to concentrate on their core businesses. Among the primary challenges introduced by &quot;database as a service&quot; are the additional overhead of remote access to data, an infrastructure to guarantee data privacy, and user interface design for such a service. These issues are investigated in the study. We identify data privacy as a particularly vital problem and propose alternative solutions based on data encryption. 
This paper is meant as a challenges paper for the database community to explore a rich set of research issues that arise in developing such a service.</abstract></paper><paper><title>Detecting Changes in XML Documents</title><author><AuthorName>Gregory Cobena</AuthorName><institute><InstituteName>INRIA Rocquencourt, France</InstituteName><country></country></institute></author><author><AuthorName>Serge Abiteboul</AuthorName><institute><InstituteName>INRIA Rocquencourt, France</InstituteName><country></country></institute></author><author><AuthorName>Amelie Marian</AuthorName><institute><InstituteName>Columbia University, NY</InstituteName><country></country></institute></author><year>2002</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>We present a diff algorithm for XML data. This work is motivated by the support for change control in the context of the Xyleme project that is investigating dynamic warehouses capable of storing massive volumes of XML data. Because of the context, our algorithm has to be very efficient in terms of speed and memory space even at the cost of some loss of &quot;quality&quot;. Also, it considers, besides insertions, deletions and updates (standard in diffs), a move operation on subtrees that is essential in the context of XML. Intuitively, our diff algorithm uses signatures to match (large) subtrees that were left unchanged between the old and new versions. Such exact matchings are then possibly propagated to ancestors and descendants to obtain more matchings. It also uses XML-specific information such as ID attributes. We provide a performance analysis of the algorithm. We show that it runs on average in linear time vs. quadratic time for previous algorithms. We present experiments on synthetic data that confirm the analysis. Since this problem is NP-hard, the linear time is obtained by trading some quality. 
We present experiments (again on synthetic data) that show that the output of our algorithm is reasonably close to the &quot;optimal&quot; in terms of quality. Finally we present experiments on a small sample of XML pages found on the Web.</abstract></paper><paper><title>Reverse Engineering for Web Data: From Visual to Semantic Structures</title><author><AuthorName>Christina Yip Chung</AuthorName><institute><InstituteName>Verity Inc</InstituteName><country></country></institute></author><author><AuthorName>Michael Gertz</AuthorName><institute><InstituteName>University of California, Davis</InstituteName><country></country></institute></author><author><AuthorName>Neel Sundaresan</AuthorName><institute><InstituteName>NehaNet Corp</InstituteName><country></country></institute></author><year>2002</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>Despite the advancement of XML, the majority of documents on the Web are still marked up with HTML for visual rendering purposes only, thus constituting a huge amount of &quot;legacy&quot; data. In order to facilitate querying Web-based data in a way more efficient and effective than just keyword-based retrieval, enriching such Web documents with both structure and semantics is necessary. This paper describes a novel approach to the integration of topic-specific HTML documents into a repository of XML documents. In particular, we describe how topic-specific HTML documents are transformed into XML documents. The proposed document transformation and semantic element tagging process utilizes document restructuring rules and minimum information about the topic in the form of concepts. For the resulting XML documents, a majority schema is derived that describes common structures among the documents in the form of a DTD. We explore and discuss different techniques and rules for document conversion and majority schema discovery. 
We finally demonstrate the feasibility and effectiveness of our approach by applying it to a set of resume HTML documents gathered by a Web crawler.</abstract></paper><paper><title>From XML Schema to Relations: A Cost-Based Approach to XML Storage</title><author><AuthorName>Philip Bohannon</AuthorName><institute><InstituteName>Bell Laboratories</InstituteName><country></country></institute></author><author><AuthorName>Juliana Freire</AuthorName><institute><InstituteName>Bell Laboratories</InstituteName><country></country></institute></author><author><AuthorName>Prasan Roy</AuthorName><institute><InstituteName>Bell Laboratories</InstituteName><country></country></institute></author><author><AuthorName>Jerome Simeon</AuthorName><institute><InstituteName>Bell Laboratories</InstituteName><country></country></institute></author><year>2002</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>As Web applications manipulate an increasing amount of XML, there is a growing interest in storing XML data in relational databases. Due to the mismatch between the complexity of XML's tree structure and the simplicity of flat relational tables, there are many ways to store the same document in an RDBMS, and a number of heuristic techniques have been proposed. These techniques typically define fixed mappings and do not take application characteristics into account. However, a fixed mapping is unlikely to work well for all possible applications. In contrast, LegoDB is a cost-based XML storage mapping engine that explores the space of possible XML-to-relational mappings and selects the best mapping for a given application. 
LegoDB leverages current XML and relational technologies: 1) it models the target application with an XML Schema, XML data statistics, and an XQuery workload; 2) the space of configurations is generated through XML-Schema rewritings; and 3) the best among the derived configurations is selected using cost estimates obtained through a standard relational optimizer. In this paper, we describe the LegoDB storage engine and provide experimental results that demonstrate the effectiveness of this approach.</abstract></paper><paper><title>Sequenced Subset Operators: Definition and Implementation</title><author><AuthorName>Joseph Dunn</AuthorName><institute><InstituteName>The University of Arizona</InstituteName><country></country></institute></author><author><AuthorName>Sean Davey</AuthorName><institute><InstituteName>The University of Arizona</InstituteName><country></country></institute></author><author><AuthorName>Anne Descour</AuthorName><institute><InstituteName>The University of Arizona</InstituteName><country></country></institute></author><author><AuthorName>Richard T. Snodgrass</AuthorName><institute><InstituteName>The University of Arizona</InstituteName><country></country></institute></author><year>2002</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>Difference, intersection, semi-join and anti-semi-join may be considered binary subset operators, in that they all return a subset of their left-hand argument. These operators are useful for implementing SQL's EXCEPT, INTERSECT, NOT IN and NOT EXISTS, distributed queries and referential integrity. Difference-all and intersection-all operate on multi-sets and track the number of duplicates in both argument relations; they are used to implement SQL's EXCEPT ALL and INTERSECT ALL. Their temporally sequenced analogues, which effectively apply the subset operator at each point in time, are needed for implementing these constructs in temporal databases. 
These SQL expressions are complex; most necessitate at least a three-way join, with nested NOT EXISTS clauses. We consider how to implement these operators directly in a DBMS. These operators are interesting in that they can fragment the left-hand validity periods (sequenced difference-all also fragments the right-hand periods) and thus introduce memory complications found neither in their non-temporal counterparts nor in temporal joins and semi-joins. This paper introduces novel algorithms for implementing these operators by ordering the computation so that fragments need not be retained in main memory. We evaluate these algorithms and demonstrate that they are no more expensive than a single conventional join.</abstract></paper><paper><title>Exploring Spatial Datasets with Histograms</title><author><AuthorName>Chengyu Sun</AuthorName><institute><InstituteName>University of California, Santa Barbara</InstituteName><country></country></institute></author><author><AuthorName>Divyakant Agrawal</AuthorName><institute><InstituteName>University of California, Santa Barbara</InstituteName><country></country></institute></author><author><AuthorName>Amr El Abbadi</AuthorName><institute><InstituteName>University of California, Santa Barbara</InstituteName><country></country></institute></author><year>2002</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>As online spatial datasets grow both in number and sophistication, it becomes increasingly difficult for users to decide whether a dataset is suitable for their tasks, especially when they do not have prior knowledge of the dataset. The GeoBrowsing service developed for the ADL project provides users an effective and efficient way to explore the content of a spatial dataset. In this paper, we identify a set of spatial relations that need to be supported in browsing applications, namely, the contains, contained and the overlap relations. 
We prove a storage lower bound to answer queries about the contains relation accurately at a given grid resolution. We then present three storage-efficient approximation algorithms which we believe to be the first to estimate query selectivities about these spatial relations. Experimental results show that these algorithms provide highly accurate estimates in real time for a wide range of datasets with various characteristics.</abstract></paper><paper><title>Efficient Temporal Join Processing Using Indices</title><author><AuthorName>Donghui Zhang</AuthorName><institute><InstituteName>University of California, Riverside</InstituteName><country></country></institute></author><author><AuthorName>Vassilis J. Tsotras</AuthorName><institute><InstituteName>University of California, Riverside</InstituteName><country></country></institute></author><author><AuthorName>Bernhard Seeger</AuthorName><institute><InstituteName>Philipps Universit