?? sigmod_1996_elementary.txt
字號:
<proceedings><paper><title>Mining quantitative association rules in large relational tables</title><author><AuthorName>Ramakrishnan Srikant</AuthorName><institute><InstituteName>IBM Almaden Research Center, 650 Harry Road, San Jose, CA and Department of Computer Science, University of Wisconsin, Madison</InstituteName><country></country></institute></author><author><AuthorName>Rakesh Agrawal</AuthorName><institute><InstituteName>IBM Almaden Research Center, 650 Harry Road, San Jose, CA</InstituteName><country></country></institute></author><year>1996</year><conference>International Conference on Management of Data</conference><citation><name>Rakesh Agrawal , Tomasz Imieli&#324;ski , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States</name><name>Rakesh Agrawal , Ramakrishnan Srikant, Fast Algorithms for Mining Association Rules in Large Databases, Proceedings of the 20th International Conference on Very Large Data Bases, p.487-499, September 12-15, 1994</name><name>Norbert Beckmann , Hans-Peter Kriegel , Ralf Schneider , Bernhard Seeger, The R*-tree: an efficient and robust access method for points and rectangles, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.322-331, May 23-26, 1990, Atlantic City, New Jersey, United States</name><name>Jiawei Han , Yongjian Fu, Discovery of Multiple-Level Association Rules from Large Databases, Proceedings of the 21th International Conference on Very Large Data Bases, p.420-431, September 11-15, 1995</name><name>Maurice A. W. Houtsma , Arun N. Swami, Set-Oriented Mining for Association Rules in Relational Databases, Proceedings of the Eleventh International Conference on Data Engineering, p.25-33, March 06-10, 1995</name><name>Anil K. Jain , Richard C. Dubes, Algorithms for clustering data, Prentice-Hall, Inc., Upper Saddle River, NJ, 1988</name><name>Heikki Marmila, Harmu Toivonen, and A. Inkeri Verkamo. Efficient algorithms for discovering association rules. In KDD-94: AAAI Workshop on Knowledge Discovery in Databases, pages 181- 192, Seattle, Washington, July 1994.</name><name>Jong Soo Park , Ming-Syan Chen , Philip S. Yu, An effective hash-based algorithm for mining association rules, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.175-186, May 22-25, 1995, San Jose, California, United States</name><name>G. Piatetsky-Shapiro. Discovery, analysis, and presentation of strong rules. In G. Piatetsky- Shapiro and W. J. Frawley, editors, Knowledge Discovery in Databases, pages 229-248. AAAI/MIT Press, Menlo Park, CA, 1991.</name><name>Ramakrishnan Srikant , Rakesh Agrawal, Mining Generalized Association Rules, Proceedings of the 21th International Conference on Very Large Data Bases, p.407-419, September 11-15, 1995</name><name>Ashoka Savasere , Edward Omiecinski , Shamkant B. Navathe, An Efficient Algorithm for Mining Association Rules in Large Databases, Proceedings of the 21th International Conference on Very Large Data Bases, p.432-444, September 11-15, 1995</name><name>Avi Silberschatz and Alexander Tuzhilin. On Subjective Measures of Interestingness in Knowledge Discovery. In Proc. of the First Int'l Conference on Knowledge Discovery and Data Mining, Montreal, Canada, August 1995.</name></citation><abstract>We introduce the problem of mining association rules in large relational tables containing both quantitative and categorical attributes. An example of such an association might be "10% of married people between age 50 and 60 have at least 2 cars". We deal with quantitative attributes by fine-partitioning the values of the attribute and then combining adjacent partitions as necessary. We introduce measures of partial completeness which quantify the information lost due to partitioning. A direct application of this technique can generate too many similar rules. We tackle this problem by using a "greater-than-expected-value" interest measure to identify the interesting rules in the output. We give an algorithm for mining such quantitative association rules. Finally, we describe the results of using this approach on a real-life dataset.</abstract></paper><paper><title>Data mining using two-dimensional optimized association rules: scheme, algorithms, and visualization</title><author><AuthorName>Takeshi Fukuda</AuthorName><institute><InstituteName>IBM Tokyo Research Laboratory</InstituteName><country></country></institute></author><author><AuthorName>Yasukiko Morimoto</AuthorName><institute><InstituteName>IBM Tokyo Research Laboratory</InstituteName><country></country></institute></author><author><AuthorName>Shinichi Morishita</AuthorName><institute><InstituteName>IBM Tokyo Research Laboratory</InstituteName><country></country></institute></author><author><AuthorName>Takeshi Tokuyama</AuthorName><institute><InstituteName>IBM Tokyo Research Laboratory</InstituteName><country></country></institute></author><year>1996</year><conference>International Conference on Management of Data</conference><citation><name>Tetsuo Asano , Danny Z. Chen , Naoki Katoh , Takeshi Tokuyama, Polynomial-time solutions to image segmentation, Proceedings of the seventh annual ACM-SIAM symposium on Discrete algorithms, p.104-113, January 28-30, 1996, Atlanta, Georgia, United States</name><name>Rakesh Agrawal , Sakti P. Ghosh , Tomasz Imielinski , Balakrishna R. Iyer , Arun N. Swami, An Interval Classifier for Database Mining Applications, Proceedings of the 18th International Conference on Very Large Data Bases, p.560-573, August 23-27, 1992</name><name>R. Agrawal , T. Imielinski , A. Swami, Database Mining: A Performance Perspective, IEEE Transactions on Knowledge and Data Engineering, v.5 n.6, p.914-925, December 1993</name><name>Rakesh Agrawal , Tomasz Imieli&#324;ski , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States</name><name>A. Aggarwal, M. Klawe, S. Moran, P. Shot, and R. Wilbur. Geometric applications of a matrixsearching algorithm. Algorithmica, 2:209-233, 1987.</name><name>Rakesh Agrawal , Ramakrishnan Srikant, Fast Algorithms for Mining Association Rules in Large Databases, Proceedings of the 20th International Conference on Very Large Data Bases, p.487-499, September 12-15, 1994</name><name>Jon Bentley. Programming pearls. Communications of the A CM, 27(27):865-871, September 1984.</name><name>L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, 1984.</name><name>Paul Fischer , Klaus-Uwe H&#246;ffgen , Hanno Lefmann , Tomasz Luczak, Approximations with Axis-Aligned Rectangles (Extended Abstract), Proceedings of the 9th International Symposium on Fundamentals of Computation Theory, p.244-255, August 23-27, 1993</name><name>Takeshi Fukuda , Yasuhido Morimoto , Shinichi Morishita , Takeshi Tokuyama, Mining optimized association rules for numeric attributes, Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.182-191, June 04-06, 1996, Montreal, Quebec, Canada</name><name>Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, and Takeshi Tokuyama. Data mining using two-dimensional optimized association rules: Scheme, algorithms, and visualization. In Technical Report, IBM Tokyo Research Laboratory, 1996.</name><name>M.R. Garey and D. S. Johnson. The rectilinear steiner tree problem is np complete. SIAM J. Appl. Math, 32:836-834, 1977.</name><name>Jiawei Han , Yandong Cai , Nick Cercone, Knowledge Discovery in Databases: An Attribute-Oriented Approach, Proceedings of the 18th International Conference on Very Large Data Bases, p.547-559, August 23-27, 1992</name><name>Daniel A. Keim , Hans-Peter Kriegel , Thomas Seidl, Supporting Data Mining of Large Databases by Visual Feedback Queries, Proceedings of the Tenth International Conference on Data Engineering, p.302-313, February 14-18, 1994</name><name>Manish Mehta , Rakesh Agrawal , Jorma Rissanen, SLIQ: A Fast Scalable Classifier for Data Mining, Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology, p.18-32, March 25-29, 1996</name><name>Raymond T. Ng , Jiawei Han, Efficient and Effective Clustering Methods for Spatial Data Mining, Proceedings of the 20th International Conference on Very Large Data Bases, p.144-155, September 12-15, 1994</name><name>Raymond T. Ng , Jiawei Han, Efficient and Effective Clustering Methods for Spatial Data Mining, Proceedings of the 20th International Conference on Very Large Data Bases, p.144-155, September 12-15, 1994</name><name>G. L. Nemhauser , A. H. G. Rinnooy Kan , M. J. Todd, Optimization, Elsevier North-Holland, Inc., New York, NY, 1989</name><name>Jong Soo Park , Ming-Syan Chen , Philip S. Yu, An effective hash-based algorithm for mining association rules, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.175-186, May 22-25, 1995, San Jose, California, United States</name><name>G. Piatetsky-Shapiro. Discovery, analysis, and presentation of strong rules. In Knowledge Discovery in Databases, pages 229-248, 1991.</name><name>Gregory Piateski , William Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, 1991</name><name>J. R. Quinlan, Induction of Decision Trees, Machine Learning, v.1 n.1, p.81-106</name><name>J. Ross Quinlan, C4.5: programs for machine learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, 1993</name><name>Ramakrishnan Srikant , Rakesh Agrawal, Mining quantitative association rules in large relational tables, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.1-12, June 04-06, 1996, Montreal, Quebec, Canada</name><name>Michael Stonebraker , Rakesh Agrawal , Umeshwar Dayal , Erich J. Neuhold , Andreas Reuter, DBMS Research at a Crossroads: The Vienna Update, Proceedings of the 19th International Conference on Very Large Data Bases, p.688-692, August 24-27, 1993</name></citation><abstract>We discuss data mining based on association rules for two numeric attributes and one Boolean attribute. For example, in a database of bank customers, "Age" and "Balance" are two numeric attributes, and "CardLoan" is a Boolean attribute. Taking the pair (Age, Balance) as a point in two-dimensional space, we consider an association rule of the form((Age, Balance) &isin; P) &rArr; (CardLoan = Yes),which implies that bank customers whose ages and balances fall in a planar region P tend to use card loan with a high probability. We consider two classes of regions, rectangles and admissible (i.e. connected and x-monotone) regions. For each class, we propose efficient algorithms for computing the regions that give optimal association rules for gain, support, and confidence, respectively. We have implemented the algorithms for admissible regions, and constructed a system for visualizing the rules.</abstract></paper><paper><title>IDEA: interactive data exploration and analysis</title><author><AuthorName>Peter G. Selfridge</AuthorName><institute><InstituteName>AT&T Research</InstituteName><country></country></institute></author><author><AuthorName>Divesh Srivastava</AuthorName><institute><InstituteName>AT&T Research</InstituteName><country></country></institute></author><author><AuthorName>Lynn O. Wilson</AuthorName><institute><InstituteName>AT&T</InstituteName><country></country></institute></author><year>1996</year><conference>International Conference on Management of Data</conference><citation><name>R. Agrawal , T. Imielinski , A. Swami, Database Mining: A Performance Perspective, IEEE Transactions on Knowledge and Data Engineering, v.5 n.6, p.914-925, December 1993</name><name>Asymetrix Corporation, ll0-110th Ave N.E., Suite 700, Bellevue, WA 98004. ToolBook Manual.</name><name>R.J. Brachman and T. Anand. The knowledge discovery process. In Working Notes of the AAAI-9~ Workshop on Knowledge Discovery in Databases, 1994.</name><name>R. J. Brachman, P. G. Selfridge, L. G. Terveen, B. Altman, F. Halper, T. Kirk, and A. Lazar. Integrated support for data archeology. International Journal of Intelligent and Cooperative Information Systems, 2(2):159-185, June 1993.</name><name>A data miner's tools.Byte Magazine, 2(10):91, October 1995.</name><name>M. Holsheimer and M. L. Kersten. Architectural support for data mining. In Working Notes of the AAAI-94 Workshop on Knowledge Discovery in Databases, 1994.</name><name>Gregory Piateski , William Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, 1991</name><name>G. Piatetsky-Shapiro and C. J. Matheus. Knowledge discovery workbench for exploring business databases. International Journal of Intelligent Systems, 7(7):675-686, 1992.</name><name>NE. Simoudis, B. Livezey, and R. Kerber. Integration inductive and deductive reasoning for database mining. In Workzn9 Noles of the AAAI-94 Workshop on Knowledge Discovery in Databases, 1994.</name><name>WATCOM International Corporation, 415 Phillip Street, Waterloo, Ontario, CA NXL 3X2. WATCOM SQL ~. O.</name></citation><abstract>The analysis of business data is often an ill-defined task characterized by large amounts of noisy data. Because of this, business data analysis must combine two kinds of intertwined tasks: exploration and analysis. Exploration is the process of finding the appropriate subset of data to analyze, and analysis is the process of measuring the data to provide the business answer. While there are many tools available both for exploration and for analysis, a single tool or set of tools may not provide full support for these intertwined tasks. We report here on a project that set out to understand a specific business data analysis problem and build an environment to support it. The results of this understanding are, first of all, a detailed list of requirements of this task; second, a set of capabilities that meet these requirements; and third, an implemented client-server solution that addresses many of these requirements and identifies others for future work. Our solution incorporates several novel perspectives on data analysis and combines a history mechanism with a graphical, re-usable representation of the analysis and exploration process. Our approach emphasizes using the database itself to represent as many of these functions as possible.</abstract></paper><paper><title>Rapid bushy join-order optimization with Cartesian products</title><author><AuthorName>Bennet Vance</AuthorName><institute><InstituteName>Oregon Graduate Institute of Science & Technology</InstituteName><country></country></institute></author><author><AuthorName>David Maier</AuthorName><institute><InstituteName>Oregon Graduate Institute of Science & Technology</InstituteName><country></country></institute></author><year>1996</year><conference>International Conference on Management of Data</conference><citation><name>Sophie Cluet , Guido Moerkotte, On the Complexity of Generating Optimal Left-Deep Processing Trees with Cross Products, Proceedings of the 5th International Conference on Database Theory, p.54-67, January 11-13, 1995</name><name>C&#233;sar A. Galindo-Legaria , Arjan Pellenkoft , Martin L. Kersten, Fast, Randomized Join-Order Selection - Why Use Transformations?, Proceedings of the 20th International Conference on Very Large Data Bases, p.85-95, September 12-15, 1994</name><name>Goetz Graefe , William J. McKenna, The Volcano Optimizer Generator: Extensibility and Efficient Search, Proceedings of the Ninth International Conference on Data Engineering, p.209-218, April 19-23, 1993</name><name>Joseph M. Hellerstein , Michael Stonebraker, Predicate migration: optimizing queries with expensive predicates, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.267-276, May 25-28, 1993, Washington, D.C., United States</name><name>Toshihide Ibaraki , Tiko Kameda, On the optimal nesting order for computing N-relational joins, ACM Transactions on Database Systems (TODS), v.9 n.3, p.482-502, Sept. 1984</name><name>Yannis E. Ioannidis , Younkyung Cha Kang, Left-deep vs. bushy trees: an analysis of strategy spaces and its implications for query optimization, Proceedings of the 1991 ACM SIGMOD international conference on Management of data, p.168-177, May 29-31, 1991, Denver, Colorado, United States</name><name>Donald E. Knuth. Fundamental Algorithms, volume 1 of The Art o/ Computer Programming. Addison-Wesley, second edition, 1973.</name><name>O. Martin and S. Otto. Combining simulated annealing with local search heuristics. To appear as a chapter of Metaheumst~cs in Combinatorial Optim~zatzon, volume 60 in the series Annals of Operations Research, edited by G. Laporte and I. Osman.</name><name>Kiyoshi Ono , Guy M. Lohman, Measuring the complexity of join enumeration in query optimization, Proceedings of the sixteenth international conference on Very large databases, p.314-325, September 1990, Brisbane, Australia</name><name>P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts</name><name>Michael Steinbrunn, Guido Moerkotte, and A1- fons Kemper. Optimizing join orders. Technical Report MIP-9307, Universit~t Passau, 1993.</name><name>M. Steinbrunn. Heuristic and Randomised Opt~misat~on Techniques in Object-Omented Database Systems. infix-Verlag, Ringstrafle 32, 53757 St. Augustin, Germany, 1996. Dissertation, Universit~t Passau.</name></citation><abstract>Query optimizers often limit the search space for join orderings, for example by excluding Cartesian products in subplans or by restricting plan trees to left-deep vines. Such exclusions are widely assumed to reduce optimization effort while minimally affecting plan quality. However, we show that searching the complete space of plans is more affordable than has been previously recognized, and that the common exclusions may be of little benefit.We start by presenting a Cartesian product optimizer that requires at most a few seconds of workstation time to search the space of bushy plans for products of up to 15 relations. Building on this result, we present a join-order optimizer that achieves a similar level of performance, and retains the ability to include Cartesian products in subplans wherever appropriate. The main contribution of the paper is in fully separating join-order enumeration from predicate analysis, and in showing that the former problem in particular can be solved swiftly by novel implementation techniques. A secondary contribution is to initiate a systematic approach to the benchmarking of join-order optimization, which we apply to the evaluation of our method.</abstract></paper><paper><title>SQL query optimization: reordering for a general class of queries</title><author><AuthorName>Piyush Goel</AuthorName><institute><InstituteName>IBM Santa Teresa Laboratory</InstituteName><country></country></institute></author><author><AuthorName>Bala Iyer</AuthorName><institute><InstituteName>IBM Santa Teresa Laboratory</InstituteName><country></country></institute></author><year>1996</year><conference>International Conference on Management of Data</conference><citation><name>Bhargava, G., Goel, P. and Iyer, B., "Reordering of complex queries involving joins and outer joins," IBM Techn,cal Report TR03.567, July b1994.</name><name>Gautam Bhargava , Piyush Goel , Bala Iyer, Hypergraph based reorderings of outer join queries with complex predicates, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.304-315, May 22-25, 1995, San Jose, California, United States</name><name>Gautam Bhargava , Piyush Goel , Balakrishna R. Iyer, No regression algorithm for the enumeration of projections in SQL queries with joins and outer joins, Proceedings of the 1995 conference of the Centre for Advanced Studies on Collaborative research, p.8, November 07-09, 1995, Toronto, Ontario, Canada</name><name>Gautam Bhargava , Piyush Goel , Balakrishna R. Iyer, Simplification of outer joins, Proceedings of the 1995 conference of the Centre for Advanced Studies on Collaborative research, p.7, November 07-09, 1995, Toronto, Ontario, Canada</name><name>Gautam Bhargava , Piyush Goel , Balakrishna R. Iyer, Efficient Processing of Outer Joins and Aggregate Functions, Proceedings of the Twelfth International Conference on Data Engineering, p.441-449, February 26-March 01, 1996</name><name>C&#233;sar A. Galindo-Legaria , Arnon Rosenthal, How to Extend a Conventional Optimizer to Handle One- and Two-Sided Outerjoin, Proceedings of the Eighth International Conference on Data Engineering, p.402-409, February 03-07, 1992</name><name>Cesar Alejandro Galindo-Legaria, Algebraic optimization of outerjoin queries, Harvard University, Cambridge, MA, 1992</name><name>C&#233;sar A. Galindo-Legaria, Outerjoins as disjunctions, Proceedings of the 1994 ACM SIGMOD international conference on Management of data, p.348-358, May 24-27, 1994, Minneapolis, Minnesota, United States</name><name>Richard A. Ganski , Harry K. T. Wong, Optimization of nested SQL queries revisited, Proceedings of the 1987 ACM SIGMOD international conference on Management of data, p.23-33, May 27-29, 1987, San Francisco, California, United States</name><name>Gupta, A., Harinarayan, V. and Quass, D., "Generalized Projections: A powerful approach to aggregation", To appear in VLDB, 1995.</name><name>M. Muralikrishna, Improved Unnesting Algorithms for Join Aggregate SQL Queries, Proceedings of the 18th International Conference on Very Large Data Bases, p.91-102, August 23-27, 1992</name><name>Hamid Pirahesh , Joseph M. Hellerstein , Waqar Hasan, Extensible/rule based query rewrite optimization in Starburst, Proceedings of the 1992 ACM SIGMOD international conference on Management of data, p.39-48, June 02-05, 1992, San Diego, California, United States</name><name>Arnon Rosenthal , Cesar Galindo-Legaria, Query graphs, implementing trees, and freely-reorderable outerjoins, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.291-299, May 23-26, 1990, Atlantic City, New Jersey, United States</name><name>P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts</name></citation><abstract>The strength of commercial query optimizers like DB2 comes from their ability to select an optimal order by generating all equivalent reorderings of binary operators. However, there are no known methods to generate all equivalent reorderings for a SQL query containing joins, outer joins, and groupby aggregations. Consequently, some of the reorderings with significantly lower cost may be missed. Using hypergraph model and a set of novel identities, we propose a method to reorder a SQL query containing joins, outer joins, and groupby aggregations. While these operators are sufficient to capture the SQL semantics, it is during their reordering that we identify a powerful primitive needed for a dbms. We report our findings of a simple, yet fundamental operator, generalized selection, and demonstrate its power to solve the problem of reordering of SQL queries containing joins, outer joins, and groupby aggregations.</abstract></paper><paper><title>Fundamental techniques for order optimization</title><author><AuthorName>David Simmen</AuthorName><institute><InstituteName>IBM Santa Teresa Lab</InstituteName><country></country></institute></author><author><AuthorName>Eugene Shekita</AuthorName><institute><InstituteName>IBM Almaden Research Center</InstituteName><country></country></institute></author><author><AuthorName>Timothy Malkemus</AuthorName><institute><InstituteName>IBM Austin Lab</InstituteName><country></country></institute></author><year>1996</year><conference>International Conference on Management of Data</conference><citation><name>G. Antosheknov. Query processing in dec rdb: Major issues and future challenges. In IEEE Bulletin on the Technical Comittee on Data Engineer:ng, December 1993.</name><name>Catriel Beeri , Philip A. Bernstein, Computational problems related to the design of normal form relational schemas, ACM Transactions on Database Systems (TODS), v.4 n.1, p.30-59, March 1979</name><name>Dina Bitton , David J. DeWitt, Duplicate record elimination in large data files, ACM Transactions on Database Systems (TODS), v.8 n.2, p.255-265, June 1983</name><name>M. Blasgen and K. Eswaran. On the evaluation of queries in a relational data base system. Technical Report 1745, IBM Santa Teresa Lab, April 1976.</name><name>S. Chaudhuri and K. Shim. Including group-by in query optimization. In Proceedings of the 19th International Conference on Very Large Data Bases, August 1993.</name><name>H. Darwen and C. Date. The role of functional dependencies in query decomposition. In Relatzonal Database Writings 1989-1991~ Addison Wesley, 1992.</name><name>David J DeWitt , Randy H Katz , Frank Olken , Leonard D Shapiro , Michael R Stonebraker , David Wood, Implementation techniques for main memory database systems, Proceedings of the 1984 ACM SIGMOD international conference on Management of data, June 18-21, 1984, Boston, Massachusetts</name><name>S. Englert. Tpc benchmark d. In Transact:on Processing Performance Council, 777 N. First St, Suite 600, San Jose CA 95112-6311, October 1995.</name><name>Goetz Graefe , David J. DeWitt, The EXODUS optimizer generator, Proceedings of the 1987 ACM SIGMOD international conference on Management of data, p.160-172, May 27-29, 1987, San Francisco, California, United States</name><name>Goetz Graefe, Query evaluation techniques for large databases, ACM Computing Surveys (CSUR), v.25 n.2, p.73-169, June 1993</name><name>Joseph M. Hellerstein, Practical predicate placement, Proceedings of the 1994 ACM SIGMOD international conference on Management of data, p.325-335, May 24-27, 1994, Minneapolis, Minnesota, United States</name><name>L. M. Haas , J. C. Freytag , G. M. Lohman , H. Pirahesh, Extensible query processing in starburst, Proceedings of the 1989 ACM SIGMOD international conference on Management of data, p.377-388, June 1989, Portland, Oregon, United States</name><name>Matthias Jarke , Jurgen Koch, Query Optimization in Database Systems, ACM Computing Surveys (CSUR), v.16 n.2, p.111-152, June 1984</name><name>Guy M. Lohman, Grammar-like functional rules for representing query optimization alternatives, Proceedings of the 1988 ACM SIGMOD international conference on Management of data, p.18-27, June 01-03, 1988, Chicago, Illinois, United States</name><name>Kiyoshi Ono , Guy M. Lohman, Measuring the complexity of join enumeration in query optimization, Proceedings of the sixteenth international conference on Very large databases, p.314-325, September 1990, Brisbane, Australia</name><name>Hamid Pirahesh , Joseph M. Hellerstein , Waqar Hasan, Extensible/rule based query rewrite optimization in Starburst, Proceedings of the 1992 ACM SIGMOD international conference on Management of data, p.39-48, June 02-05, 1992, San Diego, California, United States</name><name>G. N. Paulley , Per-&#197;ke Larson, Exploiting Uniqueness in Query Optimization, Proceedings of the Tenth International Conference on Data Engineering, p.68-79, February 14-18, 1994</name><name>P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts</name><name>Weipeng P. Yan , Per-&#197;ke Larson, Performing Group-By before Join, Proceedings of the Tenth International Conference on Data Engineering, p.89-100, February 14-18, 1994</name></citation><abstract>Decision support applications are growing in popularity as more business data is kept on-line. Such applications typically include complex SQL queries that can test a query optimizer's ability to produce an efficient access plan. Many access plan strategies exploit the physical ordering of data provided by indexes or sorting. Sorting is an expensive operation, however. Therefore, it is imperative that sorting is optimized in some way or avoided all together. Toward that goal, this paper describes novel optimization techniques for pushing down sorts in joins, minimizing the number of sorting columns, and detecting when sorting can be avoided because of predicates, keys, or indexes. A set of fundamental operations is described that provide the foundation for implementing such techniques. The operations exploit data properties that arise from predicate application, uniqueness, and functional dependencies. These operations and techniques have been implemented in IBM's DB2/CS.</abstract></paper><paper><title>A Teradata content-based multimedia object manager for massively parallel architectures</title><author><AuthorName>W. O'Connell</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>I. T. Ieong</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>D. Schrader</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>C. Watson</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>G. Au</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>A. Biliris</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>S. Choo</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>P. Colin</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>G. Linderman</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>E. Panagos</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>J. Wang</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>T. Walter</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><year>1996</year><conference>International Conference on Management of Data</conference><citation></citation><abstract>The Teradata Multimedia Object Manager is a general-purpose content analysis multimedia server designed for symmetric multiprocessing and massively parallel processing environments. The Multimedia Object Manager defines and manipulates user-defined functions (UDFs), which are invoked in parallel to analyze or manipulate the contents of multimedia objects. Several computationally intensive applications of this technology, which use large persistent datasets, include fingerprint matching, signature verification, face recognition, and speech recognition/translation.</abstract></paper><paper><title>Fault-tolerant architectures for continuous media servers</title><author><AuthorName>Banu &#214;zden</AuthorName><institute><InstituteName>Bell Laboratories</InstituteName><country></country></institute></author><author><AuthorName>Rajeev Rastogi</AuthorName><institute><InstituteName>Bell Laboratories</InstituteName><country></country></institute></author><author><AuthorName>Prashant Shenoy</AuthorName><institute><InstituteName>University of Texas, Austin</InstituteName><country></country></institute></author><author><AuthorName>Avi Silberschatz</AuthorName><institute><InstituteName>Bell Laboratories</InstituteName><country></country></institute></author><year>1996</year><conference>International Conference on Management of Data</conference><citation><name>Steven Berson , Leana Golubchik , Richard R. Muntz, Fault tolerant design of multimedia servers, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.364-375, May 22-25, 1995, San Jose, California, United States</name><name>Mon-Song Chen , Dilip D. Kandlur , Philip S. Yu, Optimization of the grouped sweeping scheduling (GSS) with heterogeneous multimedia streams, Proceedings of the first ACM international conference on Multimedia, p.235-242, August 02-06, 1993, Anaheim, California, United States</name><name>Peter M. Chen , Edward K. Lee , Garth A. Gibson , Randy H. Katz , David A. Patterson, RAID: high-performance, reliable secondary storage, ACM Computing Surveys (CSUR), v.26 n.2, p.145-185, June 1994</name><name>Mark Holland , Garth A. Gibson, Parity declustering for continuous operation in redundant disk arrays, Proceedings of the fifth international conference on Architectural support for programming languages and operating systems, p.23-35, October 12-15, 1992, Boston, Massachusetts, United States</name><name>Jr. M Hall. Combinatorial Theory. John Wiley &: Sons, 1986.</name><name>Richard R. Muntz , John C. S. Lui, Performance analysis of disk arrays under failure, Proceedings of the sixteenth international conference on Very large databases, p.162-173, September 1990, Brisbane, Australia</name><name>Antoine Mourad. Reliable disk striping in videoon-demand servers. In Proceedings of the International Conference on Distributed Multimedia Systems and Applications, stanford, CA, August 1995.</name><name>A Framework for the Storage and Retrieval of Continuous Media Data, Proceedings of the International Conference on Multimedia Computing and Systems (ICMCS'95), p.2, May 15-18, 1995</name><name>B. Ozden, R. Rastogi, and A. Silberschatz. Disk striping in video server environments. In Proceedings of the IEEE International Conference on Multimedia Computing and Systems, Hiroshima, Japan, June 1996.</name><name>David A. Patterson , Garth Gibson , Randy H. Katz, A case for redundant arrays of inexpensive disks (RAID), Proceedings of the 1988 ACM SIGMOD international conference on Management of data, p.109-116, June 01-03, 1988, Chicago, Illinois, United States</name><name>Abraham Siberschatz , Peter B. Galvin, Operating System Concepts, 4th Ed., Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1993</name><name>Fouad A. Tobagi , Joseph Pang , Randall Baird , Mark Gang, Streaming RAID: a disk array management system for video files, Proceedings of the first ACM international conference on Multimedia, p.393-400, August 02-06, 1993, Anaheim, California, United States</name></citation><abstract>Continuous media servers that provide support for the storage and retrieval of continuous media data (e.g., video, audio) at guaranteed rates are becoming increasingly important. Such servers, typically, rely on several disks to service a large number of clients, and are thus highly susceptible to disk failures. We have developed two fault-tolerant approaches that rely on admission control in order to meet rate guarantees for continuous media requests. The schemes enable data to be retrieved from disks at the required rate even if a certain disk were to fail. For both approaches, we present data placement strategies and admission control algorithms. We also present design techniques for maximizing the number of clients that can be supported by a continuous media server. Finally, through extensive simulations, we demonstrate the effectiveness of our schemes.</abstract></paper><paper><title>Optimizing queries over multimedia repositories</title><author><AuthorName>Surajit Chaudhuri</AuthorName><institute><InstituteName>Microsoft Research and Hewlett-Packard Laboratories</InstituteName><country></country></institute></author><author><AuthorName>Luis Gravano</AuthorName><institute><InstituteName>Hewlett-Packard Laboratories, Stanford University</InstituteName><country></country></institute></author><year>1996</year><conference>International Conference on Management of Data</conference><citation><name>Ronald Fagin, Combining fuzzy information from multiple systems (extended abstract), Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.216-226, June 04-06, 1996, Montreal, Quebec, Canada</name><name>Surajit Chaudhuri and Luis Gravano. Optimizing queries over multimedia repositories. Technical report, Hewlett-Packard Laboratories, March 1996. Also available as ftp://db.stanford.edu/pub/gravano/- 1996/s ~gmod. ps.</name><name>F. Rabitti , P. Savino, Retrieval of multimedia documents by imprecise query specification, Proceedings of the international conference on extending database technology on Advances in database technology, p.203-218, March 1990, Venice, Italy</name><name>W. Niblack, R. Barber, W. Equitz, h/i. Flickner, E. Glasman, D. Petkovic, P. Yanker, and C. Faloutsos. The QBIC project: Querying images by content using color, texture, and shape. In Storage and retrieval ]or zmage and video databases (SPIE), pages 173-187, February 1993.</name><name>Gerard Salton, Automatic text processing: the transformation, analysis, and retrieval of information by computer, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1989</name><name>Antonin Guttman, R-trees: a dynamic index structure for spatial searching, Proceedings of the 1984 ACM SIGMOD international conference on Management of data, June 18-21, 1984, Boston, Massachusetts</name><name>Norbert Beckmann , Hans-Peter Kriegel , Ralf Schneider , Bernhard Seeger, The R*-tree: an efficient and robust access method for points and rectangles, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.322-331, May 23-26, 1990, Atlantic City, New Jersey, United States</name><name>Timos K. Sellis , Nick Roussopoulos , Christos Faloutsos, The R+-Tree: A Dynamic Index for Multi-Dimensional Objects, Proceedings of the 13th International Conference on Very Large Data Bases, p.507-518, September 01-04, 1987</name><name>Nick Roussopoulos , Stephen Kelley , Fr&#233;d&#233;ric Vincent, Nearest neighbor queries, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.71-79, May 22-25, 1995, San Jose, California, United States</name><name>P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts</name><name>Alfons Kemper , Guido Moerkotte , Michael Steinbrunn, Optimizing Boolean Expressions in Object-Bases, Proceedings of the 18th International Conference on Very Large Data Bases, p.79-90, August 23-27, 1992</name><name>Joseph M. Hellerstein , Michael Stonebraker, Predicate migration: optimizing queries with expensive predicates, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.267-276, May 25-28, 1993, Washington, D.C., United States</name><name>Toshihide Ibaraki , Tiko Kameda, On the optimal nesting order for computing N-relational joins, ACM Transactions on Database Systems (TODS), v.9 n.3, p.482-502, Sept. 1984</name><name>Ravi Krishnamurthy , Haran Boral , Carlo Zaniolo, Optimization of Nonrecursive Queries, Proceedings of the 12th International Conference on Very Large Data Bases, p.128-137, August 25-28, 1986</name><name>Luis Gravano , Hector Garcia-Molina, Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies, Proceedings of the 21th International Conference on Very Large Data Bases, p.78-89, September 11-15, 1995</name><name>Christos Faloutsos , Ibrahim Kamel, Beyond uniformity and independence: analysis of R-trees using the concept of fractal dimension, Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.4-13, May 24-27, 1994, Minneapolis, Minnesota, United States</name><name>Michael Carey , Laura Haas, Extensible database management systems, ACM SIGMOD Record, v.19 n.4, p.54-60, Dec. 1990</name><name>A. Kemper , G. Moerkotte , K. Peithner , M. Steinbrunn, Optimizing disjunctive queries with expensive predicates, ACM SIGMOD Record, v.23 n.2, p.336-347, June 1994</name><name>M. T. &#214;zsu , D. J. Meechan, Finding heuristics for processing selection queries in relational database systems, Information Systems, v.15 n.3, p.359-373, 1990</name><name>Arnon Rosenthal , David Reiner, An architecture for query optimization, Proceedings of the 1982 ACM SIGMOD international conference on Management of data, June 02-04, 1982, Orlando, Florida</name><name>C. Mohan , Don Haderle , Yun Wang , Josephine Cheng, Single table access using multiple indexes: optimization, execution, and concurrency control techniques, Proceedings of the international conference on extending database technology on Advances in database technology, p.29-43, March 1990, Venice, Italy</name><name>D. S. Batory, On searching transposed files, ACM Transactions on Database Systems (TODS), v.4 n.4, p.531-544, Dec. 1979</name><name>M. J. Carey , L. M. Haas , P. M. Schwarz , M. Arya , W. F. Cody , R. Fagin , M. Flickner , A. W. Luniewski , W. Niblack , D. Petkovic , J. Thomas , J. H. Williams , E. L. Wimmers, Towards heterogeneous multimedia information systems: the Garlic approach, Proceedings of the 5th International Workshop on Research Issues in Data Engineering-Distributed Object Management (RIDE-DOM'95), p.124, March 06-07, 1995</name></citation><abstract>Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A selection on these attributes will typically produce not just a set of objects, as in the traditional relational query model (filtering), but also a grade of match associated with each object, indicating how well the object matches the selection condition (ranking). Also, multimedia repositories may allow access to the attributes of each object only through indexes. We investigate how to optimize the processing of queries over multimedia repositories. A key issue is the choice of the indexes used to search the repository. We define an execution space that is search-minimal, i.e., the set of indexes searched is minimal. Although the general problem of picking an optimal plan in the search-minimal execution space is NP-hard, we solve the problem efficiently when the predicates in the query are independent. We also show that the problem of optimizing queries that ask for a few top-ranked objects can be viewed, in many cases, as that of evaluating selection conditions. Thus, both problems can be viewed together as an extended filtering problem.</abstract></paper><paper><title>BIRCH: an efficient data clustering method for very large databases</title><author><AuthorName>Tian Zhang</AuthorName><institute><InstituteName>Computer Sciences Dept., Univ. of Wisconsin-Madison</InstituteName><country></country></institute></author><author><AuthorName>Raghu Ramakrishnan</AuthorName><institute><InstituteName>Computer Sciences Dept., Univ. of Wisconsin-Madison</InstituteName><country></country></institute></author><author><AuthorName>Miron Livny</AuthorName><institute><InstituteName>Computer Sciences Dept., Univ. of Wisconsin-Madison</InstituteName><country></country></institute></author><year>1996</year><conference>International Conference on Management of Data</conference><citation><name>Peter C, heeselnan, James Kelly, Matthew Self, et al., Auto CIass : A Bayesian Classification System, Proc. of the 5th Int'l Conf. on Machine Learning, Morgan Kaufman, 3un. 1988.</name><name>Richard Duds, and Peter E. Hart, Pattern Classification and Scene Analysis, Wiley, 1973.</name><name>R. Dubes, and A.K. Jain, Clustering Methodologies in Exploratory Data Analysis Advances in C, omputers, Edited by M.C. Yovits, Vol. 19, Academic Press, New York, 1980.</name><name>Martin Ester, Hans-Peter Kriegel, and Xiaowei Xu, A Database Interface .for Clustering in Large Spatial Databases, Proc. of 1st [nt'l Conf. on Knowledge Discovery and Data Mining, 1995.</name><name>Martin Ester , Hans-Peter Kriegel , Xiaowei Xu, Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification, Proceedings of the 4th International Symposium on Advances in Spatial Databases, p.67-82, August 06-09, 1995</name><name>Douglas H. Fisher, Knowledge Acquisition Via Incremental Conceptual Clustering, Machine Learning, v.2 n.2, p.139-172, September 1987</name><name>Douglas H. Fisher, Iterative Optimization and Simplification of Hierarchical CIuaterings, Technical Report CS-95-01, Dept. of Computer Science, Vanderbilt l lniversity, Nashville, TN 37235.</name><name>Allen Gersho , Robert M. Gray, Vector quantization and signal compression, Kluwer Academic Publishers, Norwell, MA, 1991</name><name>Leonard Kaufman, and Peter J. Rousseeuw, Finding Groups in Data - An Introduction to Cluster Analysis, Wiley Series in Probability and Mathematical Statistics, 1990.</name><name>Michael Lebowitz, Experiments with Incremental Concept Formation: UNIMEM, Machine Learning, v.2 n.2, p.103-138, September 1987</name><name>R.C.T.Lee, Clustering analysis and its applications, Advances in Information Systems Science, Edited by J .T.Toum, Vol. 8, pp. 169-292, Plenum Press, New York, 1981.</name><name>F. Murtagh, A Survey of _Recent Advances in Hier'archical Clustering Algorithms, The Computer Journal, 1933.</name><name>Raymond T. Ng , Jiawei Han, Efficient and Effective Clustering Methods for Spatial Data Mining, Proceedings of the 20th International Conference on Very Large Data Bases, p.144-155, September 12-15, 1994</name><name>Clark F. Olson, Parallel Algorithms for Hierarchical Clustering, University of California at Berkeley, Berkeley, CA, 1994</name><name>Tian Zhang, Raghu Ramakrishnan, and Miron Livl~y, BIRCH: An Efficient Data Clustering Method .for Very Large Databases, Technical Report, Computer Sciences Dept., Univ. of Wisconsin-Madison, 1995.</name></citation><abstract>Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. Prior work does not adequately address the problem of large datasets and minimization of I/O costs.This paper presents a data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), and demonstrates that it is especially suitable for very large databases. BIRCH incrementally and dynamically clusters incoming multi-dimensional metric data points to try to produce the best quality clustering with the available resources (i.e., available memory and time constraints). BIRCH can typically find a good clustering with a single scan of the data, and improve the quality further with a few additional scans. BIRCH is also the first clustering algorithm proposed in the database area to handle "noise" (data points that are not part of the underlying pattern) effectively.We evaluate BIRCH's time/space efficiency, data input order sensitivity, and clustering quality through several experiments. We also present a performance comparisons of BIRCH versus CLARANS, a clustering method proposed recently for large datasets, and show that BIRCH is consistently superior.</abstract></paper><paper><title>On-line reorganization of sparsely-populated B+-trees</title><author><AuthorName>Chendong Zou</AuthorName><institute><InstituteName>College of Computer Science, Northeastern University</InstituteName><country></country></institute></author><author><AuthorName>Betty Salzberg</AuthorName><institute><InstituteName>College of Computer Science, Northeastern University</InstituteName><country></country></institute></author><year>1996</year><conference>International Conference on Management of Data</conference><citation><name>R. Bayer and M Schkolnick. C, oncurrency of operations on B-trees. Aeta Informatica, 9(1):1- 21, 1977.</name><name>Jim Gray , Andreas Reuter, Transaction Processing: Concepts and Techniques, Morgan Kaufmann Publishers Inc., San Francisco, CA, 1992</name><name>Theodore Johnson , Dennis Shasha, B-trees with inserts and deletes: why free-at-empty is better than merge-at-half, Journal of Computer and System Sciences, v.47 n.1, p.45-76, Aug. 1993</name><name>David Lomet , Betty Salzberg, Access method concurrency with recovery, Proceedings of the 1992 ACM SIGMOD international conference on Management of data, p.351-360, June 02-05, 1992, San Diego, California, United States</name><name>David B. Lomet , Mark R. Tuttle, Redo Recovery after System Crashes, Proceedings of the 21th International Conference on Very Large Data Bases, p.457-468, September 11-15, 1995</name><name>C. Mohan , Inderpal Narang, Algorithms for creating indexes for very large tables without quiescing updates, Proceedings of the 1992 ACM SIGMOD international conference on Management of data, p.361-370, June 02-05, 1992, San Diego, California, United States</name><name>C. Mohan, ARIES/KVL: a key-value locking method for concurrency control of multiaction transactions operating on B-tree indexes, Proceedings of the sixteenth international conference on Very large databases, p.392-405, September 1990, Brisbane, Australia</name><name>Edward Omiecinski , Liehuey Lee , Peter Scheuermann, Concurrent File Reorganization for Record Clustering: A Performance Study, Proceedings of the Eighth International Conference on Data Engineering, p.265-272, February 03-07, 1992</name><name>Edward Omiecinski, Concurrent Storage Structure Conversion: from B+ Tree to Linear Hash File, Proceedings of the Fourth International Conference on Data Engineering, p.589-596, February 01-05, 1988</name><name>Betty Salzberg, File structures: an analytic approach, Prentice-Hall, Inc., Upper Saddle River, NJ, 1988</name><name>V. Srinivasan , Michael J. Carey, Performance of On-Line Index Construction Algorithms, Proceedings of the 3rd International Conference on Extending Database Technology: Advances in Database Technology, p.293-309, March 23-27, 1992</name><name>Betty Salzberg , Allyn Dimock, Principles of Transaction-Based On-Line Reorganization, Proceedings of the 18th International Conference on Very Large Data Bases, p.511-520, August 23-27, 1992</name><name>Gary Smith. Online reorganization of keysequenced t~bles and files. Tandem System Review, 6(2):52-59, October 1990. Describe algorithm of Franco Putzolu.</name><name>H. Wedekind. On the select,on of access paths zn a data base system, chapter Database Management. North Holland Publishing Company, 1974.</name><name>(:hendong Zou and Betty SMzberg. On-Line Reorganization of Sparsely-Populated B+trees. Technical Report NU-CCS-95-18, Northeastern University, College of Computer Science, Boston, MA (USA), 1995.</name></citation><abstract>In this paper, we present an efficient method to do online reorganization of sparsely-populated B+-trees. It reorganizes the leaves first, compacting in short operations groups of leaves with the same parent. After compacting, optionally, the new leaves may swap locations or be moved into empty pages so that they are in key order on the disk. After the leaves are reorganized, the method shrinks the tree by making a copy of the upper part of the tree while leaving the leaves in place. A new concurrency method is introduced so that only a minimum number of pages are locked during reorganization. During leaf reorganization, Forward Recovery is used to save all work already done while maintaining consistency after system crashes. A heuristic algorithm is developed to reduce the number of swaps needed during leaf reorganization, so that better concurrency and easier recovery can be achieved. A detailed description of switching from the old B+-tree to the new B+-tree is described for the first time.</abstract></paper><paper><title>Two techniques for on-line index modification in shared nothing parallel databases</title><author><AuthorName>Kiran J. Achyutuni</AuthorName><institute><InstituteName>Informix Software Inc and Georgia Tech</InstituteName><country></country></institute></author><author><AuthorName>Edward Omiecinski</AuthorName><institute><InstituteName>Georgia Tech</InstituteName><country></country></institute></author><author><AuthorName>Shamkant B. Navathe</AuthorName><institute><InstituteName>Georgia Tech</InstituteName><country></country></institute></author><year>1996</year><conference>International Conference on Management of Data</conference><citation><name>Kiran Jyotsna Achyutuni , Shamkant B. Navathe , Edward Omiecenski, On-line tuning of data placement in parallel databases, 1995</name><name>David DeWitt , Jim Gray, Parallel database systems: the future of high performance database systems, Communications of the ACM, v.35 n.6, p.85-98, June 1992</name><name>Inform,x Software Inc. Informix Manuals, release 7 1 edition, December 1994-</name><name>S D Lang , J R Driscoll , J H Jou, Batch insertion for tree structured file organizations&mdash;improving differential database representation, Information Systems, v.11 n.2, p.167-175, 1986</name><name>C. Mohan , Inderpal Narang, Algorithms for creating indexes for very large tables without quiescing updates, Proceedings of the 1992 ACM SIGMOD international conference on Management of data, p.361-370, June 02-05, 1992, San Diego, California, United States</name><name>C. Mohan, ARIES/KVL: a key-value locking method for concurrency control of multiaction transactions operating on B-tree indexes, Proceedings of the sixteenth international conference on Very large databases, p.392-405, September 1990, Brisbane, Australia</name><name>E. Omiecinski , L. Lee , P. Scheuermann, Performance Analysis of a Concurrent File Reorganization Algorithm for Record Clustering, IEEE Transactions on Knowledge and Data Engineering, v.6 n.2, p.248-257, April 1994</name><name>E, Omtecinskl Incremental flie reorgan~zat,on schemes In Proc, l lth Intl Conf Very Large Data Bases, pages pp 346-35T, Sac Marco, CA, Aug. 1985 Morgan Kaufmann Publishers</name><name>E. Omiecinski, Concurrent file conversion between B+-tree and linear hash files, Information Systems, v.14 n.5, p.371-383, 1989</name><name>Edward Omiecinski , Peter Scheuermann, A parallel algorithm for record clustering, ACM Transactions on Database Systems (TODS), v.15 n.4, p.599-624, Dec. 1990</name><name>V Srinivasan and M Carey. On-line index construction a[gor,thms In Proc. 4th Internatlon Workshop on High Performance Transaction Systems, Asilomar, September 1991</name><name>Betty Salzberg , Allyn Dimock, Principles of Transaction-Based On-Line Reorganization, Proceedings of the 18th International Conference on Very Large Data Bases, p.511-520, August 23-27, 1992</name><name>Gary H. Sockut , Robert P. Goldberg, Database Reorganization&mdash;Principles and Practice, ACM Computing Surveys (CSUR), v.11 n.4, p.371-395, Dec. 1979</name><name>O H Sockut and B I% lyer l~eorganizing databases concurrently with usage' A survey. Technlcal Report T16 0;5 488, IBM, Santa Teresa Laboratory, San Jose, CA, June 199;5</name><name>G S Smith Onllne reorgan,zat]on of key-sequenced tables and files Tandem System Review, 6(2).pp 52-59, Oct 1990.</name><name>P. Scheuermann, G Welkum, and P Zabback Autornatlc tuning of data placement and load balancing in disk arrays. Technical 16eport DBS-92-91, ETH-Zurlch, Department of Computer Sc,ence, ETH-Zentrum, CH-8092 Zur*ch~ Sw,tzerland, 1992</name><name>P. Scheuermann, O. Weikum, and P Zabback Disk coollng ,n parallel disk systems. Bulletin of the Technlcal Committee on Data I~ng[neering, 17(3),29-40, September 1994.</name><name>J Tro~si NonStop ava,lab~llty and database configuration operatKons Tandem Systems 16evlew, 10(;5)'18-2;5, July 1994</name><name>Patrick Valduriez, Parallel database systems: open problems and new issues, Distributed and Parallel Databases, v.1 n.2, p.137-165, April 1993</name><name>Gerhard Weikum , Peter Zabback , Peter Scheuermann, Dynamic file allocation in disk arrays, Proceedings of the 1991 ACM SIGMOD international conference on Management of data, p.406-415, May 29-31, 1991, Denver, Colorado, United States</name></citation><abstract>Whenever data is moved across nodes in the parallel database system, the indexes need to be modified too. Index modification overhead can be quite severe because there can be a large number of indexes on a relation. In this paper, we study two alternatives to index modification, namely OAT (One-At-a-Time page movement) and BULK (bulk page movement). OAT and BULK are two extremes on the spectrum of the granularity of data movement. OAT and BULK differ in two respects: first, OAT uses very little additional disk space (at most one extra page), whereas BULK uses a large amount of disk space. Second, BULK uses sequential prefetch I/O to optimize on the number of I/Os during index modification, while OAT does not. Using an experimental testbed, we show that BULK is an order of magnitude faster than OAT. In terms of the impact on transaction performance during reorganization, BULK and OAT perform differently: when the number of indexes to be modified is either one or two, OAT has a lesser impact on the transaction performance degradation. However, when the number of indexes is greater than two, both techniques have the same impact on transaction performance.</abstract></paper><paper><title>Query caching and optimization in distributed mediator systems</title><author><AuthorName>S. Adali</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>K. S. Candan</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Y. Papakonstantinou</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>V. S. Subrahmanian</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><year>1996</year><conference>International Conference on Management of Data</conference><citation></citation><abstract>Query processing and optimization in mediator systems that access distributed non-proprietary sources pose many novel problems. Cost-based query optimization is hard because the mediator does not have access to source statistics information and furthermore it may not be easy to model the source's performance. At the same time, querying remote sources may be very expensive because of high connection overhead, long computation time, financial charges, and temporary unavailability. We propose a cost-based optimization technique that caches statistics of actual calls to the sources and consequently estimates the cost of the possible execution plans based on the statistics cache. We investigate issues pertaining to the design of the statistics cache and experimentally analyze various tradeoffs. We also present a query result caching mechanism that allows us to effectively use results of prior queries when the source is not readily available. We employ the novel invariants mechanism, which shows how semantic information about data sources may be used to discover cached query results of interest.</abstract></paper><paper><title>Performance tradeoffs for client-server query processing</title><author><AuthorName>Michael J. Franklin</AuthorName><institute><InstituteName>Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland. College Park</InstituteName><country></country></institute></author><author><AuthorName>Bj&#246;rn Th&#243;r J&#243;nsson</AuthorName><institute><InstituteName>Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland. College Park</InstituteName><country></country></institute></author><author><AuthorName>Donald Kossmann</AuthorName><institute><InstituteName>Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland. College Park</InstituteName><country></country></institute></author><year>1996</year><conference>International Conference on Management of Data</conference><citation><name>K. Brown. PRPL: A database workload specification language~ Master's thesis, Univ. of Wisconsin, 1992.</name><name>Michael J. Carey , David J. DeWitt , Michael J. Franklin , Nancy E. Hall , Mark L. McAuliffe , Jeffrey F. Naughton , Daniel T. Schuh , Marvin H. Solomon , C. K. Tan , Odysseas G. Tsatalos , Seth J. White , Michael J. Zwilling, Shoring up persistent applications, Proceedings of the 1994 ACM SIGMOD international conference on Management of data, p.383-394, May 24-27, 1994, Minneapolis, Minnesota, United States</name><name>R.G.G. Cattetl, editor. Object Database Standard. Morgan- Kaufmann Publishers, San Mateo, 1994.</name><name>Michael J. Carey , Hongjun Lu, Load balancing in a locally distributed DB system, Proceedings of the 1986 ACM SIGMOD international conference on Management of data, p.108-119, May 28-30, 1986, Washington, D.C., United States</name><name>David J. DeWitt , Philippe Futtersack , David Maier , Fernando Velez, A study of three alternative workstation server architectures for object-oriented database systems, Proceedings of the sixteenth international conference on Very large databases, p.107-121, September 1990, Brisbane, Australia</name><name>M. Franklin. Client Data Caching. Kluwer Academic Press, Boston, 1996.</name><name>Sumit Ganguly , Waqar Hasan , Ravi Krishnamurthy, Query optimization for parallel execution, Proceedings of the 1992 ACM SIGMOD international conference on Management of data, p.9-18, June 02-05, 1992, San Diego, California, United States</name><name>Goetz Graefe, Query evaluation techniques for large databases, ACM Computing Surveys (CSUR), v.25 n.2, p.73-169, June 1993</name><name>S. Ganguly and W. Wang. Optimizing queries for coarse grain parallelism. Technical report, Rutgers University, 1993.</name><name>Robert Brian Hagmann , Domenico Ferrari, Performance analysis of several back-end database architectures, ACM Transactions on Database Systems (TODS), v.11 n.1, p.1-26, March 1986</name><name>Y. E. Ioannidis , Younkyung Kang, Randomized algorithms for optimizing large join queries, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.312-321, May 23-26, 1990, Atlantic City, New Jersey, United States</name><name>B. Paul Jenq , Darrell Woelk , Won Kim , Wan-Lik Lee, Query processing in distributed ORION, Proceedings of the international conference on extending database technology on Advances in database technology, p.169-187, March 1990, Venice, Italy</name><name>Donald Kossmann , Michael J. Franklin, A study of query execution strategies for client-server database systems, University of Maryland at College Park, College Park, MD, 1995</name><name>Krishna G. Kulkarni, Object-oriented extensions in SQL3: a status report, Proceedings of the 1994 ACM SIGMOD international conference on Management of data, p.478, May 24-27, 1994, Minneapolis, Minnesota, United States</name><name>H. Lu and M. Carey. Some experimental results on distributed join algorithms in a local network. 11th VLDB Conf., Stockholm, 1985.</name><name>Rosana S. G. Lanzelotte , Patrick Valduriez , Mohamed Za&#239;t, On the Effectiveness of Optimization Search Strategies for Parallel Execution Spaces, Proceedings of the 19th International Conference on Very Large Data Bases, p.493-504, August 24-27, 1993</name><name>Lothar F. Mackert , Guy M. Lohman, R* Optimizer Validation and Performance Evaluation for Distributed Queries, Proceedings of the 12th International Conference on Very Large Data Bases, p.149-159, August 25-28, 1986</name><name>T. P. Martin , K. H. Lam , Judy I. Russell, An evaluation of site selection algorithms for distributed query processing, The Computer Journal, v.33 n.1, p.61-70, Feb. 1990</name><name>Jignesh M. Patel , Michael J. Carey , Mary K. Vernon, Accurate modeling of the hybrid hash join algorithm, Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems, p.56-66, May 16-20, 1994, Nashville, Tennessee, United States</name><name>N. Roussopoulos, et al. The ADMS project: Views "R" us. IEEE Data Engeneering Bulletin, 18(2), 1995.</name><name>P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts</name><name>M. Stonebraker, et aI. Third generation data base system manifesto. Technical Report, UC Berkeley, t990.</name><name>Michael Stonebraker , Paul M. Aoki , Robert Devine , Witold Litwin , Michael A. Olson, Mariposa: A New Architecture for Distributed Data,
?? 快捷鍵說明
復(fù)制代碼
Ctrl + C
搜索代碼
Ctrl + F
全屏模式
F11
切換主題
Ctrl + Shift + D
顯示快捷鍵
?
增大字號
Ctrl + =
減小字號
Ctrl + -