Abstract
A number of interesting problems arise in supporting the efficient and flexible storage, maintenance and manipulation of large volumes of data (e.g., more than 100 gigabytes in a single table). Very large tables are becoming common, and high availability is typically an important requirement for such data. The currently popular relational DBMSs have been very slow to provide the needed support. That support is crucial if RDBMSs are to be deployed for managing the operational data of many large enterprises and to support complex queries efficiently. We discuss some of the issues involved in improving the availability and efficient accessibility of partitioned tables via parallelism, fine-granularity locking, transient versioning and partition independence, and we outline some solutions that have been proposed. These solutions relate to algorithms for index building, utilities for fuzzy backups, incremental recovery and reorganization, buffer management, transient versioning, concurrency control and record management.
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mohan, C. (1993). A survey of DBMS research issues in supporting very large tables. In: Lomet, D.B. (eds) Foundations of Data Organization and Algorithms. FODO 1993. Lecture Notes in Computer Science, vol 730. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57301-1_19
DOI: https://doi.org/10.1007/3-540-57301-1_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57301-2
Online ISBN: 978-3-540-48047-1
eBook Packages: Springer Book Archive