A survey of DBMS research issues in supporting very large tables

  • Invited Talk
  • Conference paper
Foundations of Data Organization and Algorithms (FODO 1993)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 730))

Abstract

A number of interesting problems arise in supporting the efficient and flexible storage, maintenance and manipulation of large volumes of data (e.g., more than 100 gigabytes in a single table). Very large tables are becoming common, and high availability is typically an important requirement for such data. The currently popular relational DBMSs have been very slow in providing the needed support, yet that support is crucial if RDBMSs are to be deployed for managing the operational data of many large enterprises and for supporting complex queries efficiently. We discuss some of the issues involved in improving the availability and efficient accessibility of partitioned tables via parallelism, fine-granularity locking, transient versioning and partition independence. We outline some solutions that have been proposed. These solutions relate to algorithms for index building, utilities for fuzzy backups, incremental recovery and reorganization, buffer management, transient versioning, concurrency control and record management.

Editor information

David B. Lomet

Copyright information

© 1993 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mohan, C. (1993). A survey of DBMS research issues in supporting very large tables. In: Lomet, D.B. (eds) Foundations of Data Organization and Algorithms. FODO 1993. Lecture Notes in Computer Science, vol 730. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57301-1_19

  • DOI: https://doi.org/10.1007/3-540-57301-1_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-57301-2

  • Online ISBN: 978-3-540-48047-1

  • eBook Packages: Springer Book Archive
