
A data mining approach to database compression

Information Systems Frontiers

Abstract

Data mining can extract valuable information from databases, helping a business discover knowledge and improve its business intelligence. Databases store large volumes of structured data, and that volume keeps growing as database technology advances and information systems are used more widely. Although storage devices have become cheaper, efficient database compression techniques remain important. This paper develops a database compression method that eliminates the redundant data often found in transaction databases. The proposed approach uses a data mining structure to extract association rules from a database; redundant data are then replaced by means of compression rules. A heuristic method is designed to resolve conflicts among the compression rules. To demonstrate its efficiency and effectiveness, the proposed approach is compared with two other database compression methods.
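The abstract does not give the algorithm itself, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes rows are fixed-length tuples with no `None` values, mines only single-attribute rules that hold without exception, and resolves conflicts with a simple greedy rule (highest support first; a column may act as an antecedent or as a consequent, never both). All function names and the conflict heuristic are hypothetical.

```python
from collections import defaultdict


def mine_exact_rules(rows, min_support=2):
    """Find rules (a_col == a_val) -> (b_col == b_val) that hold in every matching row."""
    n_cols = len(rows[0])
    # (a_col, a_val, b_col) -> {b_val: number of rows where it co-occurs}
    seen = defaultdict(lambda: defaultdict(int))
    for row in rows:
        for a in range(n_cols):
            for b in range(n_cols):
                if a != b:
                    seen[(a, row[a], b)][row[b]] += 1
    rules = []
    for (a, a_val, b), counts in seen.items():
        if len(counts) == 1:  # the antecedent always implies one consequent value
            (b_val, support), = counts.items()
            if support >= min_support:
                rules.append((a, a_val, b, b_val, support))
    return rules


def select_rules(rules):
    """Greedy conflict resolution (an assumption made for this sketch).

    A cell erased by one rule cannot be the antecedent of another, so each
    column is allowed exactly one role: antecedent or consequent.
    """
    antecedent_cols, consequent_cols, chosen = set(), set(), []
    for a, a_val, b, b_val, support in sorted(rules, key=lambda r: -r[4]):
        if a in consequent_cols or b in antecedent_cols:
            continue  # accepting this rule would conflict with an earlier one
        antecedent_cols.add(a)
        consequent_cols.add(b)
        chosen.append((a, a_val, b, b_val))
    return chosen


def compress(rows, chosen):
    """Replace every value that a chosen rule can re-derive with None."""
    out = [list(row) for row in rows]
    for row in out:
        for a, a_val, b, _ in chosen:
            if row[a] == a_val:
                row[b] = None
    return out


def decompress(rows, chosen):
    """Restore erased values by re-applying the chosen rules."""
    out = [list(row) for row in rows]
    for row in out:
        for a, a_val, b, b_val in chosen:
            if row[a] == a_val and row[b] is None:
                row[b] = b_val
    return out


if __name__ == "__main__":
    table = [("tea", "hot", "cup"), ("tea", "hot", "mug"), ("cola", "cold", "can"),
             ("cola", "cold", "can"), ("tea", "hot", "cup")]
    chosen_rules = select_rules(mine_exact_rules(table))
    packed = compress(table, chosen_rules)
    assert decompress(packed, chosen_rules) == [list(r) for r in table]
```

In a scheme like this the chosen rules have to be stored with the compressed table, so any saving depends on the rules covering enough rows to outweigh their own size.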



Author information

Corresponding author

Correspondence to S. Wesley Changchien.

Additional information

Chin-Feng Lee is an associate professor with the Department of Information Management at Chaoyang University of Technology, Taiwan, R.O.C. She received her M.S. and Ph.D. degrees in 1994 and 1998, respectively, from the Department of Computer Science and Information Engineering at National Chung Cheng University. Her current research interests include database design, image processing and data mining techniques.

S. Wesley Changchien is a professor with the Institute of Electronic Commerce at National Chung-Hsing University, Taiwan, R.O.C. He received a B.S. degree in Mechanical Engineering (1989) and completed his M.S. (1993) and Ph.D. (1996) degrees in Industrial Engineering at the State University of New York at Buffalo, USA. His current research interests include electronic commerce, internet/database marketing, knowledge management, data mining, and decision support systems.

Jau-Ji Shen received his Ph.D. degree in Information Engineering and Computer Science from National Taiwan University, Taipei, Taiwan, in 1988. From 1988 to 1994, he was the leader of the software group in Institute of Aeronautic, Chung-Sung Institute of Science and Technology. He is currently an associate professor in the information management department at National Chung Hsing University, Taichung. His research has focused on digital multimedia, databases, and information security; his current research focuses on data engineering, database techniques, and information security.

Wei-Tse Wang received his B.A. (2001) and M.B.A. (2003) degrees in Information Management from Chaoyang University of Technology, Taiwan, R.O.C. His research interests include data mining, XML, and database compression.

About this article

Cite this article

Lee, CF., Changchien, S.W., Wang, WT. et al. A data mining approach to database compression. Inf Syst Front 8, 147–161 (2006). https://doi.org/10.1007/s10796-006-8777-x

  • DOI: https://doi.org/10.1007/s10796-006-8777-x
