A data mining approach to database compression

Lee, Chin-Feng; Changchien, S. Wesley; Wang, Wei-Tse; Shen, Jau-Ji

doi:10.1007/s10796-006-8777-x

Published: July 2006

Volume 8, pages 147–161, (2006)

Information Systems Frontiers

Chin-Feng Lee¹, S. Wesley Changchien², Wei-Tse Wang¹ & Jau-Ji Shen³

Abstract

Data mining can extract valuable information from databases to support knowledge discovery and improve business intelligence. Databases store large volumes of structured data, and the amount of data keeps growing with advances in database technology and the extensive use of information systems. Although the price of storage devices has dropped, efficient techniques for database compression remain important. This paper develops a database compression method that eliminates redundant data, which often exist in transaction databases. The proposed approach uses a data mining structure to extract association rules from a database. Redundant data are then replaced by means of compression rules, and a heuristic method is designed to resolve conflicts among the compression rules. To demonstrate its efficiency and effectiveness, the proposed approach is compared with two other database compression methods.
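
The general idea of rule-based compression can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the mining structure or the conflict heuristic, so everything below is an assumption made for illustration. The sketch assumes rows are dictionaries with identical keys, mines only rules of the form "column A = a implies column B = b" that hold in every matching row (confidence 1), and uses a simple greedy stand-in for conflict resolution in which a column may not serve as both antecedent and consequent. The function names and the sample table are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def mine_exact_rules(rows, min_support=2):
    """Return {(a_col, a_val, b_col): (b_val, support)} for confidence-1 rules."""
    counts = defaultdict(lambda: defaultdict(int))
    for row in rows:
        for a_col, b_col in permutations(row, 2):
            counts[(a_col, row[a_col], b_col)][row[b_col]] += 1
    rules = {}
    for key, b_counts in counts.items():
        if len(b_counts) == 1:  # antecedent always co-occurs with one value
            b_val, support = next(iter(b_counts.items()))
            if support >= min_support:
                rules[key] = (b_val, support)
    return rules


def resolve_conflicts(rules):
    """Greedy stand-in heuristic (an assumption, not the paper's method):
    prefer high-support rules and never let a column be both an
    antecedent and a consequent, so dropped values stay recoverable."""
    antecedents, consequents, kept = set(), set(), {}
    ordered = sorted(rules.items(), key=lambda kv: (-kv[1][1], repr(kv[0])))
    for (a_col, a_val, b_col), (b_val, _support) in ordered:
        if a_col in consequents or b_col in antecedents:
            continue
        antecedents.add(a_col)
        consequents.add(b_col)
        kept[(a_col, a_val, b_col)] = b_val
    return kept


def compress(rows, rules):
    """Drop every field whose value is implied by a kept rule."""
    packed = []
    for row in rows:
        implied = {b_col for (a_col, a_val, b_col) in rules if row[a_col] == a_val}
        packed.append({k: v for k, v in row.items() if k not in implied})
    return packed


def decompress(packed, rules):
    """Restore dropped fields by re-applying the rules."""
    restored = []
    for row in packed:
        full = dict(row)
        for (a_col, a_val, b_col), b_val in rules.items():
            if row[a_col] == a_val:  # antecedent columns are never dropped
                full.setdefault(b_col, b_val)
        restored.append(full)
    return restored


# Hypothetical transaction table used only for illustration.
rows = [
    {"item": "milk", "dept": "dairy", "store": "S1"},
    {"item": "milk", "dept": "dairy", "store": "S2"},
    {"item": "cheese", "dept": "dairy", "store": "S1"},
    {"item": "bread", "dept": "bakery", "store": "S2"},
    {"item": "bread", "dept": "bakery", "store": "S1"},
    {"item": "rolls", "dept": "bakery", "store": "S1"},
]
rules = resolve_conflicts(mine_exact_rules(rows))
packed = compress(rows, rules)
print(sum(len(r) for r in rows), "fields before,",
      sum(len(r) for r in packed), "fields after")
```

Restricting the sketch to confidence-1 rules keeps the compression lossless without storing exceptions. A scheme that also exploits rules with lower confidence would need to record the rows that violate each rule.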

Author information

Authors and Affiliations

Department of Information Management, Chaoyang University of Technology, 168, Jifong E. Rd., Wufong Township, Taichung County, 41349, Taiwan, R.O.C.
Chin-Feng Lee & Wei-Tse Wang
Institute of Electronic Commerce, National Chung-Hsing University, 250 Kuo Kuang Road, Taichung, 402, Taiwan, ROC
S. Wesley Changchien
Department of Information Management, National Chung Hsing University, 250, Kuo Kuang Rd., Taichung, 402, Taiwan, R.O.C.
Jau-Ji Shen

Corresponding author

Correspondence to S. Wesley Changchien.

Additional information

Chin-Feng Lee is an associate professor with the Department of Information Management at Chaoyang University of Technology, Taiwan, R.O.C. She received her M.S. and Ph.D. degrees in 1994 and 1998, respectively, from the Department of Computer Science and Information Engineering at National Chung Cheng University. Her current research interests include database design, image processing and data mining techniques.

S. Wesley Changchien is a professor with the Institute of Electronic Commerce at National Chung-Hsing University, Taiwan, R.O.C. He received a B.S. degree in Mechanical Engineering (1989) and completed his M.S. (1993) and Ph.D. (1996) degrees in Industrial Engineering at the State University of New York at Buffalo, USA. His current research interests include electronic commerce, internet/database marketing, knowledge management, data mining, and decision support systems.

Jau-Ji Shen received his Ph.D. degree in Information Engineering and Computer Science from National Taiwan University, Taipei, Taiwan, in 1988. From 1988 to 1994, he was the leader of the software group in the Institute of Aeronautic, Chung-Sung Institute of Science and Technology. He is currently an associate professor in the Department of Information Management at National Chung Hsing University, Taichung. His research has focused on digital multimedia, databases, and information security; his current research focuses on data engineering, database techniques, and information security.

Wei-Tse Wang received B.A. (2001) and M.B.A. (2003) degrees in Information Management from Chaoyang University of Technology, Taiwan, R.O.C. His research interests include data mining, XML, and database compression.

About this article

Cite this article

Lee, CF., Changchien, S.W., Wang, WT. et al. A data mining approach to database compression. Inf Syst Front 8, 147–161 (2006). https://doi.org/10.1007/s10796-006-8777-x

Received: 31 March 2004
Revised: 31 March 2004
Accepted: 13 September 2004
Issue Date: July 2006
DOI: https://doi.org/10.1007/s10796-006-8777-x
