Abstract
Data mining can extract valuable information from databases to support knowledge discovery and improve business intelligence. Databases store large volumes of structured data, and the amount of data keeps growing because of advances in database technology and the extensive use of information systems. Despite the falling price of storage devices, it is still important to develop efficient techniques for database compression. This paper develops a database compression method that eliminates redundant data, which often exist in transaction databases. The proposed approach uses a data mining structure to extract association rules from a database. Redundant data are then replaced by means of compression rules, and a heuristic method is designed to resolve conflicts among the compression rules. To demonstrate its efficiency and effectiveness, the proposed approach is compared with two other database compression methods.
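The general idea of replacing redundant data with rules can be illustrated with a small sketch. This is not the paper's algorithm, which the abstract does not specify; it is a simplified stand-in under stated assumptions: transactions are sets of items, only single-item rules "a implies b" with 100% confidence are used, and conflicts are resolved by a simple greedy heuristic that never lets an item serve as both an antecedent and a consequent. All function names are hypothetical.

```python
from itertools import permutations


def mine_exact_rules(transactions):
    """Find single-item rules a -> b that hold in every transaction containing a.

    Returns a list of (a, b, support) tuples, where support is the number of
    transactions containing a. This is a brute-force stand-in for a real
    association rule miner.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        support_a = sum(1 for t in transactions if a in t)
        support_ab = sum(1 for t in transactions if a in t and b in t)
        if support_a > 0 and support_ab == support_a:
            rules.append((a, b, support_a))
    return rules


def select_rules(rules):
    """Greedy conflict resolution (an assumed heuristic, not the paper's).

    Rules are taken in order of decreasing support. A rule is skipped if its
    antecedent is already removed by a chosen rule, or if its consequent is
    needed as the antecedent of a chosen rule. This keeps decompression exact.
    """
    chosen, antecedents, consequents = [], set(), set()
    for a, b, _support in sorted(rules, key=lambda r: (-r[2], r[0], r[1])):
        if a in consequents or b in antecedents:
            continue
        chosen.append((a, b))
        antecedents.add(a)
        consequents.add(b)
    return chosen


def compress(transactions, chosen):
    """Drop every consequent whose antecedent is present in the transaction."""
    return [set(t) - {b for a, b in chosen if a in t} for t in transactions]


def decompress(compressed, chosen):
    """Restore every consequent whose antecedent is present in the transaction."""
    return [set(t) | {b for a, b in chosen if a in t} for t in compressed]


# Toy data, invented for illustration only.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
chosen = select_rules(mine_exact_rules(transactions))
compressed = compress(transactions, chosen)
restored = decompress(compressed, chosen)
```

In this toy data both "bread implies butter" and "butter implies bread" hold. Applying both would erase the two items from the same row and leave nothing to trigger restoration, which is the kind of conflict a resolution heuristic has to prevent.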
Author information
Additional information
Chin-Feng Lee is an associate professor with the Department of Information Management at Chaoyang University of Technology, Taiwan, R.O.C. She received her M.S. and Ph.D. degrees in 1994 and 1998, respectively, from the Department of Computer Science and Information Engineering at National Chung Cheng University. Her current research interests include database design, image processing and data mining techniques.
S. Wesley Changchien is a professor with the Institute of Electronic Commerce at National Chung-Hsing University, Taiwan, R.O.C. He received a B.S. degree in Mechanical Engineering (1989) and completed his M.S. (1993) and Ph.D. (1996) degrees in Industrial Engineering at the State University of New York at Buffalo, USA. His current research interests include electronic commerce, internet/database marketing, knowledge management, data mining, and decision support systems.
Jau-Ji Shen received his Ph.D. degree in Information Engineering and Computer Science from National Taiwan University, Taipei, Taiwan, in 1988. From 1988 to 1994, he was the leader of the software group in the Institute of Aeronautic, Chung-Sung Institute of Science and Technology. He is currently an associate professor in the information management department of National Chung Hsing University, Taichung. His research has focused on digital multimedia, databases, and information security; his current research focuses on data engineering, database techniques, and information security.
Wei-Tse Wang received his B.A. (2001) and M.B.A. (2003) degrees in Information Management from Chaoyang University of Technology, Taiwan, R.O.C. His research interests include data mining, XML, and database compression.
About this article
Cite this article
Lee, CF., Changchien, S.W., Wang, WT. et al. A data mining approach to database compression. Inf Syst Front 8, 147–161 (2006). https://doi.org/10.1007/s10796-006-8777-x
DOI: https://doi.org/10.1007/s10796-006-8777-x