Abstract
In this paper, we propose a new algorithm, Inverted Hashing and Pruning (IHP), for mining association rules between words in text databases. The characteristics of text databases differ considerably from those of retail transaction databases, and existing mining algorithms cannot handle text databases efficiently because of the large number of itemsets (i.e., sets of words) that need to be counted. Two well-known mining algorithms, the Apriori algorithm [1] and the Direct Hashing and Pruning (DHP) algorithm [5], are evaluated in the context of mining text databases and compared with the proposed IHP algorithm. The results show that the IHP algorithm performs better for large text databases.
This research was supported in part by the Ohio Board of Regents.
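To make the setting of the abstract concrete, the following is a minimal sketch of level-wise (Apriori-style) counting of word sets, where each document is treated as a transaction and each distinct word as an item. It illustrates the baseline approach of [1] only; it is not the IHP algorithm, whose details are not given in the abstract. The function names `frequent_word_sets` and `confidence`, the whitespace tokenization, and the toy documents are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_word_sets(docs, min_support, max_size=2):
    """Return {word set: document count} for word sets found in >= min_support docs."""
    # Assumption: a document is tokenized by lowercasing and splitting on whitespace.
    transactions = [frozenset(d.lower().split()) for d in docs]

    # Pass 1: count individual words (1-itemsets).
    counts = Counter(w for t in transactions for w in t)
    frequent = {frozenset([w]): c for w, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent and k <= max_size:
        prev = set(frequent)
        # Candidate k-sets: unions of frequent (k-1)-sets whose (k-1)-subsets are all frequent.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Pass k: count how many documents contain each candidate.
        counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


def confidence(supports, lhs, rhs):
    """Confidence of the rule lhs -> rhs, i.e. support(lhs and rhs) / support(lhs)."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    return supports[lhs | rhs] / supports[lhs]


docs = [
    "data mining text",
    "text mining rules",
    "data mining rules",
    "text retrieval",
]
supports = frequent_word_sets(docs, min_support=2)
```

With these toy documents, `confidence(supports, {"data"}, {"mining"})` evaluates the rule "data implies mining" over the word sets that met the support threshold. The number of candidate sets grows quickly with vocabulary size, which is the counting cost the abstract identifies as the obstacle for text databases.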
References
[1] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. of the 20th VLDB Conf., 1994, pp. 487–499.
[2] S. Brin, R. Motwani, J. Ullman, and S. Tsur, “Dynamic Itemset Counting and Implication Rules for Market Basket Data,” Proc. of the ACM SIGMOD Int’l Conf. on Management of Data, 1997, pp. 255–264.
[3] M. S. Chen, J. Han, and P. S. Yu, “Data Mining: An Overview from a Database Perspective,” IEEE Trans. on Knowledge and Data Engineering, Vol. 8, No. 6, Dec. 1996, pp. 866–883.
[4] M. Gordon and S. Dumais, “Using Latent Semantic Indexing for Literature Based Discovery,” Journal of the Amer. Soc. of Info Science, Vol. 49, No. 8, June 1998, pp. 674–685.
[5] J. S. Park, M. S. Chen, and P. S. Yu, “Using a Hash-Based Method with Transaction Trimming for Mining Association Rules,” IEEE Trans. on Knowledge and Data Engineering, Vol. 9, No. 5, Sep/Oct 1997, pp. 813–825.
[6] G. Salton, Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer, Addison-Wesley Publishing, 1988.
[7] A. Savasere, E. Omiecinski, and S. Navathe, “An Efficient Algorithm for Mining Association Rules in Large Databases,” Proc. of the 21st VLDB Conf., 1995, pp. 432–444.
[8] H. Toivonen, “Sampling Large Databases for Association Rules,” Proc. of the 22nd VLDB Conf., 1996, pp. 134–145.
[9] E. M. Voorhees and D. K. Harman (editors), The Fifth Text Retrieval Conference, National Institute of Standards and Technology, 1997.
© 2000 Springer-Verlag Berlin Heidelberg
Cite this paper
Holt, J.D., Chung, S.M. (2000). Mining of Association Rules in Text Databases Using Inverted Hashing and Pruning. In: Kambayashi, Y., Mohania, M., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2000. Lecture Notes in Computer Science, vol 1874. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44466-1_29
DOI: https://doi.org/10.1007/3-540-44466-1_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67980-6
Online ISBN: 978-3-540-44466-4
eBook Packages: Springer Book Archive