Mining of Association Rules in Text Databases Using Inverted Hashing and Pruning

Holt, John D.; Chung, Soon M.

doi:10.1007/3-540-44466-1_29

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1874)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

In this paper, we propose a new algorithm named Inverted Hashing and Pruning (IHP) for mining association rules between words in text databases. The characteristics of text databases differ considerably from those of retail transaction databases, and existing mining algorithms cannot handle text databases efficiently because of the large number of itemsets (i.e., sets of words) that need to be counted. Two well-known mining algorithms, the Apriori algorithm [1] and the Direct Hashing and Pruning (DHP) algorithm [5], are evaluated in the context of mining text databases and compared with the proposed IHP algorithm. The results show that the IHP algorithm performs better on large text databases.
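
The abstract does not describe how IHP works internally, so the sketch below illustrates only the Apriori baseline [1] applied to text, where each document is a transaction and each distinct word is an item. It is a minimal illustration under stated assumptions, not the authors' implementation: the names `apriori`, `rule_confidence`, `docs`, and `min_support`, and the whitespace tokenization, are all hypothetical choices made for this example.

```python
from itertools import combinations


def apriori(documents, min_support):
    """Return {itemset: support count} for every frequent word set.

    Assumption: a document is tokenized by lowercasing and splitting on
    whitespace, and support is an absolute document count.
    """
    transactions = [frozenset(doc.lower().split()) for doc in documents]

    # Pass 1: count single words.
    counts = {}
    for t in transactions:
        for word in t:
            key = frozenset([word])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join frequent (k-1)-itemsets into k-itemset candidates, keeping
        # a candidate only if all of its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count each surviving candidate with one scan of the documents.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def rule_confidence(supports, antecedent, consequent):
    """Confidence of antecedent -> consequent: supp(A and B) / supp(A)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return supports[both] / supports[a]


# Toy corpus, invented for this example.
docs = [
    "data mining text",
    "text mining rules",
    "data mining rules",
    "text data mining",
]
frequent_sets = apriori(docs, min_support=2)
confidence = rule_confidence(frequent_sets, {"rules"}, {"mining"})
```

Even in this four-document corpus every word is frequent, so all six word pairs become candidates in the second pass. With a realistic vocabulary the candidate pairs grow roughly quadratically in the number of frequent words, which is the counting cost the abstract refers to.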

This research was supported in part by the Ohio Board of Regents.

References

[1] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. of the 20th VLDB Conf., 1994, pp. 487–499.
[2] S. Brin, R. Motwani, J. Ullman, and S. Tsur, “Dynamic Itemset Counting and Implication Rules for Market Basket Data,” Proc. of the ACM SIGMOD Int’l Conf. on Management of Data, 1997, pp. 255–264.
[3] M. S. Chen, J. Han, and P. S. Yu, “Data Mining: An Overview from a Database Perspective,” IEEE Trans. on Knowledge and Data Engineering, Vol. 8, No. 6, Dec. 1996, pp. 866–883.
[4] M. Gordon and S. Dumais, “Using Latent Semantic Indexing for Literature Based Discovery,” Journal of the Amer. Soc. of Info. Science, Vol. 49, No. 8, June 1998, pp. 674–685.
[5] J. S. Park, M. S. Chen, and P. S. Yu, “Using a Hash-Based Method with Transaction Trimming for Mining Association Rules,” IEEE Trans. on Knowledge and Data Engineering, Vol. 9, No. 5, Sep/Oct 1997, pp. 813–825.
[6] G. Salton, Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer, Addison-Wesley Publishing, 1988.
[7] A. Savasere, E. Omiecinski, and S. Navathe, “An Efficient Algorithm for Mining Association Rules in Large Databases,” Proc. of the 21st VLDB Conf., 1995, pp. 432–444.
[8] H. Toivonen, “Sampling Large Databases for Association Rules,” Proc. of the 22nd VLDB Conf., 1996, pp. 134–145.
[9] E. M. Voorhees and D. K. Harman (editors), The Fifth Text Retrieval Conference, National Institute of Standards and Technology, 1997.

Author information

Authors and Affiliations

Dept. of Computer Science and Engineering, Wright State University, Dayton, Ohio, 45435, USA
John D. Holt & Soon M. Chung

Editor information

Editors and Affiliations

Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto, 606-8501, Japan
Yahiko Kambayashi
Computer Science Department, Western Michigan University, Kalamazoo, MI, 49008, USA
Mukesh Mohania
Vienna University of Technology, IFS, Favoritenstr. 9-11/188, 1040, Vienna, Austria
A. Min Tjoa

About this paper

Cite this paper

Holt, J.D., Chung, S.M. (2000). Mining of Association Rules in Text Databases Using Inverted Hashing and Pruning. In: Kambayashi, Y., Mohania, M., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2000. Lecture Notes in Computer Science, vol 1874. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44466-1_29

DOI: https://doi.org/10.1007/3-540-44466-1_29
Published: 06 July 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67980-6
Online ISBN: 978-3-540-44466-4
eBook Packages: Springer Book Archive
