Abstract
In this paper, we propose an effective feature selection method based on association word mining. Documents are represented as association-word vectors, whose features are association words, each composed of a few words, rather than single words. The paper focuses on using association rules to reduce a high-dimensional feature space. The accuracy and recall of document classification depend on three parameters of the Apriori algorithm: the number of words composing an association word, the confidence, and the support. We show how these three parameters can be selected efficiently. We apply a Naive Bayes classifier to text data using the proposed feature-vector document representation. Document categorization experiments show that feature selection by association word mining is more efficient than feature selection by information gain or document frequency.
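The abstract describes a pipeline with three tunable Apriori parameters (support, confidence, and the number of words per association word) feeding a Naive Bayes classifier over association-word vectors. The Python sketch below illustrates that pipeline under stated assumptions; it is not the authors' implementation. The names `mine_association_words`, `to_features`, and `NaiveBayes`, the rule-based definition of an association word, the "contains every word" document representation, the multinomial-style smoothing, and the toy corpus are all hypothetical illustrations. The information gain and document frequency baselines are not shown.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def mine_association_words(docs, min_support, min_confidence, max_words):
    """Apriori over documents treated as transactions (sets of words).

    Hypothetical definition: an association word is a frequent set of
    2..max_words words with at least one rule (set minus one word) -> (that
    word) whose confidence reaches min_confidence. min_support is a fraction
    of the number of documents.
    """
    transactions = [frozenset(d) for d in docs]
    n = len(transactions)

    # Level 1: frequent single words with their document counts.
    counts = Counter(w for t in transactions for w in t)
    support = {frozenset([w]): c for w, c in counts.items() if c / n >= min_support}
    level = list(support)

    for k in range(2, max_words + 1):
        # Join frequent (k-1)-sets into k-sets and prune candidates that have an
        # infrequent (k-1)-subset (the Apriori property).
        candidates = {
            a | b
            for a, b in combinations(level, 2)
            if len(a | b) == k
            and all(frozenset(s) in support for s in combinations(a | b, k - 1))
        }
        level = []
        for cand in candidates:
            cnt = sum(1 for t in transactions if cand <= t)
            if cnt / n >= min_support:
                support[cand] = cnt
                level.append(cand)
        if not level:
            break

    # Keep multi-word sets that yield at least one confident rule.
    return {
        itemset
        for itemset, cnt in support.items()
        if len(itemset) > 1
        and any(cnt / support[itemset - {w}] >= min_confidence for w in itemset)
    }


def to_features(doc, assoc_words):
    """Represent a document by the association words it fully contains."""
    words = set(doc)
    return [aw for aw in assoc_words if aw <= words]


class NaiveBayes:
    """Multinomial-style Naive Bayes over association-word features with
    Laplace (add-one) smoothing."""

    def fit(self, feature_lists, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, y in zip(feature_lists, labels):
            self.feat_counts[y].update(feats)
            self.vocab.update(feats)
        self.n = len(labels)
        return self

    def predict(self, feats):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for y, cy in self.class_counts.items():
            total = sum(self.feat_counts[y].values())
            score = math.log(cy / self.n)
            for f in feats:
                if f in self.vocab:  # skip features never seen in training
                    score += math.log((self.feat_counts[y][f] + 1) / (total + v))
            if score > best_score:
                best, best_score = y, score
        return best


if __name__ == "__main__":
    docs = [
        ["stock", "market", "price", "trade"],
        ["stock", "market", "price", "fall"],
        ["stock", "price", "trade", "bank"],
        ["football", "match", "goal", "team"],
        ["football", "team", "goal", "coach"],
        ["match", "team", "goal", "player"],
    ]
    labels = ["finance"] * 3 + ["sport"] * 3
    assoc = mine_association_words(docs, min_support=0.3, min_confidence=0.9, max_words=3)
    clf = NaiveBayes().fit([to_features(d, assoc) for d in docs], labels)
    print(clf.predict(to_features(["stock", "price", "rise"], assoc)))  # finance
```

In this sketch, raising `min_support` or `min_confidence`, or lowering `max_words`, yields fewer association words and therefore a smaller feature space. This is the kind of trade-off the abstract says must be tuned for classification accuracy and recall.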
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ko, SJ., Lee, JH. (2001). Feature Selection Using Association Word Mining for Classification. In: Mayr, H.C., Lazansky, J., Quirchmayr, G., Vogel, P. (eds) Database and Expert Systems Applications. DEXA 2001. Lecture Notes in Computer Science, vol 2113. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44759-8_22
DOI: https://doi.org/10.1007/3-540-44759-8_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42527-4
Online ISBN: 978-3-540-44759-7
eBook Packages: Springer Book Archive