Feature Selection Using Association Word Mining for Classification

Ko, Su-Jeong; Lee, Jung-Hyun

doi:10.1007/3-540-44759-8_22


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2113)

Included in the following conference series:

International Conference on Database and Expert Systems Applications


Abstract

In this paper, we propose an effective feature selection method using association word mining. Documents are represented as association-word vectors, whose features are groups of a few associated words rather than single words. The focus of this paper is the use of association rules to reduce a high-dimensional feature space. The accuracy and recall of document classification depend on the number of words composing each association word and on the confidence and support used in the Apriori algorithm. We show how the confidence, the support, and the number of words composing an association word can be selected efficiently for the Apriori algorithm. We apply a Naive Bayes classifier to text data using the proposed feature-vector document representation. In document categorization experiments, we show that feature selection by association word mining is more efficient than information gain and document frequency.
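The abstract does not spell out the mining procedure, so the following is only a minimal sketch of how association words could be mined with an Apriori-style search and then used as document features. The function names, the toy documents, the threshold values, and the choice to score each word set by its best single-consequent rule confidence are all assumptions made for illustration, not the authors' implementation.

```python
from itertools import combinations


def mine_association_words(docs, min_support, min_confidence, max_words):
    """Return {frozenset(words): (support, confidence)} for word sets of
    size 2..max_words that pass both thresholds (hypothetical helper)."""
    transactions = [frozenset(d) for d in docs]
    n = len(transactions)

    def support(itemset):
        # Fraction of documents containing every word in the set.
        return sum(1 for t in transactions if itemset <= t) / n

    vocab = set().union(*transactions)
    # Level 1: frequent single words.
    frequent = {frozenset([w]) for w in vocab
                if support(frozenset([w])) >= min_support}
    result = {}
    size = 1
    while frequent and size < max_words:
        size += 1
        # Join step: merge frequent sets into candidates one word larger.
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == size}
        # Prune step: every (size-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, size - 1))}
        frequent = {c for c in candidates if support(c) >= min_support}
        for itemset in frequent:
            s = support(itemset)
            # Assumption: score by the best rule (itemset - {w}) -> w.
            conf = max(s / support(itemset - {w}) for w in itemset)
            if conf >= min_confidence:
                result[itemset] = (s, conf)
    return result


def to_feature_vector(doc, association_words):
    """Binary vector: 1 if the document contains all words of a feature."""
    words = set(doc)
    return [1 if aw <= words else 0 for aw in association_words]


# Toy corpus (invented for illustration): each document is a list of words.
docs = [
    ["game", "team", "score"],
    ["game", "team", "player"],
    ["game", "team", "score", "player"],
    ["stock", "market", "price"],
    ["stock", "market"],
]
mined = mine_association_words(docs, min_support=0.4,
                               min_confidence=0.8, max_words=2)
features = sorted(mined, key=sorted)  # fixed feature order
vector = to_feature_vector(["game", "team", "coach"], features)
```

Raising `min_support` or `min_confidence` shrinks the feature set, while raising `max_words` allows longer association words; these are the three parameters the abstract says must be chosen. The resulting binary vectors could then be fed to any Naive Bayes implementation.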

Author information

Authors and Affiliations

Department of Computer Science & Engineering, Inha University, Inchon, Korea
Su-Jeong Ko & Jung-Hyun Lee

Editor information

Editors and Affiliations

University of Klagenfurt, IFI-IWAS, Universitaetsstr. 65, 9020, Klagenfurt, Austria
Heinrich C. Mayr
Faculty of Electrical Engineering, Czech Technical University, Technicka 2, 166 27, Prague 6, Czech Republic
Jiri Lazansky
School of Computer and Information Science, University of South Australia, Mawson Lakes Campus, Mawson Lakes, SA, 5095
Gerald Quirchmayr
Department of Information Systems, Technical University of Munich, Orleanstr. 34, 81667, Munich, Germany
Pavel Vogel

About this paper

Cite this paper

Ko, SJ., Lee, JH. (2001). Feature Selection Using Association Word Mining for Classification. In: Mayr, H.C., Lazansky, J., Quirchmayr, G., Vogel, P. (eds) Database and Expert Systems Applications. DEXA 2001. Lecture Notes in Computer Science, vol 2113. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44759-8_22

DOI: https://doi.org/10.1007/3-540-44759-8_22
Published: 28 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42527-4
Online ISBN: 978-3-540-44759-7
eBook Packages: Springer Book Archive
