
Feature Selection Using Association Word Mining for Classification

  • Conference paper
Database and Expert Systems Applications (DEXA 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2113)


Abstract

In this paper, we propose an effective feature selection method based on association word mining. Documents are represented as association-word vectors, in which each feature is a set of several co-occurring words rather than a single word. The paper focuses on using association rules to reduce a high-dimensional feature space. The accuracy and recall of document classification depend on the number of words that compose each association word and on the confidence and support thresholds used in the Apriori algorithm. We show how these three parameters can be selected efficiently. We apply a Naive Bayes classifier to text data using the proposed feature-vector document representation. Document categorization experiments demonstrate that feature selection by association word mining is more efficient than feature selection by information gain or document frequency.
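The abstract describes the pipeline only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It runs a textbook Apriori over each document's word set and treats a word set as an "association word" when it has exactly `n_words` words, meets `min_support`, and yields at least one rule X → Y with confidence ≥ `min_confidence`. It then encodes each document as a binary vector over those association words and classifies with a Bernoulli Naive Bayes model. The function names, the itemset-to-association-word rule, the Bernoulli variant, and the toy `docs` data are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log


def apriori(transactions, min_support, max_len):
    """Return {frozenset(words): support} for every frequent word set of up to max_len words.

    Support is the fraction of documents that contain every word in the set.
    """
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    counts = Counter(w for s in sets for w in s)
    frequent = {frozenset([w]): c / n for w, c in counts.items() if c / n >= min_support}
    level, k = list(frequent), 2
    while level and k <= max_len:
        # Join step: merge pairs of frequent (k-1)-sets into k-sets.
        # Prune step: drop any candidate that has an infrequent (k-1)-subset.
        candidates = {
            a | b
            for a, b in combinations(level, 2)
            if len(a | b) == k
            and all(frozenset(sub) in frequent for sub in combinations(a | b, k - 1))
        }
        level = []
        for cand in candidates:
            support = sum(1 for s in sets if cand <= s) / n
            if support >= min_support:
                frequent[cand] = support
                level.append(cand)
        k += 1
    return frequent


def association_words(frequent, min_confidence, n_words):
    """Hypothetical rule: keep n_words-word sets with at least one rule X -> Y whose
    confidence support(X | Y) / support(X) reaches min_confidence."""
    if n_words < 2:
        raise ValueError("an association word needs at least two words")
    kept = []
    for itemset, support in frequent.items():
        if len(itemset) != n_words:
            continue
        # Every subset of a frequent set is frequent, so its support is already stored.
        best = max(
            support / frequent[frozenset(lhs)]
            for r in range(1, n_words)
            for lhs in combinations(itemset, r)
        )
        if best >= min_confidence:
            kept.append(itemset)
    return sorted(kept, key=sorted)


def to_vector(words, features):
    """Binary feature vector: 1 if the document contains every word of the association word."""
    doc = set(words)
    return [1 if f <= doc else 0 for f in features]


class NaiveBayes:
    """Bernoulli Naive Bayes over binary association-word features, Laplace-smoothed."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior, self.p = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = log(len(rows) / len(X))
            # P(feature j present | class c) = (count + 1) / (n_c + 2)
            self.p[c] = [(sum(col) + 1) / (len(rows) + 2) for col in zip(*rows)]
        return self

    def predict(self, x):
        def log_posterior(c):
            return self.log_prior[c] + sum(
                log(p) if present else log(1 - p) for present, p in zip(x, self.p[c])
            )
        return max(self.classes, key=log_posterior)


# Toy training data (hypothetical).
docs = [
    ("sports", "game team score win player".split()),
    ("sports", "team player coach game season".split()),
    ("sports", "game score team season fans".split()),
    ("finance", "stock market price investor trade".split()),
    ("finance", "market stock bank interest price".split()),
    ("finance", "price stock market fund bank".split()),
]
frequent = apriori([words for _, words in docs], min_support=0.3, max_len=3)
features = association_words(frequent, min_confidence=0.9, n_words=3)
X = [to_vector(words, features) for _, words in docs]
y = [label for label, _ in docs]
model = NaiveBayes().fit(X, y)
for text in ["team game score final", "bank stock market rates"]:
    print(text, "->", model.predict(to_vector(text.split(), features)))
```

Raising `min_support` or `min_confidence` can only remove association words, while `n_words` controls how specific each feature is; these are the three parameters whose selection the abstract discusses. On this toy data every frequent 3-word set passes the 0.9 confidence threshold, so only the support threshold prunes features.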




Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ko, SJ., Lee, JH. (2001). Feature Selection Using Association Word Mining for Classification. In: Mayr, H.C., Lazansky, J., Quirchmayr, G., Vogel, P. (eds) Database and Expert Systems Applications. DEXA 2001. Lecture Notes in Computer Science, vol 2113. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44759-8_22


  • DOI: https://doi.org/10.1007/3-540-44759-8_22


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42527-4

  • Online ISBN: 978-3-540-44759-7

  • eBook Packages: Springer Book Archive
