
Bayesian Web Document Classification through Optimizing Association Word

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2718)

Abstract

Conventional Bayesian document classification does not accurately reflect semantic relations when expressing the characteristics of a document. To resolve this problem, this paper proposes a Bayesian document classification method based on mining and refining association words. The Apriori algorithm mines association words from the learning documents and extracts the characteristics of a test document in the form of association words, which reflect semantic relations. If association words are mined from the learning documents with the Apriori algorithm alone, inappropriate association words are included among them, which lowers classification accuracy. To compensate for this, we refine the association words using a genetic algorithm. A Naïve Bayes classifier then classifies test documents based on the refined association words.
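The mining and classification steps described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names (`mine_association_words`, `train`, `classify`), the support threshold, the toy documents, and the choice of a Bernoulli Naive Bayes model with Laplace smoothing are all assumptions made for illustration, and the genetic-algorithm refinement step is omitted entirely.

```python
from collections import Counter
from itertools import combinations
import math

def mine_association_words(docs, min_support=0.6, max_size=2):
    """Apriori-style mining (illustrative): return the word sets of size
    2..max_size that occur in at least min_support of the documents.
    Each document is a set of words."""
    n = len(docs)
    counts = Counter(w for d in docs for w in d)
    # Level 1: frequent single words.
    frequent = {frozenset([w]) for w, c in counts.items() if c / n >= min_support}
    result, size = set(), 1
    while frequent and size < max_size:
        size += 1
        # Candidate generation: join frequent sets from the previous level.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == size}
        # Apriori pruning: every (size - 1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, size - 1))}
        frequent = {c for c in candidates
                    if sum(1 for d in docs if c <= d) / n >= min_support}
        result |= frequent
    return result

def train(labeled_docs, min_support=0.6):
    """labeled_docs maps a class label to a list of word sets.
    Features are the association words mined from each class."""
    features = set()
    for docs in labeled_docs.values():
        features |= mine_association_words(docs, min_support)
    total = sum(len(docs) for docs in labeled_docs.values())
    model = {}
    for label, docs in labeled_docs.items():
        prior = len(docs) / total
        # Laplace-smoothed probability that a class document contains the feature.
        likelihood = {f: (sum(1 for d in docs if f <= d) + 1) / (len(docs) + 2)
                      for f in features}
        model[label] = (prior, likelihood)
    return model

def classify(model, doc):
    """Bernoulli Naive Bayes: pick the class with the highest log posterior."""
    def score(label):
        prior, likelihood = model[label]
        s = math.log(prior)
        for f, p in likelihood.items():
            s += math.log(p if f <= doc else 1 - p)
        return s
    return max(model, key=score)

# Hypothetical toy learning documents (not taken from the paper).
sports_docs = [{"game", "player", "score"},
               {"game", "player", "team"},
               {"game", "score", "team"}]
science_docs = [{"data", "model", "learning"},
                {"data", "model", "network"},
                {"model", "learning", "network"}]

model = train({"sports": sports_docs, "science": science_docs})
print(classify(model, {"game", "team", "win"}))
```

In this sketch a test document is represented by which mined association words it contains, rather than by individual words. A refinement step such as the genetic algorithm mentioned in the abstract would sit between mining and training and discard association words judged inappropriate.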




Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ko, S.J., Choi, J.H., Lee, J.H. (2003). Bayesian Web Document Classification through Optimizing Association Word. In: Chung, P.W.H., Hinde, C., Ali, M. (eds) Developments in Applied Artificial Intelligence. IEA/AIE 2003. Lecture Notes in Computer Science, vol 2718. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45034-3_57


  • DOI: https://doi.org/10.1007/3-540-45034-3_57


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40455-2

  • Online ISBN: 978-3-540-45034-4

  • eBook Packages: Springer Book Archive
