Bayesian Web Document Classification through Optimizing Association Word

Ko, Su Jeong; Choi, Jun Hyeog; Lee, Jung Hyun

doi:10.1007/3-540-45034-3_57

Conference paper
First Online: 01 January 2003

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2718)

Abstract

Previous Bayesian document classification methods do not accurately reflect semantic relations when expressing the characteristics of a document. To resolve this problem, this paper proposes a Bayesian document classification method based on mining and refining association words. The Apriori algorithm extracts the characteristics of a test document in the form of association words that reflect semantic relations, and it also mines association words from the learning documents. If association words are mined from the learning documents with the Apriori algorithm alone, inappropriate association words are included among them, which reduces classification accuracy. To compensate for this disadvantage, we refine the association words with a genetic algorithm. A Naïve Bayes classifier then classifies test documents based on the refined association words.
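As an illustration only, the mining and classification stages described above might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (apriori, train, classify), the min_support threshold, the restriction to word pairs, the Laplace smoothing, and the toy documents are all hypothetical. The genetic-algorithm refinement is omitted because the abstract does not specify its fitness function, and test documents are matched against the mined word sets directly rather than mined separately.

```python
import math
from collections import defaultdict
from itertools import combinations


def apriori(transactions, min_support, max_size=2):
    """Level-wise Apriori search. Returns {frozenset: count} for frequent
    word sets of size 2..max_size (the 'association words' in this sketch)."""
    n = len(transactions)
    counts = defaultdict(int)
    for t in transactions:
        for w in t:
            counts[frozenset([w])] += 1
    current = {s: c for s, c in counts.items() if c / n >= min_support}
    result = {}
    k = 2
    while current and k <= max_size:
        items = sorted({w for s in current for w in s})
        # Keep a candidate only if all of its (k-1)-subsets are frequent.
        candidates = [
            frozenset(c)
            for c in combinations(items, k)
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        ]
        counted = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counted.items() if c / n >= min_support}
        result.update(current)
        k += 1
    return result


def train(docs_by_class, min_support=0.6):
    """Mine association words per class, then estimate Naive Bayes
    log-priors and Laplace-smoothed log-likelihoods over them."""
    doc_sets = {c: [set(d) for d in docs] for c, docs in docs_by_class.items()}
    vocab = set()
    for docs in doc_sets.values():
        vocab |= set(apriori(docs, min_support))
    total_docs = sum(len(docs) for docs in doc_sets.values())
    model = {}
    for c, docs in doc_sets.items():
        # Number of class-c documents containing each association word.
        counts = {aw: sum(1 for d in docs if aw <= d) for aw in vocab}
        total = sum(counts.values())
        prior = math.log(len(docs) / total_docs)
        likelihood = {
            aw: math.log((counts[aw] + 1) / (total + len(vocab))) for aw in vocab
        }
        model[c] = (prior, likelihood)
    return model


def classify(model, document):
    """Return the class with the highest posterior log-score, using the
    association words that are fully contained in the document."""
    words = set(document)
    scores = {}
    for c, (prior, likelihood) in model.items():
        scores[c] = prior + sum(ll for aw, ll in likelihood.items() if aw <= words)
    return max(scores, key=scores.get)


# Toy learning documents (hypothetical data, already tokenized).
learning_docs = {
    "sports": [["game", "team", "score"], ["team", "score", "coach"],
               ["game", "team", "coach"]],
    "finance": [["stock", "market", "price"], ["market", "price", "bank"],
                ["stock", "market", "bank"]],
}
model = train(learning_docs)
print(classify(model, ["team", "score", "win"]))
```

In this sketch a refinement step would sit between apriori and the likelihood estimation in train, pruning word sets from vocab before the probabilities are computed.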

Author information

Authors and Affiliations

School of Computer Science & Engineering, Inha University, Yong_hyen dong, Namgu, Inchon, Korea
Su Jeong Ko & Jung Hyun Lee
Division of Computer Science, Kimpo College, San 14-1, Ponae-ri, Wolgot-myun, Kimpo, Kyonggi-do, Korea
Jun Hyeog Choi

Editor information

Editors and Affiliations

Dept. of Computer Science, Loughborough University, Loughborough, LE11 3TU, England
Paul W. H. Chung & Chris Hinde
Dept. of Computer Science, Southwest Texas State University, 601 University Drive, San Marcos, TX, 78666, USA
Moonis Ali

About this paper

Cite this paper

Ko, S.J., Choi, J.H., Lee, J.H. (2003). Bayesian Web Document Classification through Optimizing Association Word. In: Chung, P.W.H., Hinde, C., Ali, M. (eds) Developments in Applied Artificial Intelligence. IEA/AIE 2003. Lecture Notes in Computer Science, vol 2718. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45034-3_57

DOI: https://doi.org/10.1007/3-540-45034-3_57
Published: 24 June 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40455-2
Online ISBN: 978-3-540-45034-4
eBook Packages: Springer Book Archive
