
Solving Problems Two at a Time: Classification of Web Pages Using a Generic Pair-Wise Multiple Classifier System

  • Conference paper
Multiple Classifier Systems (MCS 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2709)

Abstract

We propose a generic multiple classifier system, built solely from pair-wise classifiers, for classifying web pages. Web page classification is attracting considerable attention because it can improve the accuracy of search engines and help summarize web content for small-screen handheld devices. We use a Support Vector Machine (SVM) as the core pair-wise classifier. The proposed system has produced very encouraging results on the problem of web page classification. The solution is fully generic and should be applicable to a wide range of multi-class pattern recognition problems.
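The pair-wise idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the pair-wise decisions are combined, so majority voting is assumed here, and a simple nearest-centroid rule stands in for the SVM so that the example needs only the standard library. The function names, class labels, and toy feature vectors are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_pairwise(samples, labels):
    """Build one two-class model for every unordered pair of classes."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    models = {}
    for a, b in combinations(sorted(by_class), 2):
        # Stand-in pair model (assumption): the two class centroids.
        # An SVM trained only on the samples of classes a and b
        # would take this place in a system like the one described.
        models[(a, b)] = (centroid(by_class[a]), centroid(by_class[b]))
    return models


def predict(models, x):
    """Let every pair model vote; return the class with the most votes."""
    votes = Counter()
    for (a, b), (ca, cb) in models.items():
        winner = a if sq_dist(x, ca) <= sq_dist(x, cb) else b
        votes[winner] += 1
    # Ties are resolved by Counter ordering; a real system would need
    # an explicit tie-breaking rule.
    return votes.most_common(1)[0][0]


# Toy data: three hypothetical page categories in a 2-D feature space.
samples = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
labels = ["news", "news", "shop", "shop", "blog", "blog"]
models = train_pairwise(samples, labels)
```

With k classes this scheme trains k(k-1)/2 two-class models, each of which only has to separate two categories, which is why the approach carries over to other multi-class problems without change.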




Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alam, H., Rahman, F., Tarnikova, Y. (2003). Solving Problems Two at a Time: Classification of Web Pages Using a Generic Pair-Wise Multiple Classifier System. In: Windeatt, T., Roli, F. (eds) Multiple Classifier Systems. MCS 2003. Lecture Notes in Computer Science, vol 2709. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44938-8_39

  • DOI: https://doi.org/10.1007/3-540-44938-8_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40369-2

  • Online ISBN: 978-3-540-44938-6

  • eBook Packages: Springer Book Archive
