
A SVM Method for Web Page Categorization Based on Weight Adjustment and Boosting Mechanism

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3614)

Abstract

Web page classification is an important research direction in web mining. This paper presents an SVM method for web page classification. It consists of four steps: (1) using an analysis module to extract the core text and structural tags from a web page; (2) adopting an improved VSM model to generate the initial feature vectors from the core text of the page; (3) adjusting the weights of the selected features according to the structural tags in the page to generate the base SVM classifier; and (4) combining the base classifiers produced iteratively by a Boosting mechanism to obtain the target SVM classifier. Web page classification experiments show that the presented approach is efficient.
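Steps (1) to (3) of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the analysis module, the improved VSM weighting, or the tag-based adjustment rule. The sketch assumes plain smoothed TF-IDF in place of the improved VSM model, and the names `TAG_BOOST`, `PageParser`, `parse`, `tfidf_vectors`, and `adjust_weights`, as well as the multiplier values, are hypothetical. Training the SVM and the Boosting step (4) are left out.

```python
import math
from collections import Counter
from html.parser import HTMLParser

# Hypothetical multipliers per structural tag; the abstract gives no values.
TAG_BOOST = {"title": 3.0, "h1": 2.0, "b": 1.5}
SKIPPED = {"script", "style"}            # non-visible content
TRACKED = set(TAG_BOOST) | SKIPPED


class PageParser(HTMLParser):
    """Step 1 (assumed form): collect words and the boosted tags enclosing them."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open tracked tags
        self.words = []    # every visible word of the page
        self.tagged = []   # (word, tag) pairs for words inside boosted tags

    def handle_starttag(self, tag, attrs):
        if tag in TRACKED:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag; tolerates unclosed tags.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if any(tag in SKIPPED for tag in self.stack):
            return
        for raw in data.lower().split():
            word = raw.strip(".,;:!?()\"'")
            if not word:
                continue
            self.words.append(word)
            for tag in self.stack:
                self.tagged.append((word, tag))


def parse(html):
    """Return (words, tagged) for one HTML page."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return parser.words, parser.tagged


def tfidf_vectors(docs_words):
    """Step 2 (assumed form): smoothed TF-IDF vectors as word -> weight dicts."""
    n = len(docs_words)
    df = Counter(w for words in docs_words for w in set(words))
    vectors = []
    for words in docs_words:
        tf = Counter(words)
        vectors.append(
            {w: c * (math.log((1 + n) / (1 + df[w])) + 1.0) for w, c in tf.items()}
        )
    return vectors


def adjust_weights(vector, tagged):
    """Step 3 (assumed form): scale each weight by its largest tag multiplier."""
    boost = {}
    for word, tag in tagged:
        boost[word] = max(boost.get(word, 1.0), TAG_BOOST[tag])
    return {w: v * boost.get(w, 1.0) for w, v in vector.items()}


pages = [
    "<html><title>Football news</title><body><p>football match today</p></body></html>",
    "<html><title>Stock market</title><body><p>market news today</p></body></html>",
]
parsed = [parse(page) for page in pages]
vectors = tfidf_vectors([words for words, _ in parsed])
adjusted = [adjust_weights(vec, tagged) for vec, (_, tagged) in zip(vectors, parsed)]
```

In this sketch a word that appears in the page title ends up with three times the weight it would have from the body text alone. The adjusted vectors would then serve as input to the base SVM classifier.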




© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lu, M., Guo, C., Sun, J., Lu, Y. (2005). A SVM Method for Web Page Categorization Based on Weight Adjustment and Boosting Mechanism. In: Wang, L., Jin, Y. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2005. Lecture Notes in Computer Science, vol 3614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11540007_100


  • DOI: https://doi.org/10.1007/11540007_100

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28331-7

  • Online ISBN: 978-3-540-31828-6

  • eBook Packages: Computer Science, Computer Science (R0)
