
Automatic Recognition of News Web Pages

  • Conference paper
Intelligence and Security Informatics (ISI 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5075)



The World Wide Web is crowded with large amounts of news content. Filtering, summarizing and classifying news Web pages have become active research topics, all aimed at extracting useful news content. Accurately identifying news Web pages is a crucial prerequisite for this research. To address it, this paper proposes an automatic recognition method for news Web pages based on a combination of URL attributes, structure attributes and content attributes. Our experimental results show that the method recognizes news Web pages with an accuracy above 96%.
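
The abstract names three attribute families but does not list the individual attributes or the classifier. As a rough illustration only, the following Python sketch shows one way URL, structure and content cues could be combined. Every feature, threshold and name here (`extract_features`, `looks_like_news`, the date pattern, the 0.3 and 200 cut-offs) is a hypothetical assumption made for this example, not the authors' method, and the majority vote stands in for whatever trained model the paper uses.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class _PageStats(HTMLParser):
    """Counts visible text and link text (hypothetical structure cues)."""

    def __init__(self):
        super().__init__()
        self.text_chars = 0   # all visible characters
        self.link_chars = 0   # visible characters inside <a> elements
        self._in_link = 0
        self._skip = 0        # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link += 1
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self._in_link -= 1
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._skip:
            return
        n = len(data.strip())
        self.text_chars += n
        if self._in_link:
            self.link_chars += n


def extract_features(url, html):
    """Return one illustrative cue per attribute family (all assumed)."""
    stats = _PageStats()
    stats.feed(html)
    path = urlparse(url).path
    return {
        # URL attribute: a year/month pattern such as /2008/06 in the path
        "url_has_date": bool(re.search(r"/(19|20)\d{2}[/-]?\d{2}", path)),
        # Structure attribute: share of visible text that is link text
        "link_text_ratio": stats.link_chars / max(stats.text_chars, 1),
        # Content attribute: amount of non-link body text
        "body_chars": stats.text_chars - stats.link_chars,
    }


def looks_like_news(url, html):
    """Majority vote over the three cues; a stand-in for a trained classifier."""
    f = extract_features(url, html)
    votes = [
        f["url_has_date"],
        f["link_text_ratio"] < 0.3,   # assumed threshold
        f["body_chars"] >= 200,       # assumed threshold
    ]
    return sum(votes) >= 2
```

For instance, `looks_like_news("http://example.com/news/2008/06/17/story.html", page_html)` would be expected to return `True` for an article page with a long body and few links, while a link-heavy index page at a date-free URL would not pass the vote. In practice the thresholds would be learned from labeled pages rather than fixed by hand.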





Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhu, Z., Wu, GQ., Wu, X., Hu, XG., Wang, FY. (2008). Automatic Recognition of News Web Pages. In: Yang, C.C., et al. Intelligence and Security Informatics. ISI 2008. Lecture Notes in Computer Science, vol 5075. Springer, Berlin, Heidelberg.


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69136-5

  • Online ISBN: 978-3-540-69304-8

  • eBook Packages: Computer Science, Computer Science (R0)
