Automatic Recognition of News Web Pages

Zhu, Zhu; Wu, Gong-Qing; Wu, Xindong; Hu, Xue-Gang; Wang, Fei-Yue

doi:10.1007/978-3-540-69304-8_52

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5075)

Included in the following conference series:

International Conference on Intelligence and Security Informatics

Abstract

The World Wide Web carries a large and growing volume of news content. Filtering, summarizing and classifying news Web pages have become active research topics, all aimed at surfacing useful news content. Accurately identifying news Web pages is a prerequisite for each of these tasks. To address it, this paper proposes an automatic recognition method for news Web pages based on a combination of URL attributes, structure attributes and content attributes. Experimental results show that the method recognizes news Web pages with an accuracy above 96%.
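
The abstract names three attribute families but does not list the individual features or the classifier built on them. As a rough illustration only, the Python sketch below shows one way URL, structure and content attributes might be gathered into a single feature record for a downstream classifier. Every feature, the date pattern, and the names `extract_features` and `_PageStats` are hypothetical choices made for this sketch, not the authors' method.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class _PageStats(HTMLParser):
    """Collects simple structural and textual statistics from HTML."""

    def __init__(self):
        super().__init__()
        self.link_count = 0
        self.paragraph_count = 0
        self.text_chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_count += 1
        elif tag == "p":
            self.paragraph_count += 1

    def handle_data(self, data):
        # Note: text inside <script>/<style> is not filtered in this sketch.
        stripped = data.strip()
        if stripped:
            self.text_chunks.append(stripped)


def extract_features(url, html):
    """Return a dict of illustrative (assumed) news-page features."""
    path = urlparse(url).path
    stats = _PageStats()
    stats.feed(html)
    stats.close()
    words = " ".join(stats.text_chunks).split()
    return {
        # URL attributes (assumed): a date-like path segment and path depth
        "url_has_date": bool(re.search(r"/\d{4}[/-]?\d{2}[/-]?\d{2}", path)),
        "url_depth": len([seg for seg in path.split("/") if seg]),
        # Structure attributes (assumed): paragraph and hyperlink counts
        "paragraph_count": stats.paragraph_count,
        "link_count": stats.link_count,
        # Content attributes (assumed): text volume relative to links
        "word_count": len(words),
        "words_per_link": len(words) / (stats.link_count + 1),
    }
```

A record like this could be passed to any standard classifier; which features and which learner the paper actually uses cannot be determined from the abstract.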

Author information

Authors and Affiliations

School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, 230009, China
Zhu Zhu, Gong-Qing Wu, Xindong Wu & Xue-Gang Hu
Department of Computer Science, University of Vermont, Burlington, VT 05405, U.S.A.
Xindong Wu
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Fei-Yue Wang

Editor information

Editors and Affiliations

The Chinese University of Hong Kong, Hong Kong
Christopher C. Yang
The University of Arizona, USA
Hsinchun Chen
The University of Hong Kong, Hong Kong
Michael Chau
Nanyang Technological University, Singapore
Kuiyu Chang
University of Central Florida, USA
Sheau-Dong Lang
Tatung University, Taiwan
Patrick S. Chen
California University of Pennsylvania, USA
Raymond Hsieh
University of Arizona and Chinese Academy of Sciences, USA
Daniel Zeng
Chinese Academy of Sciences, China
Fei-Yue Wang & Wenji Mao
Carnegie Mellon University, USA
Kathleen Carley & Justin Zhan

Cite this paper

Zhu, Z., Wu, GQ., Wu, X., Hu, XG., Wang, FY. (2008). Automatic Recognition of News Web Pages. In: Yang, C.C., et al. Intelligence and Security Informatics. ISI 2008. Lecture Notes in Computer Science, vol 5075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69304-8_52

DOI: https://doi.org/10.1007/978-3-540-69304-8_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69136-5
Online ISBN: 978-3-540-69304-8
eBook Packages: Computer Science, Computer Science (R0)
