Abstract
The World Wide Web carries an enormous volume of news content, and filtering, summarizing and classifying news Web pages to extract useful content have become active research topics. All of these tasks depend on accurately identifying which pages are news pages. To address this problem, this paper proposes an automatic recognition method for news Web pages that combines URL attributes, structure attributes and content attributes. Experimental results show that the method recognizes news Web pages with an accuracy above 96%.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhu, Z., Wu, GQ., Wu, X., Hu, XG., Wang, FY. (2008). Automatic Recognition of News Web Pages. In: Yang, C.C., et al. Intelligence and Security Informatics. ISI 2008. Lecture Notes in Computer Science, vol 5075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69304-8_52
DOI: https://doi.org/10.1007/978-3-540-69304-8_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69136-5
Online ISBN: 978-3-540-69304-8
eBook Packages: Computer Science, Computer Science (R0)