Abstract
Web spam troubles both Internet users and search engine companies: it seriously damages the reliability of search engines, harms the interests of Web users, and degrades the quality of information on the Web. This paper presents a Web spam detection method inspired by the Ant Colony Optimization (ACO) algorithm. The approach consists of two stages: preprocessing and Web spam detection. In the preprocessing stage, the class-imbalance problem is addressed with a clustering technique, an optimal feature subset is selected using Chi-square statistics, and the dataset is discretized with an information-entropy method. These steps make spam detection in the second stage easier and more efficient. In the second stage, a spam detection model is built using the ant colony optimization algorithm. Experimental results on the WEBSPAM-UK2006 dataset show that our approach achieves the same or even better results with fewer features.
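As an illustration of the Chi-square feature selection step named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes features have already been discretized, and the function names (`chi_square_score`, `select_top_k`), the feature names, and the toy data are all hypothetical.

```python
from collections import Counter


def chi_square_score(feature_values, labels):
    """Chi-square statistic between one discrete feature and the class label."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))  # joint counts
    feature_totals = Counter(feature_values)         # row totals
    label_totals = Counter(labels)                   # column totals
    score = 0.0
    for value, value_count in feature_totals.items():
        for label, label_count in label_totals.items():
            # Count expected if the feature and the label were independent.
            expected = value_count * label_count / n
            score += (observed[(value, label)] - expected) ** 2 / expected
    return score


def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest chi-square score."""
    scores = {name: chi_square_score(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (hypothetical): one feature tracks the label, the other does not.
labels = ["spam", "spam", "normal", "normal"]
columns = {
    "informative": [1, 1, 0, 0],
    "uninformative": [1, 0, 1, 0],
}
selected = select_top_k(columns, labels, k=1)
```

A feature that is independent of the class label scores zero, so ranking by this score keeps the features most associated with the spam/normal distinction. How many features to keep is a separate choice that the abstract does not specify.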
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Tang, SH., Zhu, Y., Yang, F., Xu, Q. (2014). Ascertaining Spam Web Pages Based on Ant Colony Optimization Algorithm. In: Decker, H., Lhotská, L., Link, S., Spies, M., Wagner, R.R. (eds) Database and Expert Systems Applications. DEXA 2014. Lecture Notes in Computer Science, vol 8645. Springer, Cham. https://doi.org/10.1007/978-3-319-10085-2_21
DOI: https://doi.org/10.1007/978-3-319-10085-2_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10084-5
Online ISBN: 978-3-319-10085-2
eBook Packages: Computer Science (R0)