Ascertaining Spam Web Pages Based on Ant Colony Optimization Algorithm

  • Conference paper
Database and Expert Systems Applications (DEXA 2014)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8645))

Abstract

Web spam troubles both Internet users and search engine companies because it seriously damages the reliability of search engines, harms the interests of Web users, and degrades the quality of Web information. This paper discusses a Web spam detection method inspired by the Ant Colony Optimization (ACO) algorithm. The approach consists of two stages: preprocessing and Web spam detection. In the preprocessing stage, the class-imbalance problem is solved with a clustering technique, and an optimal feature subset is selected using Chi-square statistics. The dataset is also discretized using an information entropy method. These steps make spam detection in the second stage easier and more efficient. In the second stage, a spam detection model is built based on the ant colony optimization algorithm. Experimental results on the WEBSPAM-UK2006 dataset show that our approach can achieve the same or even better results with fewer features.
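As a rough illustration of two of the preprocessing steps named in the abstract, the sketch below scores a discrete feature with the Chi-square statistic and picks a single entropy-minimising cut point for a continuous feature. It is not the authors' implementation: the function names (chi_square, entropy, best_entropy_cut), the single binary cut, and the toy data are all assumptions made for this example, and the clustering and ACO stages are not shown.

```python
import math
from collections import Counter


def chi_square(feature_values, labels):
    """Chi-square statistic between one discrete feature and the class label."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    value_totals = Counter(feature_values)
    label_totals = Counter(labels)
    stat = 0.0
    for v, v_count in value_totals.items():
        for c, c_count in label_totals.items():
            # Expected count if the feature and the class were independent.
            expected = v_count * c_count / n
            stat += (observed[(v, c)] - expected) ** 2 / expected
    return stat


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def best_entropy_cut(values, labels):
    """Cut point that minimises the weighted class entropy of the two sides.

    Hypothetical simplification: one binary cut only; returns None when
    all values are identical.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut


# Toy data (invented for illustration, not taken from WEBSPAM-UK2006).
labels = ["spam", "spam", "normal", "normal"]
informative = [0, 0, 1, 1]      # tracks the label exactly
uninformative = [0, 1, 0, 1]    # independent of the label
scores = [0.1, 0.2, 0.8, 0.9]   # continuous feature to discretize
```

Under these assumptions, a feature that tracks the class label receives a higher Chi-square score than one that is independent of it, so ranking features by this score and keeping the top ones is one plausible way to arrive at a smaller feature subset.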

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Tang, SH., Zhu, Y., Yang, F., Xu, Q. (2014). Ascertaining Spam Web Pages Based on Ant Colony Optimization Algorithm. In: Decker, H., Lhotská, L., Link, S., Spies, M., Wagner, R.R. (eds) Database and Expert Systems Applications. DEXA 2014. Lecture Notes in Computer Science, vol 8645. Springer, Cham. https://doi.org/10.1007/978-3-319-10085-2_21

  • DOI: https://doi.org/10.1007/978-3-319-10085-2_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10084-5

  • Online ISBN: 978-3-319-10085-2

  • eBook Packages: Computer Science, Computer Science (R0)
