Web Spam Detection Using MapReduce Approach to Collective Classification

Indyk, Wojciech; Kajdanowicz, Tomasz; Kazienko, Przemyslaw; Plamowski, Slawomir

doi:10.1007/978-3-642-33018-6_20

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 189)

Abstract

This paper considers the web spam detection problem. Given a network of interconnected spam and non-spam hosts, a collective classification approach based on label propagation is used to discover the spam hosts. Each host is represented as a network node, and links between hosts constitute the network’s edges. The proposed method provides reasonable results and, because it is formulated in the MapReduce programming model, is able to process large data sets.
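
The abstract gives no algorithmic detail, so the following is only a minimal illustrative sketch of how label propagation over a host graph can be phrased as alternating map and reduce steps. It is not the authors' method. It assumes undirected links, a spam score in [0, 1] per host, neighbour averaging as the update rule, and a single-process simulation in place of a real MapReduce cluster; the names `map_phase`, `reduce_phase`, and `propagate` are hypothetical.

```python
from collections import defaultdict


def map_phase(edges, scores):
    """Map step: each host sends its current spam score along every link.

    Assumption of this sketch: links are treated as undirected, so a score
    is emitted in both directions for each edge.
    """
    for src, dst in edges:
        yield dst, scores[src]
        yield src, scores[dst]


def reduce_phase(pairs, labeled):
    """Reduce step: average the scores received by each host.

    Hosts with a known label keep that label (they are "clamped").
    """
    received = defaultdict(list)
    for host, score in pairs:
        received[host].append(score)
    new_scores = {host: sum(vals) / len(vals) for host, vals in received.items()}
    new_scores.update(labeled)
    return new_scores


def propagate(edges, labeled, iterations=20):
    """Run a fixed number of map/reduce rounds (one round = one iteration)."""
    hosts = {host for edge in edges for host in edge}
    # Unlabeled hosts start at 0.5, i.e. undecided (an assumption of the sketch).
    scores = {host: labeled.get(host, 0.5) for host in hosts}
    for _ in range(iterations):
        scores = reduce_phase(map_phase(edges, scores), labeled)
    return scores


if __name__ == "__main__":
    # Hypothetical toy graph: a chain of four hosts, with one known spam
    # host (score 1.0) and one known non-spam host (score 0.0).
    links = [("a", "b"), ("b", "c"), ("c", "d")]
    known = {"a": 1.0, "d": 0.0}
    for host, score in sorted(propagate(links, known).items()):
        print(host, round(score, 3))
```

In this toy chain, the unlabeled host next to the known spam host ends up with a higher score than the one next to the known non-spam host. In a distributed setting each round would be one MapReduce job, with the host identifier as the key.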

Author information

Authors and Affiliations

Faculty of Computer Science and Management, Wroclaw University of Technology, Wroclaw, Poland
Wojciech Indyk, Tomasz Kajdanowicz, Przemyslaw Kazienko & Slawomir Plamowski

Corresponding author

Correspondence to Wojciech Indyk.

Editor information

Editors and Affiliations

Department of Civil Engineering, University of Burgos, Campus Vena (Edif.C), C/ Francisco de Vitoria, s/n, Burgos, 09006, Spain
Álvaro Herrero
VŠB-TU Ostrava, 17. listopadu 15, Ostrava, 70833, Czech Republic
Václav Snášel
MIR Labs, Scientific Network for Innovation, Machine Intelligence Research Labs, Auburn, 98071, USA
Ajith Abraham
VŠB-TU Ostrava, 17. listopadu 15, Ostrava, 70833, Czech Republic
Ivan Zelinka
Department of Civil Engineering, University of Burgos, Campus Vena (Edif.C), C/ Francisco de Vitoria, s/n, Burgos, 09006, Spain
Bruno Baruque
Universidad de Salamanca, Plaza de la Merced S/N, Salamanca, 37008, Spain
Héctor Quintián
University of Coruña, Avda. 19 de febrero, s/n, Coruña, 15405 A, Spain
José Luis Calvo
Instituto Tecnológico de Castilla y León, Pol. Ind. Villalonquéjar, Lopez Bravo 70, Burgos, 09001, Spain
Javier Sedano
Universidad de Salamanca, Plaza de la Merced S/N, Salamanca, 37008, Spain
Emilio Corchado

About this paper

Cite this paper

Indyk, W., Kajdanowicz, T., Kazienko, P., Plamowski, S. (2013). Web Spam Detection Using MapReduce Approach to Collective Classification. In: Herrero, Á., et al. International Joint Conference CISIS’12-ICEUTE’12-SOCO’12 Special Sessions. Advances in Intelligent Systems and Computing, vol 189. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33018-6_20

DOI: https://doi.org/10.1007/978-3-642-33018-6_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33017-9
Online ISBN: 978-3-642-33018-6
eBook Packages: Engineering, Engineering (R0)
