DOI: 10.1145/1242572.1242606
Article

Detectives: detecting coalition hit inflation attacks in advertising networks streams

Published: 08 May 2007

Abstract

Click fraud is jeopardizing the industry of Internet advertising. Internet advertising is crucial for the thriving of the entire Internet, since it allows producers to advertise their products, and hence contributes to the well-being of e-commerce. Moreover, advertising supports the intellectual value of the Internet by covering the running expenses of publishing content. Some content publishers are dishonest, and use automation to generate traffic to defraud the advertisers. Similarly, some advertisers automate clicks on the advertisements of their competitors to deplete their competitors' advertising budgets. This paper describes the advertising network model, and focuses on the most sophisticated type of fraud, which involves coalitions among fraudsters. We build on several published theoretical results to devise the Similarity-Seeker algorithm that discovers coalitions made by pairs of fraudsters. We then generalize the solution to coalitions of arbitrary sizes. Before deploying our system on a real network, we conducted comprehensive experiments on data samples for proof of concept. The results were very accurate. We detected several coalitions, formed using various techniques, and spanning numerous sites. This reveals the generality of our model and approach.
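
The abstract does not spell out how Similarity-Seeker works, so the following is only a rough illustration of the general idea of flagging pairs of sites whose traffic looks suspiciously alike. It is not the authors' algorithm. It assumes, hypothetically, that each site's traffic has been summarized as a set of visitor identifiers, and it uses exact Jaccard similarity with an arbitrary threshold; the names `jaccard`, `similar_pairs`, `traffic`, and the toy data are all made up for this sketch.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity |a & b| / |a | b| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(traffic: dict, threshold: float) -> list:
    """Return (site_a, site_b, similarity) for every pair at or above threshold.

    `traffic` maps a site name to a set of visitor IDs (an assumed summary,
    not something the abstract specifies).
    """
    pairs = []
    for s, t in combinations(sorted(traffic), 2):
        sim = jaccard(traffic[s], traffic[t])
        if sim >= threshold:
            pairs.append((s, t, sim))
    return pairs


# Hypothetical toy data: site name -> set of visitor IDs.
traffic = {
    "siteA": {1, 2, 3, 4, 5},
    "siteB": {2, 3, 4, 5, 6},
    "siteC": {7, 8, 9},
}

# The 0.5 threshold is an arbitrary choice for illustration.
flagged = similar_pairs(traffic, threshold=0.5)
```

Here only the pair (siteA, siteB) is flagged, since the two sites share four of six distinct visitors. Comparing every pair exactly is quadratic in the number of sites, which is presumably why the paper's author tags mention approximate set similarity and sampling rather than exact comparison.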

Published In

WWW '07: Proceedings of the 16th international conference on World Wide Web
May 2007
1382 pages
ISBN: 9781595936547
DOI: 10.1145/1242572

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. approximate set similarity
  2. click spam detection
  3. cliques enumeration
  4. coalition fraud attacks
  5. real data experiments
  6. similarity-sensitive sampling

Conference

WWW'07: 16th International World Wide Web Conference
May 8 - 12, 2007
Banff, Alberta, Canada

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
