Elsevier

Procedia Computer Science

Volume 31, 2014, Pages 1084-1091

Analysis and Detection of Bogus Behavior in Web Crawler Measurement

https://doi.org/10.1016/j.procs.2014.05.363
Open access under a Creative Commons license

Abstract

With the development of the Internet, search engine technology is becoming increasingly popular, and web crawlers now consume a great deal of Internet bandwidth. Besides the crawlers of Google, Baidu and other well-known search engines, the Internet is filled with “bogus” web crawlers. Often crudely coded, these crawlers seriously harm the Internet. Correctly analyzing the traffic characteristics of the Google web crawler and blocking the “bogus” web crawlers can improve a site's performance and enhance the network's quality of service. In this paper, we measured a large volume of web crawler traffic on a real high-speed network and compared the statistical characteristics of the Google web crawler with those of the “bogus” web crawlers. We propose a model that distinguishes real from “bogus” web crawlers with an accuracy of about 95%.
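The abstract does not say which statistical characteristics the detection model uses. As a purely illustrative sketch, assuming traffic records have been reduced to hypothetical (client IP, timestamp in seconds) pairs, the Python below computes per-client request counts and inter-arrival statistics, then applies a hypothetical threshold rule. The feature choice, the function names `interarrival_stats` and `label_clients`, and the thresholds `min_mean_gap` and `max_cv` are all assumptions for illustration, not the authors' model.

```python
from collections import defaultdict
from statistics import mean, pstdev


def interarrival_stats(records):
    """Group (client_ip, timestamp) records by client (hypothetical record format).

    Returns {client_ip: (request_count, mean_gap, coefficient_of_variation)};
    mean_gap and cv are None when a client sent only one request.
    """
    times = defaultdict(list)
    for ip, ts in records:
        times[ip].append(ts)

    stats = {}
    for ip, ts_list in times.items():
        ts_list.sort()
        # Gaps between consecutive requests from the same client.
        gaps = [b - a for a, b in zip(ts_list, ts_list[1:])]
        if not gaps:
            stats[ip] = (len(ts_list), None, None)
            continue
        m = mean(gaps)
        # Coefficient of variation measures how bursty the request gaps are.
        cv = pstdev(gaps) / m if m > 0 else 0.0
        stats[ip] = (len(ts_list), m, cv)
    return stats


def label_clients(stats, min_mean_gap=1.0, max_cv=2.0):
    """Hypothetical rule: flag clients that request too fast or too burstily."""
    labels = {}
    for ip, (_count, m, cv) in stats.items():
        if m is None:
            labels[ip] = "insufficient-data"
        elif m < min_mean_gap or cv > max_cv:
            labels[ip] = "suspect"
        else:
            labels[ip] = "normal"
    return labels


if __name__ == "__main__":
    records = [("a", 0), ("a", 10), ("a", 20),
               ("b", 0), ("b", 0.1), ("b", 0.2),
               ("c", 5)]
    print(label_clients(interarrival_stats(records)))
```

In this toy input, client `a` requests at a steady 10-second pace, client `b` fires requests every 0.1 seconds, and client `c` sent a single request, so the rule labels them normal, suspect, and insufficient-data respectively. A real detector would need thresholds derived from measured traffic rather than these placeholder values.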

Keywords

web crawler
bogus behavior
measurement
high speed network
traffic

Selection and peer-review under responsibility of the Organizing Committee of ITQM 2014.