Analysis and Detection of Bogus Behavior in Web Crawler Measurement

doi:10.1016/j.procs.2014.05.363

Abstract

With the development of the Internet, search engine technology has become increasingly popular, and web crawlers now consume a great deal of Internet bandwidth. Besides the crawlers of Google, Baidu and other well-known search engines, the Internet is filled with “bogus” web crawlers. Being crudely coded, these crawlers pose a serious hazard to the Internet. Correctly analyzing the traffic characteristics of the Google web crawler and blocking the “bogus” web crawlers can improve the performance of a site and enhance the quality of service of the network. In this paper, we measured a massive volume of web crawler traffic in a real high-speed network and compared the statistical characteristics of the Google web crawler with those of the “bogus” web crawlers. We proposed a model to distinguish real from “bogus” web crawlers, with an accuracy of about 95%.

Procedia Computer Science
