Google Dorks: Analysis, Creation, and New Defenses

  • Conference paper
Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA 2016)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 9721)

Abstract

With the advent of Web 2.0, many users started to maintain personal web pages to show information about themselves or their businesses, or to run simple e-commerce applications. This transition has been facilitated by a large number of frameworks and applications that can be easily installed and customized. Unfortunately, attackers have taken advantage of the widespread use of these technologies, for example by crafting special search engine queries to fingerprint an application framework and automatically locate possible targets. This approach, usually called Google Dorking, is at the core of many automated exploitation bots.

In this paper, we tackle this problem in three steps. We first perform a large-scale study of existing dorks to understand their typology and the information attackers use to identify their target applications. We then propose a defense technique to render URL-based dorks ineffective. Finally, we study the effectiveness of building dorks by using only combinations of generic words, and we propose a simple but effective way to protect web applications against this type of fingerprinting.

Notes

  1. Not all dorks have been correctly classified automatically, so we manually inspected the results to ensure a correct classification.

  2. Here we assume that search engines do not try to disguise their requests, as is the case for all the popular ones we encountered in our study.

  3. For efficiency reasons, we compute the hit rank by visiting a random sample that covers 30% of the first 1000 results.

Author information

Corresponding author

Correspondence to Flavio Toffalini.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Toffalini, F., Abbà, M., Carra, D., Balzarotti, D. (2016). Google Dorks: Analysis, Creation, and New Defenses. In: Caballero, J., Zurutuza, U., Rodríguez, R. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2016. Lecture Notes in Computer Science, vol 9721. Springer, Cham. https://doi.org/10.1007/978-3-319-40667-1_13

  • DOI: https://doi.org/10.1007/978-3-319-40667-1_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-40666-4

  • Online ISBN: 978-3-319-40667-1

  • eBook Packages: Computer Science, Computer Science (R0)
