Domain Classifier: Compromised Machines Versus Malicious Registrations

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11496)

Abstract

In phishing attacks, websites disguised as trustworthy sites attempt to steal sensitive information. Remediation and mitigation options differ depending on whether the phishing website is hosted on a legitimate but compromised domain, in which case the domain owner is also a victim, or on a domain that was itself maliciously registered. We therefore address the question of classifying known phishing sites as either compromised or maliciously registered. Because the recent adoption of GDPR standards puts personal data off-limits, few of the relevant criteria from the literature still satisfy those standards. We propose a machine-learning-based domain classifier that introduces nine novel features exploiting the internet presence and history of a domain, using only publicly available information. We evaluated the classifier on a corpus of phishing websites hosted on over 1,000 compromised domains and 10,000 malicious domains. In the randomized evaluation, it achieved over 92% accuracy with a false positive rate under 8%, taking compromised cases as the positive class. We have also collected over 180,000 phishing website instances over the past 3 years. Using our classifier, we show that 73% of the websites hosting attacks are compromised, while the remaining 27% belong to the attackers.
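
To make the reported metrics concrete, here is a minimal sketch (not the authors' evaluation code) of how accuracy and false positive rate could be computed when compromised domains are taken as the positive class. The function name `evaluate`, the label strings, and the toy labels are assumptions made purely for illustration; none of them come from the paper.

```python
def evaluate(y_true, y_pred, positive="compromised"):
    """Return accuracy and false positive rate for a two-class labelling.

    Hypothetical helper: `positive` names the class treated as positive
    (here, compromised domains); every other label counts as negative.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # compromised domain correctly identified
            else:
                fp += 1  # malicious registration mistaken for compromised
        else:
            if truth == positive:
                fn += 1  # compromised domain missed
            else:
                tn += 1  # malicious registration correctly identified
    total = tp + fp + tn + fn
    negatives = fp + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "false_positive_rate": fp / negatives if negatives else 0.0,
    }


# Toy labels, invented for illustration only (not data from the paper).
y_true = ["compromised"] * 4 + ["malicious"] * 6
y_pred = (["compromised"] * 3 + ["malicious"]      # one compromised domain missed
          + ["malicious"] * 5 + ["compromised"])   # one false positive
result = evaluate(y_true, y_pred)
print(result)
```

With compromised as the positive class, a false positive is a maliciously registered domain that gets labelled as compromised, so the false positive rate is measured over the malicious domains only.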

Notes

  1. https://safebrowsing.google.com/

  2. https://support.microsoft.com/en-us/help/17443/windows-internet-explorer-smartscreen-filter-faq

  3. http://archive.org/

  4. https://www.freenom.com/

  5. http://www.alexa.com/

  6. https://scikit-learn.org/

Author information

Correspondence to Guy-Vincent Jourdan, Gregor V. Bochmann, Iosif-Viorel Onut or Jason Flood.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Le Page, S., Jourdan, GV., Bochmann, G.V., Onut, IV., Flood, J. (2019). Domain Classifier: Compromised Machines Versus Malicious Registrations. In: Bakaev, M., Frasincar, F., Ko, IY. (eds) Web Engineering. ICWE 2019. Lecture Notes in Computer Science, vol 11496. Springer, Cham. https://doi.org/10.1007/978-3-030-19274-7_20

  • DOI: https://doi.org/10.1007/978-3-030-19274-7_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-19273-0

  • Online ISBN: 978-3-030-19274-7

  • eBook Packages: Computer Science, Computer Science (R0)
