Domain Classifier: Compromised Machines Versus Malicious Registrations

Le Page, Sophie; Jourdan, Guy-Vincent; Bochmann, Gregor V.; Onut, Iosif-Viorel; Flood, Jason

doi:10.1007/978-3-030-19274-7_20

Conference paper
First Online: 26 April 2019

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11496)

Abstract

In phishing attacks, websites disguised as trustworthy ones attempt to steal sensitive information. Remediation and mitigation options differ depending on whether the phishing website is hosted on a legitimate but compromised domain, in which case the domain owner is also a victim, or on a domain that was itself maliciously registered. We therefore address the question of classifying known phishing sites as either compromised or maliciously registered. Since the recent adoption of GDPR standards has put personal data off-limits, few of the relevant criteria from the literature still satisfy those standards. We propose a machine-learning-based domain classifier that introduces nine novel features exploiting the internet presence and history of a domain, using only publicly available information. We evaluated the classifier on a corpus of phishing websites hosted on over 1,000 compromised domains and 10,000 malicious domains. In the randomized evaluation, it achieved over 92% accuracy with a false positive rate under 8%, taking compromised cases as the positive class. We have also collected over 180,000 phishing website instances over the past 3 years. Using our classifier, we show that 73% of the websites hosting attacks are compromised while the remaining 27% belong to the attackers.
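
The abstract reports accuracy and false positive rate with compromised domains as the positive class. As a reading aid, the sketch below shows how those two metrics are conventionally computed from true and predicted labels. It is not the authors' evaluation code: the function name `accuracy_and_fpr`, the label strings, and the six toy labels are assumptions made for illustration only and do not come from the paper's corpus.

```python
from typing import Sequence, Tuple


def accuracy_and_fpr(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    positive: str = "compromised",  # assumed label string for the positive class
) -> Tuple[float, float]:
    """Return (accuracy, false positive rate) for a two-class problem."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # compromised domain correctly flagged as compromised
            else:
                fp += 1  # malicious registration wrongly flagged as compromised
        else:
            if truth == positive:
                fn += 1  # compromised domain missed
            else:
                tn += 1  # malicious registration correctly identified

    total = tp + tn + fp + fn
    negatives = fp + tn
    accuracy = (tp + tn) / total if total else float("nan")
    # FPR is the share of truly negative (malicious) cases predicted positive.
    fpr = fp / negatives if negatives else float("nan")
    return accuracy, fpr


# Toy labels, invented purely for illustration.
toy_truth = ["compromised", "compromised", "malicious",
             "malicious", "malicious", "compromised"]
toy_pred = ["compromised", "malicious", "malicious",
            "compromised", "malicious", "compromised"]

acc, fpr = accuracy_and_fpr(toy_truth, toy_pred)
print(f"accuracy={acc:.3f}, false positive rate={fpr:.3f}")
```

With compromised as the positive class, a false positive is a maliciously registered domain that the classifier labels as compromised, so the false positive rate is measured over the malicious domains only.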

Author information

Authors and Affiliations

Faculty of Engineering, University of Ottawa, Ottawa, Canada
Sophie Le Page, Guy-Vincent Jourdan & Gregor V. Bochmann
IBM Centre for Advanced Studies, Ottawa, Canada
Iosif-Viorel Onut
IBM Security Data Matrices, Dublin, Ireland
Jason Flood

Corresponding authors

Correspondence to Guy-Vincent Jourdan, Gregor V. Bochmann, Iosif-Viorel Onut or Jason Flood.

Editor information

Editors and Affiliations

Novosibirsk State Technical University, Novosibirsk, Russia
Maxim Bakaev
Erasmus University Rotterdam, Rotterdam, The Netherlands
Flavius Frasincar
Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
In-Young Ko

Cite this paper

Le Page, S., Jourdan, GV., Bochmann, G.V., Onut, IV., Flood, J. (2019). Domain Classifier: Compromised Machines Versus Malicious Registrations. In: Bakaev, M., Frasincar, F., Ko, IY. (eds) Web Engineering. ICWE 2019. Lecture Notes in Computer Science, vol 11496. Springer, Cham. https://doi.org/10.1007/978-3-030-19274-7_20

DOI: https://doi.org/10.1007/978-3-030-19274-7_20
Published: 26 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-19273-0
Online ISBN: 978-3-030-19274-7
eBook Packages: Computer Science, Computer Science (R0)
