Abstract
In “phishing attacks”, phishing websites disguised as trustworthy websites attempt to steal sensitive information. Remediation and mitigation options differ depending on whether the phishing website is hosted on a legitimate but compromised domain, in which case the domain owner is also a victim, or whether the domain itself is maliciously registered. We accordingly attempt to tackle here the important question of classifying known phishing sites as either compromised or maliciously registered. Following the recent adoption of GDPR standards now putting off-limits any personal data, few relevant literature criteria still satisfy those standards. We propose here a machine-learning based domain classifier, introducing nine novel features which exploit the internet presence and history of a domain, using only publicly available information. Evaluation of our domain classifier was performed with a corpus of phishing websites hosted on over 1,000 compromised domains and 10,000 malicious domains. In the randomized evaluation, our domain classifier achieved over 92% accuracy with under 8% false positive rate, with compromised cases as the positive class. We have also collected over 180,000 phishing website instances over the past 3 years. Using our classifier we show that 73% of the websites hosting attacks are compromised while the remaining 27% belong to the attackers.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
APWG: Phishing Activity Trends Report 1st Half 2017. bit.ly/2KKTUzw
APWG: Phishing Activity Trends Report 1st Quarter in 2016. bit.ly/1qNLrk5
APWG: Phishing Activity Trends Report 1st Quarter in 2018. bit.ly/2HfK0Ik
APWG: Phishing Activity Trends Report 3rd Quarter in 2018. bit.ly/2VTVYuh
APWG: Trends and Domain Name Use in 2016. bit.ly/2TvHyE6
Catakoglu, O., Balduzzi, M., Balzarotti, D.: Automatic extraction of indicators of compromise for web applications. In: Proceedings of the 25th International Conference on World Wide Web, pp. 333–343. International World Wide Web Conferences Steering Committee (2016)
Corona, I., et al.: DeltaPhish: detecting phishing webpages in compromised websites. In: Foley, S.N., Gollmann, D., Snekkenes, E. (eds.) ESORICS 2017. LNCS, vol. 10492, pp. 370–388. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66402-6_22
Cortes, C., Mohri, M.: AUC optimization vs. error rate minimization. In: Advances in Neural Information Processing Systems, pp. 313–320 (2004)
Cui, Q., Jourdan, G.V., Bochmann, G.V., Couturier, R., Onut, I.V.: Tracking phishing attacks over time. In: Proceedings of the 26th International Conference on World Wide Web. pp. 667–676. International World Wide Web Conferences Steering Committee (2017)
Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–874 (2006)
Forbes: The Internet Archive Behind the Scenes (2016). bit.ly/2CjomPa
Gowtham, R., Krishnamurthi, I.: A comprehensive and efficacious architecture for detecting phishing webpages. Comput. Secur. 40, 23–37 (2014)
Hao, S., Kantchelian, A., Miller, B., Paxson, V., Feamster, N.: Predator: proactive recognition and elimination of domain abuse at time-of-registration. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 1568–1579. ACM (2016)
Jain, A.K., Gupta, B.B.: Phishing detection: analysis of visual similarity based approaches. Secur. Commun. Netw. 2017, 20 (2017)
Krogh, A., Hertz, J.A.: Generalization in a linear perceptron in the presence of noise. J. Phys. A: Math. Gen. 25(5), 1135 (1992)
Liu, D., Li, Z., Du, K., Wang, H., Liu, B., Duan, H.: Don’t let one rotten apple spoil the whole barrel: towards automated detection of shadowed domains. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 537–552. ACM (2017)
Moore, T., Clayton, R.: Evil searching: compromise and recompromise of internet hosts for phishing. In: Dingledine, R., Golle, P. (eds.) FC 2009. LNCS, vol. 5628, pp. 256–272. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03549-4_16
Moore, T., Clayton, R.: The impact of incentives on notice and take-down. In: Johnson, M.E., et al. (eds.) Managing Information Risk and the Economics of Security, pp. 199–223. Springer, Boston (2009). https://doi.org/10.1007/978-0-387-09762-6_10
PhishLabs: Threat intelligence & mitigation solutions (2019). https://www.phishlabs.com/
Quora: Why Does Quora Block the Wayback Machine from Accessing It (2016). bit.ly/2XSbeKa
Thelwall, M., Vaughan, L.: A fair history of the web? Examining country balance in the internet archive. Libr. Inf. Sci. Res. 26(2), 162–176 (2004)
Xiang, G., Hong, J., Rose, C.P., Cranor, L.: Cantina+: a feature-rich machine learning framework for detecting phishing web sites. ACM Trans. Inf. Syst. Secur. 14(2), 21:1–21:28 (2011)
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Le Page, S., Jourdan, GV., Bochmann, G.V., Onut, IV., Flood, J. (2019). Domain Classifier: Compromised Machines Versus Malicious Registrations. In: Bakaev, M., Frasincar, F., Ko, IY. (eds) Web Engineering. ICWE 2019. Lecture Notes in Computer Science(), vol 11496. Springer, Cham. https://doi.org/10.1007/978-3-030-19274-7_20
Download citation
DOI: https://doi.org/10.1007/978-3-030-19274-7_20
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-19273-0
Online ISBN: 978-3-030-19274-7
eBook Packages: Computer ScienceComputer Science (R0)