Abstract
Certificate misissuance is a growing issue in the context of phishing attacks, because a technically valid certificate leads inexperienced users to place more trust in a fraudulent website. Certificate Transparency (CT) aims to increase the visibility of such malicious actions by requiring certificate authorities (CAs) to log every certificate they issue in public, tamper-proof, append-only logs. This work introduces Phish-Hook, a novel machine-learning approach to detecting phishing websites. Phish-Hook analyses certificates submitted to the CT system using a conceptually simple, well-understood classification mechanism to assess the phishing likelihood of newly issued certificates. Phish-Hook relies solely on CT log data and forgoes intricate analyses of websites’ source code and traffic. As a consequence, we are able to provide classification results in near real-time and in a resource-efficient way. Our approach advances the state of the art by classifying websites according to five incremental certificate risk labels instead of assigning a binary label. Evaluation results demonstrate the effectiveness of our approach, which achieves a success rate of over 90% while requiring less, and less complex, input data.
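To make the idea of grading a certificate from CT log data alone more concrete, the following minimal Python sketch derives a few features from a domain name as it would appear in a logged certificate and maps a score to one of five incremental risk levels. It is an illustration under stated assumptions, not the Phish-Hook implementation: the feature set, the keyword list, the hand-picked weights (which stand in for a trained classifier), and the names `extract_features`, `risk_score`, and `risk_level` are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

# Illustrative keyword list; an assumption, not taken from the paper.
SUSPICIOUS_KEYWORDS = ("login", "secure", "account", "verify")


@dataclass
class Features:
    n_labels: int      # number of dot-separated labels in the domain
    n_hyphens: int     # number of '-' characters in the domain
    n_keywords: int    # how many illustrative keywords occur in the domain
    is_punycode: bool  # True if any label carries the IDN prefix "xn--"


def extract_features(domain: str) -> Features:
    """Derive features from the domain name only (no page content or traffic)."""
    name = domain.lower()
    if name.startswith("*."):  # drop a leading wildcard label
        name = name[2:]
    labels = name.split(".")
    return Features(
        n_labels=len(labels),
        n_hyphens=name.count("-"),
        n_keywords=sum(k in name for k in SUSPICIOUS_KEYWORDS),
        is_punycode=any(label.startswith("xn--") for label in labels),
    )


def risk_score(f: Features) -> float:
    """Hypothetical hand-picked weights standing in for a trained model.

    Returns a value clamped to the range [0, 1].
    """
    raw = (
        0.15 * max(f.n_labels - 2, 0)
        + 0.10 * f.n_hyphens
        + 0.25 * f.n_keywords
        + (0.30 if f.is_punycode else 0.0)
    )
    return min(raw, 1.0)


def risk_level(score: float) -> int:
    """Map a score in [0, 1] to a level from 1 (lowest risk) to 5 (highest)."""
    return min(int(score * 5) + 1, 5)


if __name__ == "__main__":
    for d in ("example.com", "secure-login.account-verify.example-pay.com"):
        print(d, risk_level(risk_score(extract_features(d))))
```

In a learned setting, the weighted sum would be replaced by a classifier trained on labelled certificates; the point of the sketch is only that every input is available from the CT log entry itself, which is what allows near real-time grading.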
Copyright information
© 2019 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Fasllija, E., Enişer, H.F., Prünster, B. (2019). Phish-Hook: Detecting Phishing Certificates Using Certificate Transparency Logs. In: Chen, S., Choo, KK., Fu, X., Lou, W., Mohaisen, A. (eds) Security and Privacy in Communication Networks. SecureComm 2019. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 305. Springer, Cham. https://doi.org/10.1007/978-3-030-37231-6_18
DOI: https://doi.org/10.1007/978-3-030-37231-6_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37230-9
Online ISBN: 978-3-030-37231-6
eBook Packages: Computer Science, Computer Science (R0)