Towards Homograph-Confusable Domain Name Detection Using Dual-Channel CNN

Yu, Guangxi; Yang, Xinghua; Zhang, Yan; Cui, Huajun; Yang, Huiran; Li, Yang

doi:10.1007/978-3-030-41579-2_32

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11999)

Included in the following conference series:

International Conference on Information and Communications Security

Abstract

A homograph attack is a common form of phishing that generates visually spoofed domain names by replacing a single character or a combination of characters. To analyze and detect homograph domain names, prior work mainly relies on distance-based methods, which compute the edit distance or Euclidean distance between two domain names, or on OCR (Optical Character Recognition). However, these methods can produce a large number of false positives and also increase processing overhead. In this paper, we propose a dual-channel CNN classifier combined with a retrieval algorithm based on minimum hash (MinHash) and locality-sensitive hashing (LSH) to detect homograph domain names. The dual-channel CNN classifier is trained to analyze dual-channel domain images. MinHash and LSH are used to search for domain names with similar characters, which efficiently reduces the volume of data to be examined. Compared with other detection methods, our method distinguishes homograph domain names from normal ones effectively, achieving a detection rate of 98.5%. Experiments on real DNS log datasets indicate that the MinHash and LSH scheme performs well in reducing large volumes of data.
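
To make the retrieval step more concrete, the following is a minimal sketch of MinHash signatures with LSH banding over domain names. It is not the authors' implementation: the character 2-gram shingling, the BLAKE2b-based hash family, the signature length of 64, the 32-band split, and all function names are assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations
from typing import Iterable, Set, Tuple


def shingles(domain: str, k: int = 2) -> Set[str]:
    """Character k-grams of a lowercased domain (assumed feature choice)."""
    s = domain.lower()
    if len(s) < k:
        return {s}
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def _hash(seed: int, token: str) -> int:
    """Seeded 64-bit hash; each seed stands in for one random permutation."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens: Set[str], num_perm: int = 64) -> Tuple[int, ...]:
    """For each seed, keep the minimum hash value over all tokens."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_perm))


def estimate_jaccard(sig_a: Tuple[int, ...], sig_b: Tuple[int, ...]) -> float:
    """The fraction of matching positions estimates the Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def lsh_candidates(domains: Iterable[str], num_perm: int = 64,
                   bands: int = 32) -> Set[Tuple[str, str]]:
    """Return pairs of domains that share at least one LSH band bucket."""
    if num_perm % bands != 0:
        raise ValueError("num_perm must be divisible by bands")
    rows = num_perm // bands
    buckets = defaultdict(set)
    for d in domains:
        sig = minhash_signature(shingles(d), num_perm)
        for b in range(bands):
            # Domains whose signatures agree on a whole band collide here.
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(d)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


if __name__ == "__main__":
    # Hypothetical inputs: one reference domain, one look-alike, one unrelated.
    names = ["google.com", "g00gle.com", "wikipedia.org"]
    for a, b in sorted(lsh_candidates(names)):
        sim = estimate_jaccard(minhash_signature(shingles(a)),
                               minhash_signature(shingles(b)))
        print(a, b, round(sim, 2))
```

Only the candidate pairs returned by the LSH step would need to be passed on to a more expensive classifier, which is the data-reduction effect the abstract describes. Using more rows per band makes the filter stricter, while using more bands makes it more permissive.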

Acknowledgments

The work was supported in part by Innovative Project of Cutting-edge Science and Technology (Grant No. Y750171201).

Author information

Authors and Affiliations

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Guangxi Yu, Xinghua Yang, Yan Zhang, Huajun Cui, Huiran Yang & Yang Li
School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Guangxi Yu, Yan Zhang, Huajun Cui & Yang Li

Corresponding author

Correspondence to Xinghua Yang.

Editor information

Editors and Affiliations

Singapore University of Technology and Design, Singapore, Singapore
Jianying Zhou
The Hong Kong Polytechnic University, Kowloon, Hong Kong
Xiapu Luo
Peking University, Beijing, China
Qingni Shen
Institute of Information Engineering, Beijing, China
Zhen Xu

About this paper

Cite this paper

Yu, G., Yang, X., Zhang, Y., Cui, H., Yang, H., Li, Y. (2020). Towards Homograph-Confusable Domain Name Detection Using Dual-Channel CNN. In: Zhou, J., Luo, X., Shen, Q., Xu, Z. (eds) Information and Communications Security. ICICS 2019. Lecture Notes in Computer Science, vol 11999. Springer, Cham. https://doi.org/10.1007/978-3-030-41579-2_32

DOI: https://doi.org/10.1007/978-3-030-41579-2_32
Published: 18 February 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41578-5
Online ISBN: 978-3-030-41579-2
eBook Packages: Computer Science, Computer Science (R0)
