Human Factors in Homograph Attack Recognition

Applied Cryptography and Network Security (ACNS 2020)

Abstract

A homograph attack deceives victims about which website domain name they are communicating with by exploiting the fact that many characters look alike. The attack has become serious and has drawn broad attention recently, as many brand domains have been targeted, including those of Apple Inc., Adobe Inc., and Lloyds Bank. We first design a survey covering demographics, brand familiarity, and security background, and administer it to 2,067 participants. We then build a regression model to study which factors affect participants’ ability to recognize homograph domains. We find that participants’ ability differs with the level of visual similarity: 13.95% of participants can recognize non-homographs, while 16.60% can recognize homographs whose visual similarity to the target brand domain is under 99.9%. When the similarity rises to 99.9%, the share of participants who can recognize homographs drops sharply to only 0.19%, and homographs with 100% visual similarity cannot be recognized by participants at all. We also find that female participants tend to recognize homographs better than males, whereas male participants tend to recognize non-homographs better than females. Security knowledge is a significant factor for both homographs and non-homographs; surprisingly, people with strong security knowledge tend to be able to recognize homographs but not non-homographs. Furthermore, working or being educated in computer science or computer engineering does not appear as a factor affecting the ability to recognize homographs; interestingly, however, once the homograph attack has been explained to them, people who work or are educated in computer science or computer engineering grasp the situation most quickly.

Notes

  1. International Domain Names (IDNs) contain non-ASCII characters (e.g., from the Arabic, Chinese, or Cyrillic scripts). They are therefore encoded as ASCII strings using the Punycode transcription known as IDNA encoding, and appear as ASCII strings starting with “xn--”. For example, the domain xn--ggle-0qaa.com is displayed as gõõgle.com.

  2. The Appendix in this paper describes the questions in English, but the survey was designed in Japanese and distributed to Japanese participants, so no translation problem affects the survey’s reliability and structural validity.

  3. Although some answers are lucky or neutral, they actually occurred (they are actual samples in the dataset), and we want to know how the factors behave in this extra analysis.
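The encoding described in Note 1 can be reproduced with a short sketch. It assumes Python's built-in `idna` codec (which implements the older IDNA 2003 rules) is an acceptable stand-in for what a browser does; the variable names are illustrative only.

```python
# Decode an IDNA (Punycode) domain into the Unicode form a browser may display.
# Assumption: Python's built-in "idna" codec is used as a stand-in for browser behavior.
ascii_domain = "xn--ggle-0qaa.com"
displayed = ascii_domain.encode("ascii").decode("idna")
print(displayed)

# Encoding the Unicode form again should give back the original ASCII string.
round_trip = displayed.encode("idna").decode("ascii")
print(round_trip == ascii_domain)
```

The decoded label contains two copies of the character U+00F5 (o with tilde) in place of the two ASCII letters "o" in the brand name.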

Corresponding author

Correspondence to Tran Phuong Thao.

Appendices

A Appendix: Security Behavior

The question of security behavior consists of the following sixteen sub-questions:

  1. I set my computer screen to automatically lock if I don’t use it for a prolonged period of time.

  2. I use a password/passcode to unlock my laptop or tablet.

  3. I manually lock my computer screen when I step away from it.

  4. I use a PIN or passcode to unlock my mobile phone.

  5. I change my passwords even if it is not needed.

  6. I use different passwords for different accounts that I have.

  7. When I create a new online account, I try to use a password that goes beyond the site’s minimum requirements.

  8. I include special characters in my password even if it’s not required.

  9. When someone sends me a link, I open it only after verifying where it goes.

  10. I know what website I’m visiting by looking at the URL bar, rather than by the website’s look and feel.

  11. I verify that information will be sent securely (e.g., SSL, “https://”, a lock icon) before I submit it to websites.

  12. When browsing websites, I mouseover links to see where they go, before clicking them.

  13. If I discover a security problem, I fix or report it rather than assuming somebody else will.

  14. When I’m prompted about a software update, I install it right away.

  15. I try to make sure that the programs I use are up-to-date.

  16. I verify that my anti-virus software has been regularly updating itself.

Answer Options. There are five answer options for each sub-question. The order numbers are also the actual values used in the experiment.

  1. Not at all

  2. Not much

  3. Sometimes

  4. Often

  5. Always

B Appendix: Security Knowledge

The question of security knowledge consists of the following eighteen sub-questions:

  1. My Internet provider and location can be disclosed from my IP address.

  2. My telephone number can be disclosed from my IP address.

  3. The web browser information of my device can be disclosed to the operators of websites.

  4. Since Wi-Fi networks in coffee shops are secured by the coffee shop owners, I can use them to send sensitive data such as credit card information.

  5. Passwords comprised of random characters are harder for attackers to guess than passwords comprised of common words and phrases.

  6. If I receive an email that tells me to change my password, and links me to the web page, I should change my password immediately.

  7. My devices are safe from being infected while browsing the web because web browsers only display information.

  8. It is impossible to confirm whether secure communication is being used between my device and a website.

  9. My information can be stolen if a website that I visit masquerades as a famous website (e.g., amazon.com).

  10. I may suffer from monetary loss if a website that I visit masquerades as a famous website.

  11. My devices and accounts may be put at risk if I make a typing mistake while entering the address of a website.

  12. My IP address is secret and it is unsafe to share it with anyone.

  13. If my web browser does not show a green lock when I visit a website, then I can deduce that the website is malicious.

  14. It is safe to open links that appear in emails in my inbox.

  15. It is safe to open attachments received via email.

  16. I use private browsing mode to protect my machine from being infected.

  17. It is safe to use anti-virus software downloaded through P2P file sharing services.

  18. Machines are safe from infections unless participants actively download malware.

Answer Options. There are two answer options for each sub-question. The order numbers are also the actual values used in the experiment.

  1. True (the value used in the experiment: 1)

  2. False (the value used in the experiment: 0)

Correct Answers. The correct answers for the eighteen sub-questions are: true for sub-questions 1, 3, 5, 9, 10, 11, and false for the others.
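The answer key above can be turned into a score with a short sketch. This appendix does not state how the authors aggregate the eighteen answers, so counting the number of correct answers is an assumption, and the names `TRUE_ITEMS` and `knowledge_score` are hypothetical.

```python
# Sub-questions whose correct answer is "True" (from the answer key above);
# the correct answer for all other sub-questions is "False".
TRUE_ITEMS = {1, 3, 5, 9, 10, 11}
NUM_ITEMS = 18

def knowledge_score(responses):
    """Count correct answers (an assumed aggregation, not confirmed by the paper).

    responses maps sub-question number -> 1 (True) or 0 (False).
    Missing answers count as incorrect.
    """
    correct = 0
    for item in range(1, NUM_ITEMS + 1):
        expected = 1 if item in TRUE_ITEMS else 0
        if responses.get(item) == expected:
            correct += 1
    return correct

# Hypothetical participant who answers "True" to every sub-question.
all_true = {item: 1 for item in range(1, NUM_ITEMS + 1)}
print(knowledge_score(all_true))  # 6
```

A participant who answers "True" everywhere gets only the six true items right, which shows why the twelve false statements matter for separating knowledge from acquiescence.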

C Appendix: Security Self-Confidence

The question of security self-confidence consists of the following six sub-questions:

  1. I know about countermeasures for keeping the data on my device from being exploited.

  2. I know about countermeasures to protect myself from monetary loss when using the Internet.

  3. I know about countermeasures to prevent my IDs or passwords from being stolen.

  4. I know about countermeasures to prevent my devices from being compromised.

  5. I know about countermeasures to protect myself from being deceived by fake websites.

  6. I know about countermeasures to prevent my data from being stolen during web browsing.

Answer Options. There are five answer options for each sub-question. The order numbers are also the actual values used in the experiment.

  1. Not at all

  2. Not applicable

  3. Neither agree nor disagree

  4. Applicable

  5. Very applicable

D Appendix: Ability of Homograph Recognition

The question of homograph recognition consists of the following eighteen sub-questions:

  1. Domain #1: xn--mazon-zjc.com (displayed as sample 1 in Fig. 1).

  2. Domain #2: amazonaws.com.

  3. Domain #3: xn--mazon-3ve.com (displayed as sample 3 in Fig. 1).

  4. Domain #4: xn--gogle-m29a.com (displayed as sample 4 in Fig. 1).

  5. Domain #5: google.com.vn.

  6. Domain #6: goole.co.jp.

  7. Domain #7: xn--coinbas-z8a.com (displayed as sample 7 in Fig. 1).

  8. Domain #8: wikimedia.org.

  9. Domain #9: xn--wikipdia-f1a.org (displayed as sample 9 in Fig. 1).

  10. Domain #10: xn--bookin-n0c.com (displayed as sample 10 in Fig. 1).

  11. Domain #11: jbooking.jp.

  12. Domain #12: xn--expeda-fwa.com (displayed as sample 12 in Fig. 1).

  13. Domain #13: expedia.co.jp.

  14. Domain #14: xn--paypl-6qa.com (displayed as sample 14 in Fig. 1).

  15. Domain #15: xn--pypal-4ve.com (displayed as sample 15 in Fig. 1).

  16. Domain #16: sex.com.

  17. Domain #17: faeceb0ok.com.

  18. Domain #18: vi-vn.facebook.com.

Answer Options. There are two answer options for each sub-question. The order numbers are also the actual values used in the experiment.

  1. Homograph (the value used in the experiment: 1)

  2. Non-homograph (the value used in the experiment: 0)

Correct Answers. The eighteen domains are displayed respectively in Fig. 1. The correct answers for the eighteen domains are as follows:

  • Homographs: domains #1, #3, #4, #6, #7, #9, #10, #12, #14, #15, #17.

  • Non-homographs: the others.

Homographs #1 and #3 target the brand Amazon. Homographs #4 and #6 target the brand Google. Homograph #7 targets the brand Coinbase. Homograph #9 targets the brand Wikipedia. Homograph #10 targets the brand Booking. Homograph #12 targets the brand Expedia. Homographs #14 and #15 target the brand Paypal. Homograph #17 targets the brand Facebook.
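Because the abstract reports recognition of homographs and non-homographs separately, a participant's answers can be scored against the key above in two parts. The sketch below is an assumption about how such scoring might look, not the authors' procedure; `HOMOGRAPHS` and `recognition_scores` are hypothetical names.

```python
# Domains that are homographs according to the answer key above;
# the remaining seven domains are non-homographs.
HOMOGRAPHS = {1, 3, 4, 6, 7, 9, 10, 12, 14, 15, 17}
NUM_DOMAINS = 18

def recognition_scores(responses):
    """Return (homographs recognized, non-homographs recognized).

    responses maps domain number -> 1 (Homograph) or 0 (Non-homograph).
    Missing answers count as not recognized.
    """
    homograph_hits = 0
    non_homograph_hits = 0
    for domain in range(1, NUM_DOMAINS + 1):
        answer = responses.get(domain)
        if domain in HOMOGRAPHS and answer == 1:
            homograph_hits += 1
        elif domain not in HOMOGRAPHS and answer == 0:
            non_homograph_hits += 1
    return homograph_hits, non_homograph_hits

# Hypothetical participant who labels every domain as a homograph.
suspicious = {domain: 1 for domain in range(1, NUM_DOMAINS + 1)}
print(recognition_scores(suspicious))  # (11, 0)
```

Keeping the two counts apart avoids rewarding a participant who simply flags everything: such a participant scores 11 on homographs but 0 on non-homographs.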

E Appendix: Homograph Explanation

The description about the homograph attack is given as follows:

“A homograph attack is a way in which attackers deceive victims about what domain they are communicating with by exploiting the fact that many domains look alike. There are several kinds of homographs in the wild; we thus group them into 5 categories. The first is the visual homograph, which uses different characters that look visually alike, for example: facebook.com and facebôok.com. The second is the semantic homograph, which uses synonyms or contextually similar words, for example: facebook.com and markzuckerbergsocialnetwork.com. The third is the TLD homograph, which uses the same main domain name but a different top-level domain (TLD), for example: facebook.com and facebook.biz. The fourth is typosquatting, which relies on mistakes such as typos made by Internet users when typing domain names, for example: facebook.com and faceboook.com. The last is a combination of the previous 4 categories. Also, note that homographs in which certain characters are inserted or replaced in the brand domain (known as bitsquatting) are listed under the fourth type (typosquatting homograph); for instance, travelgoogle.com targeting google.com.”
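The categories in this description can be illustrated with a rough heuristic. The sketch below is not the authors' method: it assumes a domain splits at its first dot, treats only accented Latin letters as visual lookalikes (it would miss, for example, Cyrillic substitutions), does not attempt the semantic or combined categories, and all function names are hypothetical.

```python
import unicodedata

def strip_marks(text):
    """Remove combining accents after Unicode decomposition (NFKD)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def categorize(candidate, brand):
    """Assign a candidate domain to one category relative to a brand domain."""
    if candidate == brand:
        return "same domain"
    # Simplifying assumption: everything after the first dot is the TLD.
    cand_name, _, _ = candidate.partition(".")
    brand_name, _, _ = brand.partition(".")
    if cand_name == brand_name:
        return "TLD homograph"
    if not cand_name.isascii() and strip_marks(cand_name) == brand_name:
        return "visual homograph"
    if edit_distance(cand_name, brand_name) == 1 or brand_name in cand_name:
        return "typosquatting"
    return "uncategorized"  # semantic homographs need more than string matching

# Examples taken from the description above.
for domain in ["faceb\u00f4ok.com", "facebook.biz", "faceboook.com",
               "markzuckerbergsocialnetwork.com"]:
    print(domain, "->", categorize(domain, "facebook.com"))
```

The semantic example falls through to "uncategorized", which reflects that synonym-based homographs cannot be found by character comparison alone.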

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this paper

Thao, T.P. et al. (2020). Human Factors in Homograph Attack Recognition. In: Conti, M., Zhou, J., Casalicchio, E., Spognardi, A. (eds) Applied Cryptography and Network Security. ACNS 2020. Lecture Notes in Computer Science, vol 12147. Springer, Cham. https://doi.org/10.1007/978-3-030-57878-7_20

  • DOI: https://doi.org/10.1007/978-3-030-57878-7_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-57877-0

  • Online ISBN: 978-3-030-57878-7

  • eBook Packages: Computer Science (R0)
