Reading Between the Lines: Content-Agnostic Detection of Spear-Phishing Emails

Gascon, Hugo; Ullrich, Steffen; Stritter, Benjamin; Rieck, Konrad

doi:10.1007/978-3-030-00470-5_4

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11050)

Included in the following conference series:

International Symposium on Research in Attacks, Intrusions, and Defenses

Abstract

Spear-phishing is an effective attack vector for infiltrating companies and organisations. Drawing on the wealth of personal information available online, an attacker can craft seemingly legitimate emails and trick victims into opening malicious attachments and links. Although anti-spoofing techniques exist, their adoption is still limited and alternative protection approaches are needed. In this paper, we show that a sender leaves content-agnostic traits in the structure of an email. Based on these traits, we develop a method capable of learning profiles for a large set of senders and identifying spoofed emails as deviations from these profiles. We evaluate our approach on over 700,000 emails from 16,000 senders and demonstrate that it can discriminate thousands of senders, identifying spoofed emails with a 90% detection rate and less than 1 false positive in 10,000 emails. Moreover, we show that individual traits are hard to guess and that spoofing only succeeds if entire emails of the sender are available to the attacker.
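The idea of profiling a sender by structural traits and flagging deviations can be illustrated with a minimal sketch. This is not the authors' implementation: the trait set (header order, header presence, MIME types, transfer encodings), the Jaccard-based nearest-neighbour score, and all names such as `extract_traits` and `deviation` are assumptions chosen for illustration, and the sample messages are made up.

```python
from email import message_from_string


def extract_traits(raw: str) -> frozenset[str]:
    """Collect structural traits of one email, ignoring its text content."""
    msg = message_from_string(raw)
    names = [name.lower() for name in msg.keys()]
    traits = {f"hdr:{n}" for n in names}  # which headers are present
    # Order of headers, captured as consecutive pairs (values are ignored).
    traits.update(f"pair:{a}>{b}" for a, b in zip(names, names[1:]))
    for part in msg.walk():  # MIME structure of every part
        traits.add(f"ctype:{part.get_content_type()}")
        cte = str(part.get("Content-Transfer-Encoding", "none")).lower()
        traits.add(f"cte:{cte}")
    return frozenset(traits)


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two trait sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def deviation(profile: list[frozenset[str]], traits: frozenset[str]) -> float:
    """Distance to the most similar known email of the claimed sender."""
    return 1.0 - max(jaccard(traits, known) for known in profile)


# Hypothetical sample messages (assumed data, for illustration only).
GENUINE = (
    "From: alice@example.com\nTo: bob@example.com\nSubject: Report\n"
    "MIME-Version: 1.0\nContent-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: quoted-printable\n\nSee attached.\n"
)
GENUINE_2 = (
    "From: alice@example.com\nTo: bob@example.com\nSubject: Lunch?\n"
    "MIME-Version: 1.0\nContent-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: quoted-printable\n\nAre you free at noon?\n"
)
SPOOFED = (
    "MIME-Version: 1.0\nSubject: Invoice\nTo: bob@example.com\n"
    "From: alice@example.com\nX-Mailer: ExampleMailer 1.0\n"
    "Content-Type: text/html\nContent-Transfer-Encoding: base64\n\n"
    "PGI+aGk8L2I+\n"
)

if __name__ == "__main__":
    profile = [extract_traits(GENUINE)]
    for label, raw in [("genuine", GENUINE_2), ("spoofed", SPOOFED)]:
        print(label, round(deviation(profile, extract_traits(raw)), 3))
```

In this toy setting, the second genuine message has the same structure as the profile despite different text, while the spoofed message differs in header order, mailer header and encoding even though its From address matches. A real system would need many more traits and a decision threshold calibrated on held-out mail.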

Author information

Authors and Affiliations

TU Braunschweig, Braunschweig, Germany
Hugo Gascon & Konrad Rieck
Genua GmbH, Kirchheim bei München, Germany
Steffen Ullrich
Friedrich Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Benjamin Stritter

Corresponding author

Correspondence to Hugo Gascon.

Editor information

Editors and Affiliations

University of Illinois at Urbana-Champaign, Urbana, IL, USA
Michael Bailey
Ruhr-Universität Bochum, Bochum, Germany
Thorsten Holz
Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Manolis Stamatogiannakis
Foundation for Research & Technology – Hellas, Heraklion, Crete, Greece
Sotiris Ioannidis

A Appendix

Tables 4, 5 and 6 provide an overview of the different traits characterizing the behavior, composition and transport of emails, respectively.

Table 4. List of behavior features.

Table 5. List of composition features.

Table 6. List of transport features.

About this paper

Cite this paper

Gascon, H., Ullrich, S., Stritter, B., Rieck, K. (2018). Reading Between the Lines: Content-Agnostic Detection of Spear-Phishing Emails. In: Bailey, M., Holz, T., Stamatogiannakis, M., Ioannidis, S. (eds) Research in Attacks, Intrusions, and Defenses. RAID 2018. Lecture Notes in Computer Science, vol 11050. Springer, Cham. https://doi.org/10.1007/978-3-030-00470-5_4

DOI: https://doi.org/10.1007/978-3-030-00470-5_4
Published: 07 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00469-9
Online ISBN: 978-3-030-00470-5
eBook Packages: Computer Science, Computer Science (R0)
