Guilt-by-Association: Detecting Malicious Entities via Graph Mining

Najafi, Pejman; Sapegin, Andrey; Cheng, Feng; Meinel, Christoph

doi:10.1007/978-3-319-78813-5_5

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 238)

Included in the following conference series:

International Conference on Security and Privacy in Communication Systems

Abstract

In this paper, we tackle the problem of detecting malicious domains and IP addresses using graph inference. To this end, we mine proxy and DNS logs to construct an undirected graph in which vertices represent domains and IP addresses, and edges represent associations between them. More specifically, we investigate three main relationships: subdomainOf, referredTo, and resolvedTo. We show that, given minimal ground truth information, it is possible to estimate the marginal probability of a domain or IP node being malicious based on its association with other malicious nodes. This is achieved by adopting belief propagation, an efficient and popular inference algorithm used in probabilistic graphical models. We have implemented our system in Apache Spark and evaluated it using one day of proxy and DNS logs collected from a global enterprise, amounting to over 2 terabytes of disk space. We show that our approach is not only efficient but also capable of achieving a high detection rate (96% TPR) with a reasonably low false positive rate (8% FPR). Furthermore, it is capable of fixing errors in the ground truth as well as identifying previously unknown malicious domains and IP addresses. Our proposal can be adopted by enterprises to increase both the quality and the quantity of their threat intelligence and blacklists using only proxy and DNS logs.
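
The inference step described in the abstract can be illustrated with a small sketch of loopy belief propagation on an undirected association graph. This is not the authors' Spark implementation; it is a single-machine Python toy under stated assumptions. The edge list, the prior values (0.99 for a known malicious node, 0.01 for a known benign node, 0.5 for unknown nodes), the homophily strength `EPS`, and the function names `build_graph` and `belief_propagation` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical associations of the three kinds named in the abstract.
EDGES = [
    ("cdn.evil.example", "evil.example", "subdomainOf"),
    ("landing.example", "evil.example", "referredTo"),
    ("evil.example", "203.0.113.7", "resolvedTo"),
    ("good.example", "198.51.100.2", "resolvedTo"),
]

# Assumed ground truth, expressed as prior P(malicious) per labeled node.
PRIORS = {"evil.example": 0.99, "good.example": 0.01}

EPS = 0.1  # assumed homophily strength: neighbors tend to share a label


def build_graph(edges):
    """Build an undirected adjacency map; the relation type is ignored here."""
    adj = defaultdict(set)
    for u, v, _relation in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def belief_propagation(adj, priors, eps=EPS, iterations=10):
    """Return an estimate of P(malicious) for every node.

    States are indexed 0 = benign, 1 = malicious.
    """
    # Node potentials: unlabeled nodes get an uninformative 0.5 prior.
    phi = {n: (1.0 - priors.get(n, 0.5), priors.get(n, 0.5)) for n in adj}
    # Edge potential: same-label pairs are slightly more likely.
    psi = ((0.5 + eps, 0.5 - eps), (0.5 - eps, 0.5 + eps))
    # One message per directed edge, initialized to uniform.
    msgs = {(u, v): (0.5, 0.5) for u in adj for v in adj[u]}

    for _ in range(iterations):
        new_msgs = {}
        for (u, v) in msgs:
            # Combine u's prior with all messages into u except the one from v.
            pre = list(phi[u])
            for w in adj[u]:
                if w != v:
                    pre[0] *= msgs[(w, u)][0]
                    pre[1] *= msgs[(w, u)][1]
            # Marginalize over u's state to get the message about v's state.
            out = [sum(pre[s] * psi[s][t] for s in (0, 1)) for t in (0, 1)]
            total = out[0] + out[1]
            new_msgs[(u, v)] = (out[0] / total, out[1] / total)
        msgs = new_msgs

    beliefs = {}
    for n in adj:
        b = list(phi[n])
        for w in adj[n]:
            b[0] *= msgs[(w, n)][0]
            b[1] *= msgs[(w, n)][1]
        beliefs[n] = b[1] / (b[0] + b[1])
    return beliefs


graph = build_graph(EDGES)
beliefs = belief_propagation(graph, PRIORS)
for node in sorted(beliefs):
    print(f"{node}: {beliefs[node]:.3f}")
```

In this toy graph, the unlabeled IP address that the known malicious domain resolves to ends up with a malicious belief above 0.5, while the IP address associated with the known benign domain ends up below 0.5. A real deployment would need to choose priors, edge potentials, and a decision threshold from data; the values above are placeholders.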

Author information

Authors and Affiliations

Hasso Plattner Institute (HPI), Prof.-Dr.-Helmert-Straße 2-3, 14482, Potsdam, Germany
Pejman Najafi, Andrey Sapegin, Feng Cheng & Christoph Meinel

Corresponding author

Correspondence to Pejman Najafi.

Editor information

Editors and Affiliations

Wilfrid Laurier University, Waterloo, Ontario, Canada
Xiaodong Lin
University of New Brunswick, Fredericton, New Brunswick, Canada
Ali Ghorbani
University at Buffalo, Buffalo, New York, USA
Kui Ren
Pennsylvania State University, Philadelphia, Pennsylvania, USA
Sencun Zhu
Anhui Normal University, Wuhu, China
Aiqing Zhang

About this paper

Cite this paper

Najafi, P., Sapegin, A., Cheng, F., Meinel, C. (2018). Guilt-by-Association: Detecting Malicious Entities via Graph Mining. In: Lin, X., Ghorbani, A., Ren, K., Zhu, S., Zhang, A. (eds) Security and Privacy in Communication Networks. SecureComm 2017. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 238. Springer, Cham. https://doi.org/10.1007/978-3-319-78813-5_5

DOI: https://doi.org/10.1007/978-3-319-78813-5_5
Published: 11 April 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78812-8
Online ISBN: 978-3-319-78813-5
eBook Packages: Computer Science, Computer Science (R0)
