MalShoot: Shooting Malicious Domains Through Graph Embedding on Passive DNS Data

Peng, Chengwei; Yun, Xiaochun; Zhang, Yongzheng; Li, Shuhao

doi:10.1007/978-3-030-12981-1_34

MalShoot: Shooting Malicious Domains Through Graph Embedding on Passive DNS Data

Chengwei Peng^19,20,
Xiaochun Yun^19,21,
Yongzheng Zhang²¹ &
…
Shuhao Li²¹

Conference paper
First Online: 07 February 2019

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering ((LNICST,volume 268))

Abstract

Malicious domains are key components to a variety of illicit online activities. We propose MalShoot, a graph embedding technique for detecting malicious domains using passive DNS database. We base its design on the intuition that a group of domains that share similar resolution information would have the same property, namely malicious or benign. MalShoot represents every domain as a low-dimensional vector according to its DNS resolution information. It automatically maps the domains that share similar resolution information to similar vectors while unrelated domains to distant vectors. Based on the vectorized representation of each domain, a machine-learning classifier is trained over a labeled dataset and is further applied to detect other malicious domains. We evaluate MalShoot using real-world DNS traffic collected from three ISP networks in China over two months. The experimental results show our approach can effectively detect malicious domains with a 96.08% true positive rate and a 0.1% false positive rate. Moreover, MalShoot scales well even in large datasets.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

Grier, C., Thomas, K., Paxson, V., Zhang, M.: @ spam: the underground on 140 characters or less. In: Proceedings of the 17th ACM Conference on Computer and Communications Security, pp. 27–37. ACM (2010)
Google Scholar
Plohmann, D., Yakdan, K., Klatt, M., Bader, J., Gerhards-Padilla, E.: A comprehensive measurement study of domain generating malware. In: USENIX Security Symposium, pp. 263–278 (2016)
Google Scholar
Antonakakis, M., et al.: From throw-away traffic to bots: detecting the rise of DGA-based malware. In: Presented as Part of the 21st USENIX Security Symposium (USENIX Security 2012), pp. 491–506 (2012)
Google Scholar
Cisco 2016 annual security (2016). http://www.cisco.com/c/m/en_us/offers/sc04/2016-annual-security-report/index.html
Antonakakis, M., Perdisci, R., Dagon, D., Lee, W., Feamster, N.: Building a dynamic reputation system for DNS. In: USENIX Security Symposium, pp. 273–290 (2010)
Google Scholar
Bilge, L., Sen, S., Balzarotti, D., Kirda, E., Kruegel, C.: Exposure: a passive DNS analysis service to detect and report malicious domains. ACM Trans. Inf. Syst. Secur. (TISSEC) 16(4), 14 (2014)
Article Google Scholar
Antonakakis, M., Perdisci, R., Lee, W., Vasiloglou II, N., Dagon, D.: Detecting malware domains at the upper DNS hierarchy. In: USENIX Security Symposium, vol. 11, pp. 1–16 (2011)
Google Scholar
Rahbarinia, B., Perdisci, R., Antonakakis, M.: Segugio: efficient behavior-based tracking of malware-control domains in large ISP networks. In: 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pp. 403–414. IEEE (2015)
Google Scholar
Stinson, E., Mitchell, J.C.: Towards systematic evaluation of the evadability of bot/botnet detection methods. WOOT 8, 1–9 (2008)
Google Scholar
Khalil, I., Yu, T., Guan, B.: Discovering malicious domains through passive DNS data graph analysis. In: Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 663–674. ACM (2016)
Google Scholar
Peng, C., Yun, X., Zhang, Y., Li, S., Xiao, J.: Discovering malicious domains through alias-canonical graph. In: 2017 IEEE Trustcom/BigDataSE/ICESS, pp. 225–232. IEEE (2017)
Google Scholar
Weimer, F.: Passive DNS replication. In: FIRST Conference on Computer Security Incident, p. 98 (2005)
Google Scholar
Manadhata, P.K., Yadav, S., Rao, P., Horne, W.: Detecting malicious domains via graph inference. In: Kutyłowski, M., Vaidya, J. (eds.) ESORICS 2014. LNCS, vol. 8712, pp. 1–18. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11203-9_1
Chapter Google Scholar
Cao, S., Lu, W., Xu, Q.: Grarep: learning graph representations with global structural information. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 891–900. ACM (2015)
Google Scholar
Ou, M., Cui, P., Pei, J., Zhang, Z., Zhu, W.: Asymmetric transitivity preserving graph embedding. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1105–1114. ACM (2016)
Google Scholar
Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710. ACM (2014)
Google Scholar
Grover, A., Leskovec, J.: Node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864. ACM (2016)
Google Scholar
Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: Line: large-scale information network embedding. In: Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077. International World Wide Web Conferences Steering Committee (2015)
Google Scholar
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)
Google Scholar
Recht, B., Re, C., Wright, S., Niu, F.: Hogwild: a lock-free approach to parallelizing stochastic gradient descent. In: Advances in Neural Information Processing Systems, pp. 693–701 (2011)
Google Scholar
Malware domain block list (2018). http://www.malwaredomains.com
Phishtank (2018). http://www.phishtank.com
Openphish (2018). https://openphish.com
Alexa top 1 million (2017). http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
Zeus domain blocklist (2016). https://zeustracker.abuse.ch/blocklist.php
Porras, P.A., Saïdi, H., Yegneswaran, V.: A foray into Conficker’s logic and rendezvous points. In: LEET (2009)
Google Scholar
Google safe browsing (2018). https://www.google.com/transparencyreport/safebrowsing/diagnostic/
CNCERT/CC (2018). https://www.cert.org.cn
Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794. ACM (2016)
Google Scholar
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
MathSciNet MATH Google Scholar

Download references

Acknowledgments

The research leading to these results has received funding from the National Key Research and Development Program of China (No. 2016YFB0801502) and National Natural Science Foundation of China (No. U1736218). We thank the CNCERT/CC for providing the DNS data used in our experiments. The corresponding author is Xiaochun Yun.

Author information

Authors and Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Chengwei Peng & Xiaochun Yun
University of Chinese Academy of Sciences, Beijing, China
Chengwei Peng
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Xiaochun Yun, Yongzheng Zhang & Shuhao Li

Authors

Chengwei Peng
View author publications
You can also search for this author in PubMed Google Scholar
Xiaochun Yun
View author publications
You can also search for this author in PubMed Google Scholar
Yongzheng Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Shuhao Li
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Xiaochun Yun .

Editor information

Editors and Affiliations

Shanghai University, Shanghai, China
Honghao Gao
University of West London, London, UK
Xinheng Wang
Hangzhou Dianzi University, Hangzhou Shi, Zhejiang, China
Yuyu Yin
London South Bank University, London, UK
Muddesar Iqbal

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Peng, C., Yun, X., Zhang, Y., Li, S. (2019). MalShoot: Shooting Malicious Domains Through Graph Embedding on Passive DNS Data. In: Gao, H., Wang, X., Yin, Y., Iqbal, M. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 268. Springer, Cham. https://doi.org/10.1007/978-3-030-12981-1_34

Download citation

DOI: https://doi.org/10.1007/978-3-030-12981-1_34
Published: 07 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-12980-4
Online ISBN: 978-3-030-12981-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics