research-article

Exposing the Rat in the Tunnel: Using Traffic Analysis for Tor-based Malware Detection

Authors: Priyanka Dodia, Mashael AlSabah, Tao Wang

CCS '22: Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security

Pages 875 - 889

https://doi.org/10.1145/3548606.3560604

Published: 07 November 2022

Abstract

Tor [31] is the most widely used anonymous communication network, with millions of daily users [6]. Since Tor provides server and client anonymity, hundreds of malware binaries found in the wild rely on it to hide their presence and hinder Command & Control (C&C) takedown operations. We believe Tor is a paramount tool enabling online freedom and privacy, and blocking it to defend against such malware is infeasible for both users and organizations. In this work, we present effective traffic analysis approaches that can accurately identify Tor-based malware communication. We collect hundreds of Tor-based malware binaries, execute them, examine more than 47,000 active encrypted malware connections, and compare them with benign browsing traffic. In addition to traditional traffic analysis features (which work at the connection level), we propose global host-level network features to capture peculiar malware communication fingerprints across host logs. Our experiments confirm that our models are able to detect "zero-day" malware connections with 0.7% FPR even when malware connections constitute less than 5% of Tor traces in the test set. Using multi-labeling approaches, we are able to accurately detect the malware behavior-based classes (grayware, ransomware, etc.). Finally, we evaluate the robustness of our models on real-world enterprise logs and show that the classifiers can identify infected hosts even with missing features.
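
The abstract contrasts connection-level features with host-level features and reports results as a false positive rate (FPR). The following is a minimal Python sketch of those two ideas only, not the authors' feature set or pipeline. The record fields (`host`, `duration`, `bytes_out`) and both function names are hypothetical, chosen purely for illustration.

```python
from collections import defaultdict
from statistics import mean


def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN), with 1 = malware and 0 = benign."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else 0.0


def host_level_features(connections):
    """Aggregate per-connection records into per-host summaries.

    Field names are hypothetical; a real log format will differ.
    """
    by_host = defaultdict(list)
    for conn in connections:
        by_host[conn["host"]].append(conn)
    return {
        host: {
            "n_connections": len(conns),
            "mean_duration": mean(c["duration"] for c in conns),
            "total_bytes_out": sum(c["bytes_out"] for c in conns),
        }
        for host, conns in by_host.items()
    }


# Toy data: four benign connections (one misclassified) and one malware connection.
labels = [0, 0, 0, 0, 1]
predictions = [1, 0, 0, 0, 1]
fpr = false_positive_rate(labels, predictions)

toy_connections = [
    {"host": "a", "duration": 2.0, "bytes_out": 100},
    {"host": "a", "duration": 4.0, "bytes_out": 50},
    {"host": "b", "duration": 1.0, "bytes_out": 10},
]
features = host_level_features(toy_connections)
```

Because FPR is computed over benign connections only, it does not change with the share of malware in the test set, which is why it remains a meaningful figure when malware makes up less than 5% of traces.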

References

[1]

2013. How to Handle Millions of New Tor Clients. https://blog.torproject.org/how-handle-millions-new-tor-clients/#comment-34624.

[2]

2016. Tor-nonTor Dataset (ISCXTor2016). https://www.unb.ca/cic/datasets/tor.html.

[3]

2017. WannaCry Ransomware Campaign: Threat Details and Risk Management. https://www.fireeye.com/blog/products-and-services/2017/05/wannacryransomware-campaign.html.

[4]

2021. Hybrid Analysis. https://www.hybrid-analysis.com/.

[5]

2021. Tor Hidden Services Deprecation Timeline. https://blog.torproject.org/v2-deprecation-timeline/.

[6]

2021. Tor Metrics. https://metrics.torproject.org.

[7]

2022. Ahmia - Search Tor Hidden Services. https://ahmia.fi/.

[8]

2022. Autogluon Predictors. https://auto.gluon.ai/stable/api/autogluon.predictor.html?highlight=p_value.

[9]

2022. Autogluon Tabular Models (Documentation). https://auto.gluon.ai/stable/api/autogluon.tabular.models.html?highlight=weighted%20ensemble%20l2.

[10]

2022. AutoGluon Tasks. https://auto.gluon.ai/stable/api/autogluon.task.html.

[11]

2022. BinaryRelevance: scikit-multilearn. http://scikit.ml/api/skmultilearn.problem_transform.br.html.

[12]

2022. ClassifierChains: scikit-multilearn. https://scikit-learn.org/stable/auto_examples/multioutput/plot_classifier_chain_yeast.html.

[13]

2022. EternalRocks - The Malware Wiki. https://malwiki.org/index.php?title=EternalRocks.

[14]

2022. Grayware - The Malware Wiki. https://malwiki.org/index.php?title=Grayware.

[15]

2022. LabelPowerset: scikit-multilearn. http://scikit.ml/api/skmultilearn.problem_transform.lp.html#skmultilearn.problem_transform.LabelPowerset.

[16]

2022. Python dpkt. https://dpkt.readthedocs.io/en/latest/.

[17]

2022. Spyware - The Malware Wiki. https://malwiki.org/index.php?title=Spyware.

[18]

2022. SystemBC -- a RAT in the Pipeline. https://blogs.blackberry.com/en/2021/06/threat-thursday-systembc-a-rat-in-the-pipeline.

[19]

2022. Top 1M Alexa. http://s3.amazonaws.com/alexa-static/top-1m.csv.zip.

[20]

2022. Trojan - The Malware Wiki. https://malwiki.org/index.php?title=Adware.

[21]

2022. Zeek, An Open Source Network Security Monitoring Tool. https://zeek.org/.

[22]

Bushra A Alahmadi, Enrico Mariconti, Riccardo Spolaor, Gianluca Stringhini, and Ivan Martinovic. 2020. BOTection: Bot Detection by Building Markov Chain Models of Bots Network Behavior. In Proceedings of the 15th ACM Asia Conference on Computer and Communications Security. 652--664.

[23]

Omar Alrawi, Moses Ike, Matthew Pruett, Ranjita Pai Kasturi, Srimanta Barua, Taleb Hirani, Brennan Hill, and Brendan Saltaformaggio. 2021. Forecasting Malware Capabilities From Cyber Attack Memory Images. In 30th USENIX Security Symposium (USENIX Security 21). 3523--3540.

[24]

Omar Alrawi, Charles Lever, Kevin Valakuzhy, Kevin Snow, Fabian Monrose, Manos Antonakakis, et al. 2021. The Circle Of Life: A Large-Scale Study of The IoT Malware Lifecycle. In 30th USENIX Security Symposium (USENIX Security 21). 3505--3522.

[25]

Athanasios Avgetidis, Omar Alrawi, Kevin Valakuzhy, Charles Lever, Paul Burbage, Angelos Keromytis, Fabian Monrose, and Manos Antonakakis. 2023. Beyond The Gates: An Empirical Analysis of HTTP-Managed Password Stealers and Operators. In 32nd USENIX Security Symposium (USENIX Security 23).

[26]

Sanjit Bhat, David Lu, Albert Hyukjae Kwon, and Srinivas Devadas. 2019. Varcnn: A Data-efficient Website Fingerprinting Attack based on Deep Learning. (2019).

[27]

Xiang Cai, Xin Cheng Zhang, Brijesh Joshi, and Rob Johnson. 2012. Touching from a Distance: Website Fingerprinting Attacks and Defenses. In Proceedings of the 2012 ACM conference on Computer and communications security. 605--616.

[28]

Pitpimon Choorod and George Weir. 2021. Tor Traffic Classification Based on Encrypted Payload Characteristics. In 2021 National Computing Colleges Conference (NCCC). IEEE, 1--6.

[29]

Lucian Constantin. 2012. Tor Network Used to Command Skynet Botnet. https://www.computerworld.com/article/2493980/tor-network-used-tocommand-skynet-botnet.html.

[30]

Alfredo Cuzzocrea, Fabio Martinelli, Francesco Mercaldo, and Gianni Vercelli. 2017. Tor Traffic Analysis and Detection via Machine Learning Techniques. In 2017 IEEE International Conference on Big Data (Big Data). IEEE, 4474--4480.

[31]

Roger Dingledine, Nick Mathewson, and Paul Syverson. 2004. Tor: The Second-Generation Onion Router. In 13th USENIX Security Symposium (USENIX Security 04). USENIX Association, San Diego, CA. https://www.usenix.org/conference/13th-usenix-security-symposium/tor-second-generation-onion-router

[32]

Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro Larroy, Mu Li, and Alexander Smola. 2020. AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data. arXiv preprint arXiv:2003.06505 (2020).

[33]

Oluwatobi Fajana, Gareth Owenson, and Mihaela Cocea. 2018. Torbot Stalker: Detecting Tor Botnets through Intelligent Circuit Data Analysis. In 2018 IEEE 17th International Symposium on Network Computing and Applications (NCA). IEEE, 1--8.

[34]

Ibrahim Ghafir, Vaclav Prenosil, Mohammad Hammoudeh, Thar Baker, Sohail Jabbar, Shehzad Khalid, and Sardar Jaf. 2018. BotDet: A System for Real Time Botnet Command and Control Traffic Detection. IEEE Access 6 (2018), 38947--38958.

[35]

Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee. 2008. Botminer: Clustering Analysis of Network Traffic for Protocol-and Structure-independent Botnet Detection. (2008).

[36]

Jamie Hayes and George Danezis. 2016. k-fingerprinting: A Robust Scalable Website Fingerprinting Technique. In 25th USENIX Security Symposium (USENIX Security 16). 1187--1203.

[37]

Jamie Hayes and George Danezis. 2016. k-fingerprinting: A Robust Scalable Website Fingerprinting Technique. In 25th USENIX Security Symposium (USENIX Security 16). 1187--1203.

[38]

Dominik Herrmann, Rolf Wendolsky, and Hannes Federrath. 2009. Website Fingerprinting: Attacking Popular Privacy Enhancing Technologies with the Multinomial Naïve-Bayes Classifier. In Proceedings of the 2009 ACM workshop on Cloud computing security. 31--42.

[39]

Lazaros Alexios Iliadis and Theodoros Kaifas. 2021. Darknet Traffic Classification Using Machine Learning Techniques. In 2021 10th International Conference on Modern Circuits and Systems Technologies (MOCAST). IEEE, 1--4.

[40]

Gregoire Jacob, Ralf Hund, Christopher Kruegel, and Thorsten Holz. 2011. JACKSTRAWS: Picking Command and Control Connections from Bot Traffic. In USENIX Security Symposium, Vol. 2011. San Francisco, CA, USA.

[41]

Arash Habibi Lashkari, Gerard Draper-Gil, Mohammad Saiful Islam Mamun, and Ali A Ghorbani. 2017. Characterization of Tor Traffic using Time Based Features. In ICISSP. 253--262.

[42]

Zhen Ling, Junzhou Luo, Kui Wu, Wei Yu, and Xinwen Fu. 2015. Torward: Discovery, Blocking, and Traceback of Malicious Traffic over Tor. IEEE Transactions on Information Forensics and Security 10, 12 (2015), 2515--2530.

[43]

Liming Lu, Ee-Chien Chang, and Mun Choon Chan. 2010. Website Fingerprinting and Identification using Ordered Feature Sequences. In European Symposium on Research in Computer Security. Springer, 199--214.

[44]

Haoyu Ma, Jianqiu Cao, Bo Mi, Darong Huang, Yang Liu, and Zhenyuan Zhang. 2021. Dark Web Traffic Detection Method Based on Deep Learning. In 2021 IEEE 10th Data Driven Control and Learning Systems Conference (DDCLS). IEEE, 842--847.

[45]

Brad Miller, Ling Huang, Anthony D. Joseph, and J. Doug Tygar. 2014. I Know Why You Went to the Clinic: Risks and Realization of HTTPS Traffic Analysis. ArXiv abs/1403.0297 (2014).

[46]

Abedelaziz Mohaisen and Omar Alrawi. 2013. Unveiling Zeus: Automated Classification of Malware Samples. In Proceedings of the 22nd International Conference on World Wide Web. 829--832.

[47]

Aziz Mohaisen and Omar Alrawi. 2014. Av-meter: An Evaluation of Antivirus Scans and Labels. In International conference on detection of intrusions and malware, and vulnerability assessment. Springer, 112--131.

[48]

Aziz Mohaisen, Omar Alrawi, Matt Larson, and Danny McPherson. 2013. Towards a Methodical Evaluation of Antivirus Scans and Labels. In International Workshop on Information Security Applications. Springer, 231--241.

[49]

Aziz Mohaisen, Omar Alrawi, and Manar Mohaisen. 2015. AMAL: High-fidelity, Behavior-based Automated Malware Analysis and Classification. Computers & Security 52 (2015), 251--266.

[50]

Aziz Mohaisen, Omar Alrawi, Andrew G West, and Allison Mankin. 2013. Babble: Identifying Malware by its Dialects. In 2013 IEEE Conference on Communications and Network Security (CNS). IEEE, 407--408.

[51]

Aziz Mohaisen, Andrew G West, Allison Mankin, and Omar Alrawi. 2014. Chatter: Classifying Malware Families using System Event Ordering. In 2014 IEEE Conference on Communications and Network Security. IEEE, 283--291.

[52]

Palo Alto Networks. 2012. Threat Assessment: Egregor Ransomware. https://unit42.paloaltonetworks.com/egregor-ransomware-courses-of-action/.

[53]

Andriy Panchenko, Lukas Niessen, Andreas Zinnen, and Thomas Engel. 2011. Website Fingerprinting in Onion Routing based Anonymization Networks. In Proceedings of the 10th annual ACM workshop on Privacy in the electronic society. 103--114.

[54]

Eva Papadogiannaki and Sotiris Ioannidis. 2021. A Survey on Encrypted Network Traffic Analysis Applications, Techniques, and Countermeasures. 54, 6 (2021). https://doi.org/10.1145/3457904

[55]

Michal Piskozub, Riccardo Spolaor, and Ivan Martinovic. 2019. Malalert: Detecting Malware in Large-scale Network Traffic using Statistical Features. ACM SIGMETRICS Performance Evaluation Review 46, 3 (2019), 151--154.

[56]

Ferry Astika Saputra, Isbat Uzzin Nadhori, and Balighani Fathul Barry. 2016. Detecting and Blocking Onion Router Traffic Using Deep Packet Inspection. In 2016 International Electronics Symposium (IES). IEEE, 283--288.

[57]

Debmalya Sarkar, P Vinod, and Suleiman Y Yerima. 2020. Detection of Tor traffic using Deep Learning. In 2020 IEEE/ACS 17th International Conference on Computer Systems and Applications (AICCSA). IEEE, 1--8.

[58]

Roei Schuster, Vitaly Shmatikov, and Eran Tromer. 2017. Beauty and the Burst: Remote Identification of Encrypted Video Streams. In 26th USENIX Security Symposium (USENIX Security 17). USENIX Association, Vancouver, BC, 1357--1374. https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/schuster

[59]

Silvia Sebastián and Juan Caballero. 2020. AVclass2: Massive Malware Tag Extraction from AV Labels. In Annual Computer Security Applications Conference (Austin, USA) (ACSAC '20). Association for Computing Machinery, New York, NY, USA, 42--53. https://doi.org/10.1145/3427228.3427261

[60]

Marcos Sebastián, Richard Rivera, Platon Kotzias, and Juan Caballero. 2016. AVclass: A Tool for Massive Malware Labeling, Vol. 9854. 230--253. https://doi.org/10.1007/978-3-319-45719-2_11

[61]

Payap Sirinam, Mohsen Imani, Marc Juarez, and Matthew Wright. 2018. Deep Fingerprinting: Undermining Website Fingerprinting Defenses with Deep Learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. 1928--1943.

[62]

Payap Sirinam, Nate Mathews, Mohammad Saidur Rahman, and Matthew Wright. 2019. Triplet Fingerprinting: More Practical and Portable Website Fingerprinting with n-shot Learning. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. 1131--1148.

[63]

Qixiang Sun, Daniel R Simon, Yi-Min Wang, Wilf Russell, Venkata N Padmanabhan, and Lili Qiu. 2002. Statistical Identification of Encrypted Web Browsing Traffic. In Proceedings 2002 IEEE Symposium on Security and Privacy. IEEE, 19--30.

[64]

Tao Wang, Xiang Cai, Rishab Nithyanand, Rob Johnson, and Ian Goldberg. 2014. Effective Attacks and Provable Defenses for Website Fingerprinting. In 23rd USENIX Security Symposium (USENIX Security 14). 143--157.

[65]

Tao Wang and Ian Goldberg. 2013. Improved Website Fingerprinting on Tor. In Proceedings of the 12th ACM workshop on Workshop on privacy in the electronic society. 201--212.

[66]

Tao Wang and Ian Goldberg. 2013. Improved Website Fingerprinting on Tor. In Proceedings of the 12th ACM workshop on Workshop on privacy in the electronic society. 201--212.

[67]

Tao Wang and Ian Goldberg. 2016. On Realistically Attacking Tor with Website Fingerprinting. Proc. Priv. Enhancing Technol. 2016, 4 (2016), 21--36.

[68]

Charles V. Wright, Lucas Ballard, Fabian Monrose, and Gerald M. Masson. 2007. Language Identification of Encrypted VoIP Traffic: Alejandra y Roberto or Alice and Bob?. In Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium (Boston, MA) (SS'07). USENIX Association, USA, Article 4, 12 pages.

[69]

Yixiao Xu, Tao Wang, Qi Li, Qingyuan Gong, Yang Chen, and Yong Jiang. 2018. A Multi-tab Website Fingerprinting Attack. In Proceedings of the 34th Annual Computer Security Applications Conference. 327--341.

Index Terms

Exposing the Rat in the Tunnel: Using Traffic Analysis for Tor-based Malware Detection
1. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Malware and its mitigation
  2. Security services
    1. Privacy-preserving protocols

Information & Contributors

Information

Published In

CCS '22: Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security

November 2022

3598 pages

ISBN:9781450394505

DOI:10.1145/3548606

General Chairs: Heng Yin (University of California, Riverside), Angelos Stavrou (Virginia Tech)

Program Chairs: Cas Cremers (CISPA Helmholtz Center for Information Security), Elaine Shi (Carnegie Mellon University)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CCS '22

Sponsor:

SIGSAC

CCS '22: 2022 ACM SIGSAC Conference on Computer and Communications Security

November 7 - 11, 2022

Los Angeles, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,261 of 6,999 submissions, 18%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 13
Total Downloads: 1,460
Downloads (Last 12 months): 407
Downloads (Last 6 weeks): 31

Reflects downloads up to 03 Mar 2025

Citations

Cited By

Rendall K, Mylonas A, Vidalis S, Gritzalis D. (2025). MIDAS. Computers and Security 148:C. Online publication date: 1-Jan-2025.
https://dl.acm.org/doi/10.1016/j.cose.2024.104154
Mandela N, Sonia, Mistry N, Nagpal A. (2025). Efficient Dark Web traffic classification using a hybrid CNN-LSTM model. International Journal of Information Technology. Online publication date: 21-Feb-2025.
https://doi.org/10.1007/s41870-025-02427-x
Wang X, Zhuang Z, Meng W, Cheng J, Chua T, Ngo C, Ka-Wei Lee R, Kumar R, Lauw H. (2024). Detecting and Understanding Self-Deleting JavaScript Code. Proceedings of the ACM Web Conference 2024, 1768-1778. Online publication date: 13-May-2024.
https://dl.acm.org/doi/10.1145/3589334.3645540
Deng Y, Peng T, Wang B, Wu G. (2024). ANDE: Detect the Anonymity Web Traffic With Comprehensive Model. IEEE Transactions on Network and Service Management 21:6, 6924-6936. Online publication date: Dec-2024.
https://doi.org/10.1109/TNSM.2024.3453917
Zhao R, Zhan M, Deng X, Li F, Wang Y, Wang Y, Gui G, Xue Z. (2024). A Novel Self-Supervised Framework Based on Masked Autoencoder for Traffic Classification. IEEE/ACM Transactions on Networking 32:3, 2012-2025. Online publication date: Jun-2024.
https://doi.org/10.1109/TNET.2023.3335253
Xie R, Cao J, Zhu Y, Zhang Y, He Y, Peng H, Wang Y, Xu M, Sun K, Dong E, Li Q, Zhang M, Li J. (2024). Cactus: Obfuscating Bidirectional Encrypted TCP Traffic at Client Side. IEEE Transactions on Information Forensics and Security 19, 7659-7673. Online publication date: 2024.
https://doi.org/10.1109/TIFS.2024.3442530
Wang C, Redino C, Rahman A, Clark R, Radke D, Cody T, Nandakumar D, Bowen E. (2024). Discovering Command and Control (C2) Channels on Tor and Public Networks Using Reinforcement Learning. SoutheastCon 2024, 427-433. Online publication date: 15-Mar-2024.
https://doi.org/10.1109/SoutheastCon52093.2024.10500045
Saleem J, Islam R, Islam M. (2024). Darknet Traffic Analysis: A Systematic Literature Review. IEEE Access 12, 42423-42452. Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3373769
Gao Z, Li J, Wang L, He Y, Yuan P. (2024). CM-UTC: A Cost-sensitive Matrix based Method for Unknown Encrypted Traffic Classification. The Computer Journal 67:7, 2441-2452. Online publication date: 26-Feb-2024.
https://doi.org/10.1093/comjnl/bxae017
Chen Y, Yang J, Cui S, Dong C, Jiang B, Liu Y, Lu Z. (2024). Unveiling encrypted traffic types through hierarchical network characteristics. Computers and Security 138:C. Online publication date: 17-Apr-2024.
https://dl.acm.org/doi/10.1016/j.cose.2023.103645
