DOI: 10.1145/2590296.2590346
research-article

Towards automated protocol reverse engineering using semantic information

Published: 04 June 2014

Abstract

Network security products, such as NIDS or application firewalls, tend to focus on application-level communication flows. However, adding support for new proprietary and often undocumented protocols requires reverse engineering these protocols. Currently, this task is performed manually. Given the difficulty of manual protocol reverse engineering and the time it takes, the importance of automating this task is clear. This is all the more significant in today's cybersecurity context, where reaction time and automated adaptation have become priorities. Several studies have sought to infer protocol specifications from traces. As shown in this article, they do not provide accurate results on complex protocols and often cannot be applied in an operational context to produce parsers or traffic generators, which are key indicators of the quality of the obtained specifications. In addition, too few previous works have resulted in the publication of tools that would allow the scientific community to experimentally validate and compare the different approaches.
In this paper, we infer the specifications of complex protocols by means of an automated approach and novel techniques. Based on communication traces, we reverse-engineer the vocabulary of a protocol by considering embedded contextual information. We also use this information to improve message clustering and to enhance the identification of field boundaries. We then show the viability of our approach through a comparative study that includes our reimplementation of three other state-of-the-art approaches (ASAP, Discoverer and ScriptGen).
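The abstract does not detail how contextual information is combined with alignment. As a rough illustration only, the Python sketch below uses Needleman-Wunsch global alignment (listed in the references as [19]) to compare two messages, keep the positions on which they agree as static tokens, and collapse every other run into a variable field. The "semantic" step is an assumption made for illustration: each message's known contextual values (here an IP address assumed to be known from the trace context) are first collapsed into one hypothetical CONTEXT_TAG symbol, so a value that differs between messages still aligns as a single field. The names tag_context, needleman_wunsch, infer_format, CONTEXT_TAG and the scoring parameters (match 1, mismatch -1, gap -1) are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple

CONTEXT_TAG = -1  # hypothetical symbol: "a value known from the trace context"
VAR = "{var}"     # rendering of a variable (non-agreeing) field
CTX = "{ctx}"     # rendering of an aligned contextual value


def tag_context(msg: bytes, context_values: Sequence[bytes]) -> List[int]:
    """Turn a message into symbols, collapsing each known contextual value into CONTEXT_TAG."""
    symbols: List[int] = []
    i = 0
    while i < len(msg):
        for value in context_values:
            if value and msg.startswith(value, i):
                symbols.append(CONTEXT_TAG)
                i += len(value)
                break
        else:  # no contextual value starts at position i: keep the raw byte
            symbols.append(msg[i])
            i += 1
    return symbols


def needleman_wunsch(a: Sequence[int], b: Sequence[int], match: int = 1,
                     mismatch: int = -1, gap: int = -1
                     ) -> Tuple[List[Optional[int]], List[Optional[int]]]:
    """Global alignment of two symbol sequences; None marks a gap."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: int, y: int) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[Optional[int]] = []
    out_b: List[Optional[int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(None)
            i -= 1
        else:
            out_a.append(None)
            out_b.append(b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1]


def infer_format(a: Sequence[int], b: Sequence[int]) -> str:
    """Keep aligned columns where both messages agree; collapse other runs into one VAR."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    parts: List[str] = []
    for x, y in zip(aligned_a, aligned_b):
        if x is not None and x == y:
            parts.append(CTX if x == CONTEXT_TAG else chr(x))
        elif not parts or parts[-1] != VAR:
            parts.append(VAR)
    return "".join(parts)
```

On the raw messages b"HELO 10.0.0.5\r\n" and b"HELO 10.0.0.7\r\n", infer_format returns "HELO 10.0.0.{var}\r\n", splitting the address into a static prefix and a one-byte field. Tagging each message with its own address first (tag_context(msg, [b"10.0.0.5"]) and tag_context(msg, [b"10.0.0.7"])) yields "HELO {ctx}\r\n", keeping the address as one field. Because the alignment table has (n+1)(m+1) cells, this sketch is only meant for short messages.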

References

[1] J. Antunes, N. Neves, and P. Verissimo. Reverse engineering of protocols from network traces. In Proceedings of WCRE, 2011.
[2] M. A. Beddoe. Network protocol analysis using bioinformatics algorithms. In Toorcon, 2004.
[3] J. Caballero, P. Poosankam, C. Kreibich, and D. Song. Dispatcher: Enabling active botnet infiltration using automatic protocol reverse-engineering. In Proceedings of CCS, 2009.
[4] J. Caballero, H. Yin, Z. Liang, and D. Song. Polyglot: Automatic extraction of protocol message format using dynamic binary analysis. In Proceedings of CCS, 2007.
[5] P. M. Comparetti, G. Wondracek, C. Kruegel, and E. Kirda. Prospex: Protocol specification extraction. In Proceedings of SSP, 2009.
[6] W. Cui. Discoverer: Automatic protocol reverse engineering from network traces. In Proceedings of USENIX Security Symposium, 2007.
[7] W. Cui, V. Paxson, N. C. Weaver, and R. H. Katz. Protocol-independent adaptive replay of application dialog. In Proceedings of NDSS, 2006.
[8] J. Freeman. Hacking a closed ecosystem. In O'Reilly Android Open Conference, 2011.
[9] U. Gargi. Consumer media capture: Time-based analysis and event clustering. Technical report, HP Laboratories Palo Alto, Aug. 2003.
[10] C. Guarnieri, M. Schloesser, J. Bremer, and A. Tanasi. Cuckoo Sandbox: Open source automated malware analysis. In Black Hat USA, 2013.
[11] G. J. Holzmann. Design and validation of computer protocols. Prentice-Hall, Inc., 1991.
[12] T. Krueger, H. Gascon, N. Kramer, and K. Rieck. Learning stateful models for network honeypots. In Proceedings of the 5th ACM Workshop on Security and Artificial Intelligence, 2012.
[13] T. Krueger, N. Kramer, and K. Rieck. ASAP: Automatic semantics-aware analysis of network payloads. In Proceedings of ECML/PKDD, 2011.
[14] F. Leder and P. Martini. NGBPA: Next generation botnet protocol analysis. In Emerging Challenges for Security, Privacy and Trust, volume 297 of IFIP Advances in Information and Communication Technology. Springer Berlin Heidelberg, 2009.
[15] D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In NIPS. MIT Press, 2000.
[16] C. Leita, K. Mermoud, and M. Dacier. ScriptGen: An automated script generation tool for Honeyd. In Proceedings of ACSAC, 2005.
[17] Z. Lin, X. Jiang, D. Xu, and X. Zhang. Automatic protocol format reverse engineering through context-aware monitored execution. In Proceedings of NDSS, 2008.
[18] K. McNamee. Malware analysis report: New C&C protocol for ZeroAccess/Siref. Technical report, Kindsight Security Lab, 2012.
[19] S. B. Needleman and C. D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48, 1970.
[20] R. Pang and V. Paxson. A high-level programming environment for packet trace anonymization and transformation. In Proceedings of SIGCOMM, 2003.
[21] D. N. Reshef, Y. A. Reshef, H. K. Finucane, S. R. Grossman, G. McVean, P. J. Turnbaugh, E. S. Lander, M. Mitzenmacher, and P. C. Sabeti. Detecting novel associations in large data sets. Science, 334, 2011.
[22] K. Rieck, C. Wressnegger, and A. Bikadorov. Sally: A tool for embedding strings in vector spaces. Journal of Machine Learning Research, 2012.
[23] J. Shearer. Trojan.Zeroaccess threat report. Technical report, Symantec, 2011.
[24] R. R. Sokal and C. D. Michener. A statistical method for evaluating systematic relationships. University of Kansas Science Bulletin, 1958.
[25] A. Syropoulos. Mathematics of multisets. In Proceedings of the Workshop on Multiset Processing: Multiset Processing, Mathematical, Computer Science, and Molecular Computing Points of View, 2001.
[26] Y. Wang, X. Yun, M. Z. Shafiq, L. Wang, A. X. Liu, Z. Zhang, D. Yao, Y. Zhang, and L. Guo. A semantics aware approach to automated reverse engineering unknown protocols. In Proceedings of ICNP, 2012.
[27] Y. Wang, Z. Zhang, D. D. Yao, B. Qu, and L. Guo. Inferring protocol state machine from network traces: A probabilistic approach. In Proceedings of ACNS, 2011.
[28] C. Willems, T. Holz, and F. Freiling. Toward automated dynamic malware analysis using CWSandbox. IEEE Security and Privacy, 5(2):32–39, Mar. 2007.
[29] T. Yeh, T.-H. Chang, and R. C. Miller. Sikuli: Using GUI screenshots for search and automation. In Proceedings of the 22nd Annual ACM Symposium on User Interface Software and Technology, UIST '09, pages 183–192, New York, NY, USA, 2009. ACM.

Information

Published In

ASIA CCS '14: Proceedings of the 9th ACM symposium on Information, computer and communications security
June 2014
556 pages
ISBN:9781450328005
DOI:10.1145/2590296
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. contextual clustering
  2. protocol reverse engineering
  3. semantic sequence alignment

Conference

ASIA CCS '14

Acceptance Rates

ASIA CCS '14 Paper Acceptance Rate 50 of 255 submissions, 20%;
Overall Acceptance Rate 418 of 2,322 submissions, 18%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 133
  • Downloads (last 6 weeks): 26
Reflects downloads up to 13 Feb 2025

Cited By

  • (2025) Protocol syntax recovery via knowledge transfer. Computer Networks 258 (111022). DOI: 10.1016/j.comnet.2024.111022. Online publication date: Feb-2025.
  • (2024) Protocol Reverse Analysis of Ethernet for Control Automation Technology Based on Sequence Alignment and Pearson Correlation Coefficient. Sensors 24:24 (7922). DOI: 10.3390/s24247922. Online publication date: 11-Dec-2024.
  • (2024) POSTER: Packet Field Tree: a hybrid approach, open database and evaluation methodology for Automated Protocol Reverse-Engineering. Proceedings of the ACM SIGCOMM 2024 Conference: Posters and Demos (13-15). DOI: 10.1145/3672202.3673718. Online publication date: 4-Aug-2024.
  • (2024) BinPRE: Enhancing Field Inference in Binary Analysis Based Protocol Reverse Engineering. Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security (3689-3703). DOI: 10.1145/3658644.3690299. Online publication date: 2-Dec-2024.
  • (2024) Crafting Binary Protocol Reversing via Deep Learning With Knowledge-Driven Augmentation. IEEE/ACM Transactions on Networking 32:6 (5399-5414). DOI: 10.1109/TNET.2024.3468350. Online publication date: Dec-2024.
  • (2024) Toward Automated Field Semantics Inference for Binary Protocol Reverse Engineering. IEEE Transactions on Information Forensics and Security 19 (764-776). DOI: 10.1109/TIFS.2023.3326666. Online publication date: 2024.
  • (2024) MDIplier: Protocol Format Recovery via Hierarchical Inference. 2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE) (547-557). DOI: 10.1109/ISSRE62328.2024.00058. Online publication date: 28-Oct-2024.
  • (2024) Reverse Engineering Industrial Protocols Driven By Control Fields. IEEE INFOCOM 2024 - IEEE Conference on Computer Communications (2408-2417). DOI: 10.1109/INFOCOM52122.2024.10621405. Online publication date: 20-May-2024.
  • (2024) Industrial Control Protocol Type Inference Using Transformer and Rule-based Re-Clustering. IEEE INFOCOM 2024 - IEEE Conference on Computer Communications (1011-1020). DOI: 10.1109/INFOCOM52122.2024.10621186. Online publication date: 20-May-2024.
  • (2024) Classification Method of Industrial Control Protocols Based on Statistical Model Matching and Association Rule Analysis. 2024 9th International Conference on Electronic Technology and Information Science (ICETIS) (264-267). DOI: 10.1109/ICETIS61828.2024.10593980. Online publication date: 17-May-2024.