Scalable Graph-based Bug Search for Firmware Images

Published: 24 October 2016

Abstract

Because of rampant security breaches in IoT devices, searching for vulnerabilities in massive IoT ecosystems is more crucial than ever. Recent studies have demonstrated that control-flow graph (CFG) based bug search techniques can be effective and accurate for IoT devices across different architectures. However, these CFG-based approaches are far from scalable enough to handle the enormous number of IoT devices in the wild, because of their expensive graph-matching overhead. Inspired by rich experience in image and video search, we propose a new bug search scheme that addresses the scalability challenge in existing cross-platform bug search techniques and further improves search accuracy. Unlike existing techniques that search directly on raw features (CFGs) extracted from the binary code, we convert the CFGs into high-level numeric feature vectors. Compared with raw CFG features, high-level numeric feature vectors are more robust to code variation across different architectures and can easily support real-time search using state-of-the-art hashing techniques. We have implemented a bug search engine, Genius, and compared it with state-of-the-art bug search approaches. Experimental results show that Genius outperforms baseline approaches for various query loads in terms of speed and accuracy. We also evaluated Genius on a real-world dataset of 33,045 devices, collected from public sources and our system. The experiment showed that Genius can finish a search within 1 second on average when performed over 8,126 firmware images containing 420,558,702 functions. By looking only at the top 50 candidates in the search results, we found 38 potentially vulnerable firmware images across 5 vendors and confirmed 23 of them by manual analysis. We also found that it took only 0.1 seconds on average to finish searching for all 154 vulnerabilities in the two latest commercial firmware images from D-LINK. Of these vulnerabilities, 103 are potentially present in these images, and 16 were confirmed.
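
To make the idea of hashing-based search over numeric feature vectors concrete, the following is a minimal sketch and not the Genius implementation. It assumes each function has already been encoded as a fixed-length numeric vector (the vectors and the names `f_a` and `f_b` are hypothetical), and it uses random-hyperplane locality-sensitive hashing as one illustrative hashing technique; the abstract does not say which scheme Genius uses.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (normal vectors) in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_key(vec, planes):
    """Hash a vector to a bit tuple: one bit per hyperplane, set by the sign of the dot product."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LshIndex:
    """Toy single-table LSH index (assumption: illustrative only)."""

    def __init__(self, dim, n_bits=8, seed=0):
        self.planes = make_hyperplanes(dim, n_bits, seed)
        self.buckets = defaultdict(list)

    def add(self, name, vec):
        # Vectors pointing in similar directions tend to share a bucket.
        self.buckets[lsh_key(vec, self.planes)].append((name, vec))

    def query(self, vec, top_k=5):
        # Only the query's own bucket is scanned, never the whole corpus.
        candidates = self.buckets.get(lsh_key(vec, self.planes), [])
        ranked = sorted(candidates, key=lambda item: cosine(vec, item[1]), reverse=True)
        return [name for name, _ in ranked[:top_k]]


# Hypothetical feature vectors for two functions.
index = LshIndex(dim=4)
index.add("f_a", [1.0, 2.0, 3.0, 4.0])
index.add("f_b", [-1.0, -2.0, -3.0, -4.0])
result = index.query([2.0, 4.0, 6.0, 8.0])
print(result)
```

The point of the sketch is the cost model: a query is hashed once and compared only against the functions in its bucket, rather than being matched graph-by-graph against every function in the corpus. A practical index would typically use several hash tables to reduce missed neighbors.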

Published In

CCS '16: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security
October 2016
1924 pages
ISBN: 9781450341394
DOI: 10.1145/2976749
© 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. firmware security
  2. graph encoding
  3. machine learning

Qualifiers

  • Research-article

Funding Sources

  • Air Force Research Lab Grant
  • DARPA CGC Grant
  • National Science Foundation Grant

Acceptance Rates

CCS '16 Paper Acceptance Rate 137 of 831 submissions, 16%;
Overall Acceptance Rate 1,261 of 6,999 submissions, 18%
