BinShape: Scalable and Robust Binary Library Function Identification Using Function Shape

  • Conference paper
Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA 2017)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 10327)

Abstract

Identifying library functions in program binaries is important to many security applications, such as threat analysis, digital forensics, software infringement, and malware detection. Today’s program binaries normally contain a significant number of third-party library functions taken from standard libraries or free open-source software packages. The ability to automatically identify such library functions not only enhances the quality and efficiency of threat analysis and reverse engineering tasks, but also improves their accuracy by avoiding false correlations between irrelevant code bases. Existing methods either lack efficiency or are not robust enough to identify the different versions of the same library function that result from different compilers, different compilation settings, or obfuscation techniques. To address these limitations, we present a scalable and robust system called BinShape to identify standard library functions in binaries. The key idea of BinShape is twofold. First, we derive a robust signature for each library function based on heterogeneous features covering CFGs, instruction-level characteristics, statistical characteristics, and function-call graphs. Second, we design a novel data structure to store such signatures and facilitate efficient matching against a target function. We evaluate BinShape on a diverse set of C/C++ binaries, compiled with GCC and Visual Studio compilers on x86-x64 CPU architectures, at optimization levels O0 to O3. Our experiments show that BinShape is able to identify library functions in real binaries both efficiently and accurately, with an average accuracy of 89% and taking about 0.14 s to identify one function out of three million candidates. We also show that BinShape remains robust when the code is subjected to different compilers, slight modifications, or some obfuscation techniques.
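
The twofold idea in the abstract (a per-function signature built from heterogeneous features, plus an index that narrows the candidates before matching) can be illustrated with a minimal sketch. The abstract does not specify the feature set, the distance measure, or the data structure, so everything below is an assumption made for illustration: the `Signature` fields, the bucketing by basic-block count, the Euclidean distance, and the numeric feature values are all hypothetical and are not BinShape's actual design.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical per-function signature; the fields are illustrative only."""
    name: str
    basic_blocks: int         # CFG feature
    cfg_edges: int            # CFG feature
    arithmetic_insns: int     # instruction-level feature
    mean_opcode_freq: float   # statistical feature
    callees: int              # function-call-graph feature

    def vector(self):
        return (self.basic_blocks, self.cfg_edges, self.arithmetic_insns,
                self.mean_opcode_freq, self.callees)


class SignatureIndex:
    """Toy index: bucket by basic-block count so a query scans few candidates."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def add(self, sig):
        self._buckets[sig.basic_blocks].append(sig)

    def query(self, target, slack=1):
        """Return (closest signature, distance) among nearby buckets, or (None, inf)."""
        best, best_dist = None, math.inf
        low = target.basic_blocks - slack
        high = target.basic_blocks + slack
        for key in range(low, high + 1):
            for cand in self._buckets.get(key, ()):
                dist = math.dist(cand.vector(), target.vector())
                if dist < best_dist:
                    best, best_dist = cand, dist
        return best, best_dist


# Made-up feature values, chosen only to exercise the code.
index = SignatureIndex()
index.add(Signature("memcpy", 6, 8, 4, 0.12, 0))
index.add(Signature("strlen", 3, 4, 2, 0.20, 0))
index.add(Signature("qsort", 25, 40, 18, 0.05, 3))

match, dist = index.query(Signature("unknown", 6, 9, 4, 0.11, 0))
print(match.name, round(dist, 3))
```

In this toy version the bucket key prunes the search so that only functions with a similar basic-block count are compared, which is one simple way to keep lookups cheap as the repository grows; a real system would need features and an index that tolerate compiler and optimization differences.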

Notes

  1. https://github.com/Visgean/Zeus

Acknowledgment

We would like to thank our shepherd, Dr. Cristiano Giuffrida, and the anonymous reviewers for their valuable comments. This research is the result of a fruitful collaboration between the Security Research Center (SRC) of Concordia University, Defence Research and Development Canada (DRDC), and Google under a National Defence/NSERC Research Program.

Author information

Corresponding author

Correspondence to Paria Shirani.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Shirani, P., Wang, L., Debbabi, M. (2017). BinShape: Scalable and Robust Binary Library Function Identification Using Function Shape. In: Polychronakis, M., Meier, M. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2017. Lecture Notes in Computer Science, vol 10327. Springer, Cham. https://doi.org/10.1007/978-3-319-60876-1_14

  • DOI: https://doi.org/10.1007/978-3-319-60876-1_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60875-4

  • Online ISBN: 978-3-319-60876-1

  • eBook Packages: Computer Science, Computer Science (R0)
