Abstract
Accurate control flow graphs (CFGs) are crucial for effective binary program analysis, and resolving indirect function call targets is the major challenge in constructing them. Existing static analysis methods rely heavily on domain-specific patterns and, because expert knowledge is limited, produce an abundance of false positive edges. Meanwhile, learning-based approaches often depend on heuristic analysis during the code representation stage, which prevents the model from fully comprehending program semantics.
To address these limitations, this paper presents AttnCall, a novel neural network learning framework that leverages the attention mechanism to automatically learn the matching relationship between function callsites and callees’ context semantics. AttnCall refines the identification of indirect call targets through the learned matching patterns, eliminating the drawbacks of existing techniques. Additionally, we propose an end-to-end code representation scheme that effectively embeds the semantics of callsites and callees without relying on heuristic rules.
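As a rough illustration of attention-based matching between a callsite and candidate callees, the sketch below applies plain scaled dot-product attention to toy token embeddings. It is not AttnCall's architecture: the function names (`attend`, `match_score`), the mean pooling, the cosine score, and the two-dimensional vectors are all assumptions made for illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(queries, keys, values):
    # Scaled dot-product attention: every query row attends over all key rows.
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        w = softmax([dot(q, k) / scale for k in keys])
        weights.append(w)
        outputs.append([sum(wi * v[j] for wi, v in zip(w, values))
                        for j in range(len(values[0]))])
    return outputs, weights

def mean_pool(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def match_score(callsite, callee):
    # Hypothetical matching score: cosine similarity between the pooled
    # callsite embedding and the callee context attended from the callsite.
    attended, weights = attend(callsite, callee, callee)
    a, b = mean_pool(callsite), mean_pool(attended)
    cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
    return cosine, weights

# Toy embeddings (assumed values): callee_a resembles the callsite, callee_b does not.
callsite = [[1.0, 0.0], [0.8, 0.2]]
callee_a = [[0.9, 0.1], [1.0, 0.0]]
callee_b = [[0.0, 1.0], [0.1, 0.9]]

score_a, weights_a = match_score(callsite, callee_a)
score_b, weights_b = match_score(callsite, callee_b)
```

In this toy setting the candidate whose embeddings resemble the callsite receives the higher score, and the attention weights can be inspected per token, which is the kind of signal that makes attention-based models easier to interpret.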
The evaluation of AttnCall focuses on the task of predicting indirect function call targets. The results demonstrate that AttnCall surpasses state-of-the-art approaches, achieving 31.4% higher precision and 5% higher recall. Moreover, AttnCall enhances model interpretability, allowing for a better understanding of the underlying analysis process.
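For reference, precision and recall over predicted indirect call edges can be computed as in the sketch below. The representation of an edge as a `(callsite, callee)` pair and the example labels are assumptions for illustration, not the paper's evaluation harness.

```python
def precision_recall(predicted, truth):
    # predicted, truth: sets of (callsite, callee) edges.
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical edges: one false positive (cs1 -> g) and one missed edge (cs2 -> k).
predicted = {("cs1", "f"), ("cs1", "g"), ("cs2", "h")}
truth = {("cs1", "f"), ("cs2", "h"), ("cs2", "k")}

precision, recall = precision_recall(predicted, truth)
```

A false positive edge lowers precision, while a missed true target lowers recall.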
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sun, R., Guo, Y., Wang, Z., Zeng, Q. (2024). AttnCall: Refining Indirect Call Targets in Binaries with Attention. In: Tsudik, G., Conti, M., Liang, K., Smaragdakis, G. (eds) Computer Security – ESORICS 2023. ESORICS 2023. Lecture Notes in Computer Science, vol 14347. Springer, Cham. https://doi.org/10.1007/978-3-031-51482-1_20
DOI: https://doi.org/10.1007/978-3-031-51482-1_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-51481-4
Online ISBN: 978-3-031-51482-1
eBook Packages: Computer Science (R0)