PromeTrans: Bootstrap binary functionality classification with knowledge transferred from pre-trained models

Sha, Zihan; Zhang, Chao; Wang, Hao; Gao, Zeyu; Zhang, Bolun; Lan, Yang; Shu, Hui

doi:10.1007/s10664-024-10593-y

Published: 27 November 2024

Volume 30, article number 32, (2025)

Empirical Software Engineering

Zihan Sha ORCID: orcid.org/0000-0002-1020-9006¹,
Chao Zhang²,
Hao Wang²,
Zeyu Gao²,
Bolun Zhang³,
Yang Lan² &
Hui Shu²

Abstract

Pre-trained models have made significant progress in natural language (including source code) and binary code comprehension. However, none of them is suitable for binary functionality classification (BFC). In this paper, we present the first pre-trained model-based solution to BFC, namely PromeTrans, which fuses the knowledge of several pre-trained models. Specifically, it overcomes the token size limitation of pre-trained models with a novel function outlining scheme and uses existing pre-trained assembly language models (AsmLMs) to generate embeddings for binary functions. Then, it uses a Graph Attention Network (GAT) to aggregate function embeddings along the call graph into a functionality embedding for each function. Lastly, it leverages existing pre-trained large natural language models (LLMs, e.g., GPT-3.5) to classify the functionality of source code functions and aligns the resulting labels to binary functions. Based on the functionality embeddings provided by the AsmLMs and the GAT, and the functionality label knowledge provided by the LLMs, a simple multi-layer perceptron (MLP) model is trained to classify the functionality of binary functions. Our prototype PromeTrans yields state-of-the-art (SOTA) performance on various datasets and incurs low overhead. PromeTrans also exhibits exceptional results in real-world applications (e.g., malware analysis). Additionally, by analyzing PromeTrans's training history, we confirm that the knowledge transferred from LLMs is of high quality. This shows that transferring knowledge from pre-trained models has strong potential to bootstrap binary program comprehension tasks beyond BFC.
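To make the aggregation step more concrete, the following is a minimal sketch of a single attention pass over a call graph. It is not the PromeTrans implementation: the abstract does not specify the GAT configuration, so this sketch assumes a single attention head, omits the learned linear projection of a full GAT layer, and uses hand-picked attention vectors that would be learned in practice. The function names, embeddings, and the helper `gat_aggregate` are all hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def leaky_relu(x, slope=0.2):
    """LeakyReLU, the nonlinearity commonly applied to attention scores."""
    return x if x > 0 else slope * x


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gat_aggregate(embeddings, call_graph, attn_src, attn_dst):
    """Mix each function's embedding with those of its callees.

    embeddings: dict mapping function name -> list of floats
                (stand-ins for AsmLM function embeddings)
    call_graph: dict mapping caller -> list of callee names
    attn_src, attn_dst: attention vectors (assumed fixed here;
                        a trained model would learn them)
    """
    out = {}
    for fn, vec in embeddings.items():
        # Attend over the function itself plus its known callees.
        neighbours = [fn] + [c for c in call_graph.get(fn, []) if c in embeddings]
        scores = [
            leaky_relu(dot(attn_src, vec) + dot(attn_dst, embeddings[n]))
            for n in neighbours
        ]
        weights = softmax(scores)
        # Weighted sum of neighbour embeddings, one dimension at a time.
        out[fn] = [
            sum(w * embeddings[n][i] for w, n in zip(weights, neighbours))
            for i in range(len(vec))
        ]
    return out


# Toy data: three functions with 2-dimensional embeddings (made up).
toy_embeddings = {
    "main": [1.0, 0.0],
    "encrypt": [0.0, 1.0],
    "log": [0.5, 0.5],
}
toy_call_graph = {"main": ["encrypt", "log"]}

aggregated = gat_aggregate(
    toy_embeddings, toy_call_graph, attn_src=[0.1, 0.3], attn_dst=[0.2, 0.8]
)
for name, vec in aggregated.items():
    print(name, [round(v, 3) for v in vec])
```

In this sketch a function without callees keeps its own embedding, while a caller such as `main` receives a convex combination of its own embedding and those of its callees. The resulting vectors play the role of the functionality embeddings that a downstream MLP classifier would consume.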

Data Availability

Our code and data are available at https://github.com/Sandspeare/prometrans.

Author information

Authors and Affiliations

Key Laboratory of Cyberspace Security, Ministry of Education, Zhengzhou, China
Zihan Sha
Tsinghua University, Beijing, China
Chao Zhang, Hao Wang, Zeyu Gao, Yang Lan & Hui Shu
Institute of Information Engineering CAS, Beijing, China
Bolun Zhang

Corresponding author

Correspondence to Zihan Sha.

Ethics declarations

Conflict of Interest Statement

We declare that all authors have no conflict of interest.

Additional information

Communicated by: Foutse Khomh, Bowen Xu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sha, Z., Zhang, C., Wang, H. et al. PromeTrans: Bootstrap binary functionality classification with knowledge transferred from pre-trained models. Empir Software Eng 30, 32 (2025). https://doi.org/10.1007/s10664-024-10593-y

Accepted: 13 November 2024
Published: 27 November 2024
DOI: https://doi.org/10.1007/s10664-024-10593-y
