PromeTrans: Bootstrap binary functionality classification with knowledge transferred from pre-trained models

Empirical Software Engineering

Abstract

Pre-trained models have made significant progress in comprehending natural language (including source code) and binary code. However, none of them is directly suitable for binary functionality classification (BFC). In this paper, we present PromeTrans, the first pre-trained-model-based solution to BFC, which fuses the knowledge of several pre-trained models. Specifically, it overcomes the token-length limit of pre-trained models with a novel function outlining scheme and uses existing pre-trained assembly language models (AsmLMs) to generate embeddings for binary functions. It then uses a Graph Attention Network (GAT) to aggregate function embeddings along the call graph into a functionality embedding for each function. Lastly, it leverages existing pre-trained large language models (LLMs, e.g., GPT-3.5) to classify the functionality of source code functions and aligns the resulting labels to the corresponding binary functions. Using the functionality embeddings provided by the AsmLMs and the GAT, together with the functionality labels provided by the LLMs, a simple multi-layer perceptron (MLP) is trained to classify the functionality of binary functions. Our prototype of PromeTrans achieves state-of-the-art (SOTA) performance on various datasets with low overhead, and it also exhibits strong results in real-world applications (e.g., malware analysis). Additionally, by analyzing PromeTrans’s training history, we confirm that the knowledge transferred from LLMs is of high quality. These results show that transferring knowledge from pre-trained models has strong potential to bootstrap binary program comprehension tasks beyond BFC.
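The call-graph aggregation step can be illustrated with a minimal, pure-Python sketch of a single-head graph-attention update. This is not the authors' implementation: the abstract does not specify the GAT architecture, so the learned linear projection is omitted here (treated as identity), and the function name `gat_aggregate`, the attention vector `attn`, and the toy embeddings and call graph are all hypothetical placeholders standing in for AsmLM outputs.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU as commonly used for graph-attention scores."""
    return x if x > 0 else slope * x


def gat_aggregate(embeddings, call_graph, attn):
    """One simplified single-head attention step over a call graph.

    embeddings: dict mapping function name -> list of floats
                (hypothetical stand-in for AsmLM function embeddings).
    call_graph: dict mapping function name -> list of callee names.
    attn:       list of 2*d floats (hypothetical; learned in a real model).
    """
    out = {}
    for fn, h_i in embeddings.items():
        # Attend over the function itself plus its known callees.
        callees = [c for c in call_graph.get(fn, []) if c in embeddings and c != fn]
        neighbours = [fn] + callees
        # Score each neighbour from the concatenation [h_i || h_j].
        scores = [
            leaky_relu(sum(a * x for a, x in zip(attn, h_i + embeddings[n])))
            for n in neighbours
        ]
        # Numerically stable softmax over the neighbourhood.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Attention-weighted sum of neighbour embeddings.
        out[fn] = [
            sum(w * embeddings[n][k] for w, n in zip(weights, neighbours))
            for k in range(len(h_i))
        ]
    return out


# Toy example: three functions with 2-dimensional embeddings.
embeddings = {"main": [1.0, 0.0], "encrypt": [0.0, 1.0], "log": [0.5, 0.5]}
call_graph = {"main": ["encrypt", "log"], "encrypt": [], "log": []}
attn = [0.0, 0.0, 0.0, 0.0]  # all-zero attention vector gives uniform weights
result = gat_aggregate(embeddings, call_graph, attn)
```

With the all-zero attention vector, every neighbour receives the same weight, so the output for `main` is the plain average of its own embedding and those of its callees. In the pipeline the abstract describes, an aggregated vector of this kind would then be passed to the MLP classifier.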

Data Availability

Our code and data are available at https://github.com/Sandspeare/prometrans.

Author information

Corresponding author

Correspondence to Zihan Sha.

Ethics declarations

Conflict of Interest Statement

The authors declare that they have no conflict of interest.

Additional information

Communicated by: Foutse Khomh, Bowen Xu.

About this article

Cite this article

Sha, Z., Zhang, C., Wang, H. et al. PromeTrans: Bootstrap binary functionality classification with knowledge transferred from pre-trained models. Empir Software Eng 30, 32 (2025). https://doi.org/10.1007/s10664-024-10593-y

  • DOI: https://doi.org/10.1007/s10664-024-10593-y
