Formal Semantics Extraction from Natural Language Specifications for ARM

Vu, Anh V.; Ogawa, Mizuhito

doi:10.1007/978-3-030-30942-8_28

Anh V. Vu & Mizuhito Ogawa

Conference paper
First Online: 23 September 2019

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11800)

Abstract

This paper proposes a method to systematically extract the formal semantics of ARM instructions from their natural language specifications. Although ARM is a RISC architecture with a relatively small number of instructions, many instruction variations exist across its series, including Cortex-A, Cortex-M, and Cortex-R. Semi-automatic formalisation of the semantics of these rather simple instructions therefore reduces the tedious human effort needed to develop tools such as symbolic executors. We concentrate on six variations of the ARM Cortex-M series (M0, M0+, M3, M4, M7, and M33), aiming to cover IoT malware. Our systematic approach interprets the semantics by applying translation rules, augmented by sentence similarity analysis to recognise the modification of flags. Among 1039 collected specifications, the formal semantics of 662 instructions were successfully extracted using only 228 manually prepared rules. These semantics are then used to build a preliminary dynamic symbolic execution tool for Cortex-M called Corana. We observe experimentally that Corana can effectively trace IoT malware in the presence of obfuscation techniques such as indirect jumps, and can correctly detect dead conditional branches, which are regarded as opaque predicates.
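
The two ingredients named in the abstract, translation rules and sentence similarity analysis for flag updates, can be illustrated with a minimal sketch. It is not the authors' implementation: the rule table, the exemplar sentences, the bag-of-words cosine measure, the 0.5 threshold, and the function names `translate` and `modifies_flags` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical translation rules: a keyword pattern in the specification
# sentence maps to a semantics template (not the paper's actual rule set).
RULES = [
    (re.compile(r"\badds?\b"), "{rd} = {rn} + {op2}"),
    (re.compile(r"\bsubtracts?\b"), "{rd} = {rn} - {op2}"),
]

# Hypothetical exemplar sentences that describe a flag update.
FLAG_EXEMPLARS = [
    "updates the N and Z flags according to the result",
    "the condition flags are updated on the result of the operation",
]


def translate(sentence, rd="Rd", rn="Rn", op2="Operand2"):
    """Return the template of the first matching rule, or None."""
    lowered = sentence.lower()
    for pattern, template in RULES:
        if pattern.search(lowered):
            return template.format(rd=rd, rn=rn, op2=op2)
    return None


def tokens(sentence):
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity of the bag-of-words vectors of two sentences."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def modifies_flags(sentence, threshold=0.5):
    """Guess whether a sentence describes a flag update (assumed threshold)."""
    return max(cosine(sentence, e) for e in FLAG_EXEMPLARS) >= threshold


if __name__ == "__main__":
    print(translate("The ADD instruction adds the value of Operand2 to the value in Rn."))
    print(modifies_flags("This instruction updates the N and Z flags according to the result"))
    print(modifies_flags("Branch to the target address"))
```

A realistic pipeline would likely weight terms (for example with TF-IDF) and normalise the sentences before matching; plain word counts are used here only to keep the sketch free of dependencies.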

Acknowledgments

We are grateful to Nao Hirokawa, Le Minh Nguyen, and the anonymous reviewers of FM’19 for their insightful feedback and invaluable comments. We sincerely thank Xuan Tung Vu, Thi Hai Yen Vuong, and Lam Hoang Yen Nguyen for their constructive discussions, as well as Thu Trang Hoang for her sharp comments on some grammatical issues. This study is partially supported by JSPS KAKENHI Grant-in-Aid for Scientific Research (B) 19H04083.

Author information

Authors and Affiliations

Japan Advanced Institute of Science and Technology, Nomi, Japan
Anh V. Vu & Mizuhito Ogawa

Authors

Anh V. Vu
Mizuhito Ogawa

Corresponding author

Correspondence to Anh V. Vu.

Editor information

Editors and Affiliations

Consiglio Nazionale delle Ricerche, Pisa, Italy
Maurice H. ter Beek
Macquarie University, Sydney, NSW, Australia
Annabelle McIver
University of Minho, Braga, Portugal
José N. Oliveira

About this paper

Cite this paper

Vu, A.V., Ogawa, M. (2019). Formal Semantics Extraction from Natural Language Specifications for ARM. In: ter Beek, M., McIver, A., Oliveira, J. (eds) Formal Methods – The Next 30 Years. FM 2019. Lecture Notes in Computer Science, vol 11800. Springer, Cham. https://doi.org/10.1007/978-3-030-30942-8_28

DOI: https://doi.org/10.1007/978-3-030-30942-8_28
Published: 23 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30941-1
Online ISBN: 978-3-030-30942-8
eBook Packages: Computer Science, Computer Science (R0)