research-article

Semantic-Enriched Code Knowledge Graph to Reveal Unknowns in Smart Contract Code Reuse

Authors:
Qing Huang, Jiangxi Normal University, China (ORCID 0000-0002-8877-4267)
Dianshu Liao, Jiangxi Normal University, China (ORCID 0009-0000-0865-0444)
Zhenchang Xing, CSIRO’s Data61 & Australian National University, Australia (ORCID 0000-0001-7663-1421)
Zhengkang Zuo, Jiangxi Normal University, China (ORCID 0000-0002-7118-3727)
Changjing Wang, Jiangxi Normal University, China (ORCID 0000-0002-3601-4979)
Xin Xia, Zhejiang University, China (ORCID 0000-0002-6302-3256)

ACM Transactions on Software Engineering and Methodology, Volume 32, Issue 6, Article No. 147, pp. 1–37. https://doi.org/10.1145/3597206

Published: 30 September 2023

Abstract

Programmers who work with smart contract development often encounter challenges in reusing code from repositories. This is due to the presence of two unknowns that can lead to non-functional and functional failures. These unknowns are implicit collaborations between functions and subtle differences among similar functions. Current code mining methods can extract syntax and semantic knowledge (known knowledge), but they cannot uncover these unknowns due to a significant gap between the known and the unknown. To address this issue, we formulate knowledge acquisition as a knowledge deduction task and propose an analytic flow that uses the function clone as a bridge to gradually deduce the known knowledge into the problem-solving knowledge that can reveal the unknowns. This flow comprises five methods: clone detection, co-occurrence probability calculation, function usage frequency accumulation, description propagation, and control flow graph annotation. This provides a systematic and coherent approach to knowledge deduction. We then structure all of the knowledge into a semantic-enriched code Knowledge Graph (KG) and integrate this KG into two software engineering tasks: code recommendation and crowd-scaled coding practice checking. As a proof of concept, we apply our approach to 5,140 smart contract files available on Etherscan.io and confirm high accuracy of our KG construction steps. In our experiments, our code KG effectively improved code recommendation accuracy by 6% to 45%, increased diversity by 61% to 102%, and enhanced NDCG by 1% to 21%. Furthermore, compared to traditional analysis tools and the debugging-with-the-crowd method, our KG improved time efficiency by 30 to 380 seconds, vulnerability determination accuracy by 20% to 33%, and vulnerability fixing accuracy by 24% to 40% for novice developers who identified and fixed vulnerable smart contract functions.
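The abstract names co-occurrence probability calculation as one step of the analytic flow but does not give its formula. As a rough illustration only, the sketch below assumes the simplest reading: the conditional frequency with which one function appears in a contract given that another appears. The function name `cooccurrence_probabilities`, the input shape, and the toy corpus are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_probabilities(contracts):
    """Estimate P(g appears | f appears) over a corpus of contracts.

    `contracts` maps a contract name to the set of function identifiers
    it defines (hypothetically, ids of function clone groups).
    """
    single = Counter()  # number of contracts containing f
    pair = Counter()    # number of contracts containing both f and g
    for functions in contracts.values():
        unique = set(functions)
        single.update(unique)
        # Ordered pairs, so (f, g) and (g, f) are counted separately.
        pair.update(permutations(sorted(unique), 2))
    return {(f, g): n / single[f] for (f, g), n in pair.items()}


# Toy corpus invented for illustration; not data from the paper.
corpus = {
    "TokenA": {"transfer", "approve", "transferFrom"},
    "TokenB": {"transfer", "approve"},
    "Wallet": {"transfer", "withdraw"},
}
probs = cooccurrence_probabilities(corpus)
print(probs[("approve", "transfer")])  # every contract with approve also has transfer
print(probs[("transfer", "approve")])  # 2 of the 3 contracts with transfer have approve
```

A high value in one direction and a lower value in the other hints at an asymmetric dependency between two functions, which is one plausible way an implicit collaboration could surface. The paper's actual calculation may differ.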

Index Terms

1. Security and privacy
  1. Software and application security
    1. Software security engineering
2. Software and its engineering
  1. Software creation and management
    1. Search-based software engineering
    2. Software development techniques
      1. Reusability
  2. Software notations and tools
    1. Formal language definitions
      1. Semantics


Published in
ACM Transactions on Software Engineering and Methodology Volume 32, Issue 6
November 2023
949 pages
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/3625557
Editor:
Mauro Pezzè
USI Università della Svizzera italiana and SIT Schaffhausen Institute of Technology, Switzerland
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 30 September 2023
- Online AM: 22 May 2023
- Accepted: 27 April 2023
- Revised: 19 April 2023
- Received: 30 July 2022
Author Tags
Smart contract
code knowledge graph
knowledge deduction
code recommendation
crowd-scale coding practice checking
Qualifiers
- research-article

