Abstract
Smart contract developers often struggle to reuse code from repositories because of two unknowns that can cause both functional and non-functional failures: implicit collaborations between functions and subtle differences among similar functions. Current code mining methods can extract syntactic and semantic knowledge (the known knowledge), but they cannot uncover these unknowns because of the significant gap between the known and the unknown. To address this issue, we formulate knowledge acquisition as a knowledge deduction task and propose an analytic flow that uses function clones as a bridge to gradually deduce, from the known knowledge, the problem-solving knowledge that reveals the unknowns. The flow comprises five methods: clone detection, co-occurrence probability calculation, function usage frequency accumulation, description propagation, and control flow graph annotation. Together they provide a systematic and coherent approach to knowledge deduction. We then structure all of this knowledge into a semantic-enriched code Knowledge Graph (KG) and integrate the KG into two software engineering tasks: code recommendation and crowd-scaled coding practice checking. As a proof of concept, we apply our approach to 5,140 smart contract files available on Etherscan.io and confirm the high accuracy of our KG construction steps. In our experiments, the code KG improved code recommendation accuracy by 6% to 45%, diversity by 61% to 102%, and NDCG by 1% to 21%. Furthermore, compared with traditional analysis tools and the debugging-with-the-crowd method, our KG saved novice developers 30 to 380 seconds when identifying and fixing vulnerable smart contract functions, and improved their vulnerability determination accuracy by 20% to 33% and their vulnerability fixing accuracy by 24% to 40%.
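To make the co-occurrence probability step more concrete, the following is a minimal sketch and not the authors' implementation. It assumes each contract has already been reduced to a set of function identifiers (for example, clone-group labels produced by clone detection) and estimates the conditional probability that function b appears in a contract given that function a appears. The function name `cooccurrence_probabilities` and the toy corpus are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_probabilities(contracts):
    """Estimate P(b appears | a appears) over a corpus of contracts.

    `contracts` is a list of iterables of function identifiers (assumed here
    to be clone-group labels). Returns a dict mapping each ordered pair
    (a, b) that co-occurs at least once to the conditional probability.
    """
    single = Counter()  # number of contracts containing each function
    pair = Counter()    # number of contracts containing both a and b
    for functions in contracts:
        unique = set(functions)
        single.update(unique)
        pair.update(permutations(sorted(unique), 2))
    return {(a, b): n / single[a] for (a, b), n in pair.items()}


# Hypothetical toy corpus: each set lists the functions found in one contract.
corpus = [
    {"transfer", "approve", "transferFrom"},
    {"transfer", "approve"},
    {"transfer", "mint"},
    {"approve", "transferFrom"},
]
probs = cooccurrence_probabilities(corpus)
print(probs[("transferFrom", "approve")])
print(probs[("approve", "transferFrom")])
```

In this toy corpus, every contract that defines `transferFrom` also defines `approve`, while the reverse does not always hold. An asymmetric, high conditional probability of this kind is one plausible signal of an implicit collaboration between functions.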
Index Terms
- Semantic-Enriched Code Knowledge Graph to Reveal Unknowns in Smart Contract Code Reuse