How Similar Are Smart Contracts on the Ethereum?

Jia, Nan; Kong, Queping; Huang, Haiping

doi:10.1007/978-981-15-9213-3_32

Conference paper
First Online: 12 November 2020

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1267)

Abstract

Ethereum is a programmable platform that allows anyone to deploy and access smart contracts. This openness means that anyone can browse or reuse the source code of existing smart contracts on Ethereum. In this paper, to characterize the code clone practice of smart contracts, we present a large-scale study of smart contracts collected from Ethereum. We first collect more than 700,000 open-source smart contracts, and then employ an efficient approach, Locality-Sensitive Hashing (LSH), to cluster similar smart contracts. Finally, we conduct a qualitative analysis to characterize the clone practice and to analyze why smart contracts are similar. Our analysis reveals that over 96% of the smart contracts have at least one similar contract, which indicates that the smart contracts on Ethereum are highly homogeneous.

1 Introduction

Blockchain serves as a public ledger, and transactions stored in a blockchain are nearly impossible to tamper with [1, 2]. Its purpose is to solve the trust problem between the two sides of a transaction in a decentralized environment, which can greatly improve transaction efficiency and reduce costs [3, 4]. As a result, blockchain has become a widely used technique for enabling decentralized financial and business transactions [5].

As one of the most revolutionary and representative blockchain platforms, Ethereum [6] has attracted a large number of participants, including developers and users, and has become one of the most active communities in the cryptocurrency world [7]. In Ethereum, developers can write their own smart contracts in high-level programming languages such as Solidity for various domains [5, 8, 9, 10], e.g., finance, games and healthcare.

A smart contract is a program that is triggered to execute a task when specific predefined conditions are satisfied [11, 12]. The conditions defined in smart contracts, and the execution of the contracts, are supposed to be trackable and irreversible in a way that minimizes the need for trusted intermediaries [13, 14]. Owing to the credibility of smart contracts, millions of them had been deployed on Ethereum as of July 6th, 2019.

Since Ethereum is an open platform, anyone can access its smart contracts without any constraints, so the source code of existing smart contracts can be reused by other developers. Meanwhile, Ethereum applications are highly domain-specific, and applications within the same domain can share similar functionalities [8], e.g., ERC20 applications implement the same interface for money transfer and balance inquiry [15]. As a result, the nature of Ethereum makes it convenient to create contract clones, i.e., to copy code from other available contracts.

The impact of contract clones is profound. Since many smart contracts suffer from serious vulnerabilities, copy-pasted vulnerabilities are inherited by the cloned contracts [15]. In this paper, we present a large-scale study to characterize the code clones of Ethereum smart contracts. Firstly, we collect a dataset from Ethereum that contains more than 700,000 open source smart contracts, which were deployed from July 30th, 2015 to July 6th, 2019. Then, we employ Locality-Sensitive Hashing (LSH) [16] to quickly identify similar smart contracts in the large-scale dataset. Specifically, we extract the syntactic tokens from the smart contracts in the dataset, and transform each contract into a vector representation according to its syntactic tokens. LSH is then employed to cluster similar smart contracts based on the distances between the vectors.

We conduct a quantitative analysis and a qualitative analysis to characterize the clone practice of smart contracts. Firstly, our quantitative analysis reveals that over 96% of the smart contracts have similar contracts on Ethereum, which suggests that the smart contracts on Ethereum are highly homogeneous. Secondly, we analyze the reasons why smart contracts are similar. Our qualitative analysis uncovers some interesting reasons, such as implementing the same “interface”.

The rest of the paper is organized as follows. The background on blockchain and smart contracts is introduced in Sect. 2. The data collection is presented in Sect. 3. Section 4 describes the LSH methodology we use to cluster similar smart contracts. The experimental setup and results are discussed in Sect. 5. We discuss related work in Sect. 6. Section 7 summarizes our approach and outlines directions for future work.

2 Background

2.1 BlockChain and Smart Contract

Blockchain was first introduced by Satoshi Nakamoto in 2008 as the underlying data structure of Bitcoin [1]. As its name suggests, a blockchain is a chain of blocks, in which each block contains a number of transactions that are hashed in a Merkle tree [17]. By storing the hash value of the previous block, each block refers to its predecessor, forming a chain structure. Together with peer-to-peer communication, consensus between miners such as Proof of Work (PoW), asymmetric encryption and digital signatures, a blockchain system can provide a tamper-proof and immutable value-transfer network without relying on a trusted third party [17]. Hence, many people regard blockchain as another technological revolution after the Internet, due to its security, trustworthiness and reliability [18].

In order to make blockchain suitable for scenarios other than cryptocurrency, Ethereum, a blockchain platform, introduced smart contracts, which can be written in Turing-complete programming languages such as Solidity (a contract-oriented, high-level language whose syntax is similar to that of JavaScript; see Note 1). Smart contracts are self-executing contracts in which the terms of the agreement between multiple parties are directly written into lines of code [19]. The code and the agreements contained therein exist across a blockchain network. By supporting different types of smart contracts, Ethereum facilitates the construction and execution of complex applications on the blockchain, such as financial exchanges, games, social applications and insurance contracts.

Any user can create a smart contract by publishing a transaction to a blockchain. Once a smart contract’s program code has been deployed on the blockchain, it cannot be changed [20, 21]. Therefore, even when contract creators evolve the contract code and deploy new versions of a smart contract, the older versions remain visible in the blockchain. As a result, a smart contract is similar to its later versions, which is one source of code clones on Ethereum [22, 23].

2.2 Locality-Sensitive Hashing

The Locality-Sensitive Hashing (LSH) algorithm was proposed by Gionis et al. in 1999 [16]. The basic idea behind LSH is that if two instances are similar in the original data space, then they remain similar with high probability after the hashing conversion. Conversely, if they are not similar, they should not be similar after the hashing conversion. If a hash function h(.) satisfies these two conditions, it is called a locality-sensitive hashing function. Mathematically, h(.) should satisfy formulas (1) and (2):

$$\begin{aligned} \text{if } d(x,y) \le d_1, \text{ then } P(h(x)=h(y)) \ge p_1 \end{aligned} \tag{1}$$

$$\begin{aligned} \text{if } d(x,y) \ge d_2, \text{ then } P(h(x)=h(y)) \le p_2 \end{aligned} \tag{2}$$

where x and y are two instances in the data space, d(x, y) is the distance between x and y, h(x) is the hashing value of x, $P(\cdot)$ is the probability of an event, and $(d_1, d_2, p_1, p_2)$ is a set of thresholds with $d_1 < d_2$ and $p_1 > p_2$. If both formulas (1) and (2) are satisfied, the hash function h(.) is said to be locality-sensitive for the thresholds $(d_1, d_2, p_1, p_2)$.
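As a concrete illustration of conditions (1) and (2), consider the bit-sampling family for binary vectors under Hamming distance, in which h returns one coordinate chosen uniformly at random. This family is used here purely as an example and is not necessarily the hash used later in this paper; the function names and the toy vectors are hypothetical. For this family the collision probability is exactly the fraction of coordinates on which the two vectors agree.

```python
from fractions import Fraction


def hamming(x, y):
    """Number of coordinates in which two equal-length 0/1 vectors differ."""
    return sum(a != b for a, b in zip(x, y))


def collision_probability(x, y):
    """Exact P(h(x) = h(y)) when h(v) = v[i] for a uniformly random index i."""
    m = len(x)
    return Fraction(m - hamming(x, y), m)


def is_sensitive(pairs, d1, d2, p1, p2):
    """Check conditions (1) and (2) on a finite collection of vector pairs."""
    for x, y in pairs:
        d, p = hamming(x, y), collision_probability(x, y)
        if d <= d1 and p < p1:
            return False  # violates (1): a close pair collides too rarely
        if d >= d2 and p > p2:
            return False  # violates (2): a far pair collides too often
    return True


x = (1, 0, 1, 1, 0, 0, 1, 0)
near = (1, 0, 1, 1, 0, 0, 1, 1)  # differs from x in one coordinate
far = (0, 1, 0, 0, 1, 0, 1, 0)   # differs from x in five coordinates
pairs = [(x, near), (x, far)]
print(collision_probability(x, near), collision_probability(x, far))
print(is_sensitive(pairs, d1=1, d2=5, p1=Fraction(7, 8), p2=Fraction(3, 8)))
```

The close pair collides with a higher probability than the far pair, which is the property the thresholds $(d_1, d_2, p_1, p_2)$ are meant to capture.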

3 Data Collection

Smart contracts can be divided into open source and closed source categories. Open source contracts allow any user to download their source code from Ethereum, while closed source contracts only provide bytecode. To study why smart contracts are similar, we need the source code of the smart contracts for further analysis. Therefore, we only collect open source smart contracts for our dataset. We download the smart contracts from Etherscan (see Note 2), a blockchain browser for Ethereum that provides real-time transaction queries.

Table 1 shows the statistical characteristics of the collected dataset. We collected 146,402 Solidity files from Etherscan, containing a total of 703,565 smart contracts, which are stored in a local repository. On average, each Solidity file contains around 4.8 individual contracts (ranging from 0 to 36), 20 functions, and 202 lines of code. These smart contracts were deployed on the Ethereum mainnet from July 30th, 2015 to July 6th, 2019.

Table 1. Collected data

An Ethereum smart contract can be created either by a user or by another existing contract [6, 7]. We call these user-created contracts and contract-created contracts, respectively. Since we study the code clone practice in both types, we distinguish them according to the address of the contract creator. If the address of the contract creator points to another contract, the contract is a contract-created one; otherwise, it is a user-created contract. Table 2 shows the statistical characteristics of the user-created and contract-created contracts.
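The classification rule above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the input format (a dictionary from contract address to creator address), the function name and the toy addresses are all hypothetical, and the set of known contract addresses is taken to be the set of collected contracts.

```python
def classify_by_creator(creator_of, contract_addresses):
    """Label each contract by whether its creator address is itself a contract.

    creator_of: dict mapping contract address -> creator address (assumed input).
    contract_addresses: set of all addresses known to be contracts.
    """
    labels = {}
    for address, creator in creator_of.items():
        if creator in contract_addresses:
            labels[address] = "contract-created"
        else:
            labels[address] = "user-created"
    return labels


# Toy, made-up addresses for illustration only.
creator_of = {
    "0xC1": "0xUserA",  # deployed by an ordinary user account
    "0xC2": "0xC1",     # deployed by contract 0xC1
    "0xC3": "0xUserB",
}
labels = classify_by_creator(creator_of, contract_addresses=set(creator_of))
print(labels)
```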

Table 2. User-created and contract-created contracts

4 Clustering Similar Contracts

In this section, we employ the LSH method to cluster similar smart contracts. The direct way to measure the similarity of smart contracts is to compare their code syntactic similarity [24, 25, 26, 27] [2]. Therefore, we first extract the code syntax from the smart contracts. Then, each smart contract is transformed into a high-dimensional vector representation based on its syntactic tokens. Finally, LSH is employed to map the high-dimensional vectors to clusters in a low-dimensional space. The smart contracts in the same cluster are similar.

4.1 Code Syntactic Tokenizing

To obtain the code syntax of a smart contract, we identify the syntax of each code line contained in the smart contract. We employ the algorithm proposed in our previous study [2] to identify the main syntax tokens of smart contracts, such as MappingExpression, ModifierDeclaration, IfStatement, AssignmentExpression, ReturnStatement, payable, Money. Our algorithm parses the abstract syntax tree to obtain the syntactic tokens of each code line. It is worth noting that a single code line may contain multiple types of syntax tokens. For example, the code line “if(_to == address(this))” contains three types of syntax tokens: IfStatement, BinaryExpression, and CallExpression.

For all the user-created and contract-created contracts in our dataset, we extract the syntax tokens at the code line level. The syntax tokens contained in each code line form a token set, which we regard as a token unit. For example, the token unit of the code line “if(_to == address(this))” is $\langle IfStatement, BinaryExpression, CallExpression \rangle$. The token units contained in a contract are the features used to measure the similarity between contracts.

Similar to the bag of words model [28], we build a vector for each smart contract according to the token units it contains. Stacking the vectors of all contracts yields a feature matrix. We build two such matrices, one for the user-created and one for the contract-created smart contracts. As Figure 1 shows, there are z user-created smart contracts, and we identify the token units contained in each contract. Then, we use matrix M to represent the token units that each contract contains. If a contract contains a certain token unit, the corresponding entry of the matrix is 1, and otherwise it is 0. The matrix M is $z \times m$, where m is the number of distinct token units.
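The construction of M can be sketched as follows. This is an illustrative sketch rather than the implementation used in the study: it assumes the token units of each contract have already been extracted (the AST parsing step is not shown), and the function name, the input format and the column ordering are assumptions made for the example.

```python
def build_feature_matrix(contracts):
    """Build the 0/1 matrix M (z x m) from per-contract token units.

    contracts: list of contracts, each given as a list of token units, where a
    token unit is the set of syntax tokens found on one code line.
    Returns (M, units), where units[j] is the token unit of column j.
    """
    # Freeze each unit so that it is hashable and independent of token order.
    per_contract = [{frozenset(unit) for unit in lines} for lines in contracts]
    # Fix a deterministic column order over all distinct token units.
    units = sorted(set().union(*per_contract), key=lambda u: sorted(u))
    column = {unit: j for j, unit in enumerate(units)}
    M = [[0] * len(units) for _ in per_contract]
    for i, present in enumerate(per_contract):
        for unit in present:
            M[i][column[unit]] = 1
    return M, units


# Two toy contracts; the token names follow the examples in the text.
contracts = [
    [{"IfStatement", "BinaryExpression", "CallExpression"}, {"ReturnStatement"}],
    [{"ReturnStatement"}, {"AssignmentExpression"}],
]
M, units = build_feature_matrix(contracts)
for row in M:
    print(row)
```

Here z = 2 and m = 3, and the two contracts share only the token unit that consists of ReturnStatement.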

4.2 LSH Clustering

We use the LSH method to cluster similar contracts based on the feature matrix M. Specifically, we first randomly generate a zero-one matrix V with $m \times r$ dimensions. Then, we multiply the matrices M and V to obtain a third matrix H. Each element H(i, j) of H is the product of the feature vector of a smart contract $c_i$ and the j-th random zero-one vector. If H(i, j) is greater than a threshold t, the j-th locality-sensitive hashing value $h^j(c_i)$ of the smart contract is 1; otherwise, it is 0. Doing this for each of the r random vectors yields r locality-sensitive hashing values. Splicing these values together gives a hashing sequence of zeros and ones of length r for smart contract $c_i$, i.e., $H(c_i)=(h^1(c_i),...,h^r(c_i))$. Figure 2 shows the process of applying LSH to the feature matrix.
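The hashing step can be sketched in plain Python as follows. This is one reading of the description above, not the code used in the study: the function name, the seeded random generator and the toy matrix are assumptions, while the defaults t = 3 and r = 13 are the values reported in Sect. 5.

```python
import random


def lsh_signatures(M, r=13, t=3, seed=0):
    """Hash each row of the 0/1 matrix M to an r-bit hashing sequence.

    V is a random 0/1 matrix of shape m x r and H = M * V; bit j of
    contract i is 1 when H[i][j] > t and 0 otherwise.
    """
    rng = random.Random(seed)
    m = len(M[0]) if M else 0
    V = [[rng.randint(0, 1) for _ in range(r)] for _ in range(m)]
    signatures = []
    for row in M:
        bits = []
        for j in range(r):
            h_ij = sum(row[k] * V[k][j] for k in range(m))
            bits.append(1 if h_ij > t else 0)
        signatures.append(tuple(bits))
    return signatures


# Toy feature matrix: the second contract is an exact clone of the first.
M = [
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
for signature in lsh_signatures(M):
    print(signature)
```

Contracts with identical feature vectors always receive the same hashing sequence, and a contract with no token units hashes to all zeros for any non-negative t.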

According to the locality-sensitive hashing value $H(c_i)$ of smart contract $c_i$, we map the smart contract to one of the buckets $[b_1, \ldots, b_k]$ [16], where k is the number of buckets. As a result, similar contracts are mapped to the same bucket, and the contracts in the same bucket are likely to involve code clones. We regard the smart contracts in a bucket as a cluster. Figure 3 shows the process of mapping smart contracts to the different buckets.
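The bucket mapping amounts to grouping contracts by their hashing sequence. The sketch below assumes that contracts with an identical sequence share a bucket and that a bucket holding a single contract corresponds to a unique contract; both points are assumptions about details the text leaves open, and the function names are hypothetical.

```python
from collections import defaultdict


def bucket_contracts(signatures):
    """Group contract indices by their hashing sequence H(c_i)."""
    buckets = defaultdict(list)
    for index, signature in enumerate(signatures):
        buckets[tuple(signature)].append(index)
    return dict(buckets)


def split_clusters(buckets):
    """Separate buckets with several contracts from singleton buckets."""
    clusters = [members for members in buckets.values() if len(members) > 1]
    unique = [members[0] for members in buckets.values() if len(members) == 1]
    return clusters, unique


# Made-up 3-bit hashing sequences for five contracts.
signatures = [(1, 0, 1), (1, 0, 1), (0, 0, 0), (1, 0, 1), (0, 1, 1)]
clusters, unique = split_clusters(bucket_contracts(signatures))
print(clusters, unique)
```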

5 Results Analysis

When applying LSH to cluster similar smart contracts, we set the parameters to t = 3 and r = 13. We cluster the user-created and the contract-created datasets separately.

5.1 Quantitative Analysis

Table 3 shows that LSH generates 1,230 clusters for user-created smart contracts. There are 288 unique contracts, i.e., contracts that do not belong to any of the clusters. The proportion of unique contracts is therefore very small (288/684,029, roughly 0.04%), so well over 96% of user-created smart contracts have at least one similar contract in the dataset. For the contract-created contracts, LSH creates 285 clusters, and 93 contract-created smart contracts do not belong to any of them, which means that 99.5% (i.e., 19,443/19,536) of contract-created smart contracts have at least one similar contract in the dataset. Therefore, we can conclude that code cloning is a common practice in both user-created and contract-created smart contracts, and the result also reveals the homogeneous nature of smart contracts on Ethereum.
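The proportions can be recomputed directly from the counts quoted in this paragraph. The short sketch below uses only those figures; the function name is hypothetical. Note that 288/684,029 evaluates to about 0.04%, so the share of user-created contracts with at least one similar contract comes out at about 99.96%, comfortably above the 96% mark, and the share for contract-created contracts is about 99.52%.

```python
def similar_share(total, unique):
    """Percentage of contracts that share a cluster with at least one other."""
    return 100.0 * (total - unique) / total


user_total, user_unique = 684_029, 288
created_total, created_unique = 19_536, 93

print(f"user-created: {similar_share(user_total, user_unique):.2f}%")
print(f"contract-created: {similar_share(created_total, created_unique):.2f}%")
```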

Table 3. Clusters for user-created and contract-created contracts

Figure 4 shows the top 100 clusters for user-created contracts. The biggest cluster contains 229,224 contracts. In general, the sizes of the user-created clusters follow a long-tail distribution over the 1,230 clusters. Across all user-created contracts, the top 20 clusters account for 87% of the contracts, so the distribution of clusters follows the Pareto principle. Many smart contracts are therefore concentrated in the same few clusters, and these contracts have similar code.

Figure 5 shows the top 100 clusters for contract-created contracts. The biggest cluster contains 5,994 contracts. The distribution of clusters also follows the Pareto principle, i.e., the top 20 clusters account for 90% of the smart contracts.
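The concentration figures quoted for the top 20 clusters correspond to a simple top-k share. The sketch below shows the computation on made-up cluster sizes, since the per-cluster sizes are not listed in the text; the function name is hypothetical, and the denominator is assumed to be the total number of clustered contracts.

```python
def top_k_share(cluster_sizes, k):
    """Fraction of clustered contracts that fall into the k largest clusters."""
    ordered = sorted(cluster_sizes, reverse=True)
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0


# Made-up long-tail sizes for illustration; these are not the paper's data.
sizes = [500, 300, 100] + [10] * 10
print(top_k_share(sizes, k=3))
```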

5.2 Qualitative Analysis

Since all the collected contracts are open source, we manually check these clusters and identify them according to the source code of the smart contracts. The largest clusters mainly fall into the following categories:

ERC Related Clusters. ERC related contracts make up the majority of the popular clusters. The ERC standards (see Note 3) include ERC-20, ERC-721, ERC-825 and ERC-223. For example, to issue a currency, a smart contract should implement the ERC20 “interface”. To do so, it needs to implement six functions, i.e., totalSupply(), balanceOf(), transfer(), transferFrom(), approve() and allowance(). As a result, all the smart contracts that implement the ERC20 interface have similar source code. Well-known tokens implementing the ERC20 interface include Huobi Token (Note 4), FTX Token (Note 5) and USD Coin (Note 6).
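To make concrete why ERC20 contracts resemble one another, the sketch below models the six functions in Python. It is a toy in-memory model for illustration only, not Solidity code and not any deployed token: the class name is hypothetical, the caller is passed explicitly as the first argument instead of being taken from the transaction, and events and overflow handling are omitted.

```python
class SimpleToken:
    """Toy in-memory model of the six ERC20 functions (not Solidity code)."""

    def __init__(self, initial_owner, supply):
        self._total = supply
        self._balances = {initial_owner: supply}
        self._allowances = {}  # (owner, spender) -> remaining allowance

    def totalSupply(self):
        return self._total

    def balanceOf(self, owner):
        return self._balances.get(owner, 0)

    def transfer(self, sender, to, value):
        # Refuse negative amounts and transfers exceeding the sender's balance.
        if value < 0 or self.balanceOf(sender) < value:
            return False
        self._balances[sender] = self.balanceOf(sender) - value
        self._balances[to] = self.balanceOf(to) + value
        return True

    def approve(self, owner, spender, value):
        self._allowances[(owner, spender)] = value
        return True

    def allowance(self, owner, spender):
        return self._allowances.get((owner, spender), 0)

    def transferFrom(self, spender, owner, to, value):
        # The spender may move at most the amount the owner approved.
        if self.allowance(owner, spender) < value:
            return False
        if not self.transfer(owner, to, value):
            return False
        self._allowances[(owner, spender)] = self.allowance(owner, spender) - value
        return True


token = SimpleToken("alice", 100)
token.transfer("alice", "bob", 30)
token.approve("alice", "carol", 50)
token.transferFrom("carol", "alice", "bob", 20)
print(token.balanceOf("alice"), token.balanceOf("bob"), token.allowance("alice", "carol"))
```

Because every conforming token must expose the same six entry points with the same bookkeeping, independent implementations tend to produce very similar token units.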

Gambling Related Clusters. Many clusters are related to gambling contracts. There are many gambling contracts on Ethereum, and they often implement very simple and similar logic. Developers can therefore directly copy and paste the original open-source contracts to create similar gambling contracts. As a result, the gambling contracts are clustered together.

Other Clusters. We also observe other types of clusters, such as game-related and social-related clusters. These clusters have a strong industry orientation: contracts belonging to the same industry are more likely to cluster together. These results suggest that the smart contracts on Ethereum are highly homogeneous.

6 Related Work

Clone detection for smart contracts can be divided into static [6, 7, 13, 29] and dynamic approaches [8, 30]. He et al. [7] revealed that a large number of smart contracts on Ethereum are similar, which suggests that smart contracts are highly homogeneous. Our study differs from theirs in the clustering approach. He et al. link any contract pair whose similarity score is greater than 0.7, build a weighted undirected graph by treating each contract as a node, and then traverse the graph and consider each connected component as a cluster. Kiffer et al. [6] found that the smart contracts on Ethereum exhibit extensive code reuse. They first compute the frequency of the 5-grams in the opcode sequence of a contract, so that each contract corresponds to a vector of 5-gram frequencies. The similarity of two contracts is then computed as the cosine similarity of the two vectors. Gao et al. [13, 29] used a code embedding technique to encode the code elements in a smart contract, converting each code element into a numerical vector that preserves the syntactic and semantic information of the code. The embedding of a code fragment is then obtained by summing the embedding vectors of the tokens within it. Finally, the similarity between two fragments is computed as the Euclidean distance between their vectors.
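The 5-gram and cosine-similarity scheme described above can be sketched as follows. This is a minimal sketch of the idea as summarized in this section, not the implementation of Kiffer et al.; the function names and the toy opcode sequences are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(opcodes, n=5):
    """Frequency of each length-n window in an opcode sequence."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two made-up opcode sequences that differ only in the last opcode.
first = ["PUSH1", "PUSH1", "MSTORE", "CALLVALUE", "DUP1", "ISZERO", "PUSH2", "JUMPI"]
second = first[:-1] + ["JUMP"]
print(cosine_similarity(ngram_counts(first), ngram_counts(second)))
```

Each sequence has four distinct 5-grams, of which three are shared, so a single changed opcode lowers the similarity without driving it to zero.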

In addition, Liu et al. employed a dynamic approach to detect code clones in smart contracts [8, 30]. They proposed ECLONE to detect semantic clones of smart contracts. ECLONE extracts a set of critical semantic properties generated from the symbolic transactions of a smart contract, and then normalizes these semantic properties into a numeric vector. Finally, the clone detection problem is modeled as a similarity computation over the numeric vectors. In summary, our approach differs from the existing studies. We extract the code syntactic tokens from each smart contract, employ the LSH method to cluster similar smart contracts, and further analyze the code clones in smart contracts.

7 Conclusion and Future Work

Code cloning is an essential part of modern software development. Although the study of code clones has a long research history, we are the first to employ the LSH technique to analyze the similarity of user-created and contract-created contracts separately. To evaluate our approach, we collect a dataset that contains more than 700,000 smart contracts from Ethereum. The quantitative analysis shows that over 96% of the smart contracts are similar to at least one other contract. The qualitative analysis reveals that the majority of popular clusters consist of ERC related contracts. Our future research agenda mainly focuses on extending the scale of the dataset. Firstly, we will take more open source smart contracts into consideration. Secondly, we will try to identify code clones in closed source smart contracts.

Notes

1. http://solidity.readthedocs.io/en/develop
2. https://etherscan.io
3. A standard interface for tokens. https://eips.ethereum.org/EIPS/eip-20
4. https://coinmarketcap.com/currencies/huobi-token/
5. https://coinmarketcap.com/currencies/ftx-token/
6. https://coinmarketcap.com/currencies/usd-coin/

References

1. Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system. Cryptography Mailing list, March 2009. https://metzdowd.com
2. Huang, Y., Kong, Q., Jia, N., Chen, X., Zheng, Z.: Recommending differentiated code to support smart contract update. In: 2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC), pp. 260–270, May 2019
3. Dinh, T.T.A., Liu, R., Zhang, M., Chen, G., Ooi, B.C., Wang, J.: Untangling blockchain: a data processing view of blockchain systems. IEEE Trans. Knowl. Data Eng. 30(7), 1366–1385 (2018)
4. Zheng, P., Zheng, Z., Luo, X., Chen, X., Liu, X.: A detailed and real-time performance monitoring framework for blockchain systems. In: International Conference on Software Engineering Software Engineering in Practice - ICSE-SEIP 2018, pp. 134–143, May 2018
5. Norta, A.: Creation of smart-contracting collaborations for decentralized autonomous organizations. In: Matulevičius, R., Dumas, M. (eds.) BIR 2015. LNBIP, vol. 229, pp. 3–17. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21915-8_1
6. Kiffer, L., Levin, D., Mislove, A.: Analyzing ethereum’s contract topology. In: Proceedings of the Internet Measurement Conference 2018, pp. 494–499 (2018)
7. He, N., Wu, L., Wang, H., Guo, Y., Jiang, X.: Characterizing code clones in the ethereum smart contract ecosystem. arXiv preprint arXiv:1905.00272 (2019)
8. Liu, H., Yang, Z., Jiang, Y., Zhao, W., Sun, J.: Enabling clone detection for ethereum via smart contract birthmarks. In: 2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC), pp. 105–115. IEEE (2019)
9. Christidis, K., Devetsikiotis, M.: Blockchains and smart contracts for the internet of things. IEEE Access 4, 2292–2303 (2016)
10. Juels, A., Kosba, A., Shi, E.: The ring of gyges: investigating the future of criminal smart contracts. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS 2016. ACM, New York, pp. 283–295 (2016). http://doi.acm.org/10.1145/2976749.2978362
11. Szabo, N.: The idea of smart contracts (1997). http://www.fon.hum.uva.nl/rob/Courses/InformationInSpeech/CDROM/Literature/LOTwinterschool2006/szabo.best.vwh.net/idea.html. Accessed 2008
12. Chen, T., et al.: Understanding ethereum via graph analysis. In: IEEE INFOCOM 2018 - IEEE Conference on Computer Communications, pp. 1484–1492, April 2018
13. Gao, Z., Jiang, L., Xia, X., Lo, D., Grundy, J.: Checking smart contracts with structural code embedding. IEEE Trans. Softw. Eng. (2020)
14. Porru, S., Pinna, A., Marchesi, M., Tonelli, R.: Blockchain-oriented software engineering: challenges and new directions. In: 2017 IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C), pp. 169–171, May 2017
15. Somin, S., Gordon, G., Altshuler, Y.: Network analysis of ERC20 tokens trading on ethereum blockchain. In: Morales, A.J., Gershenson, C., Braha, D., Minai, A.A., Bar-Yam, Y. (eds.) ICCS 2018. SPC, pp. 439–450. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96661-8_45
16. Gionis, A., Indyk, P., Motwani, R., et al.: Similarity search in high dimensions via hashing. VLDB 99(6), 518–529 (1999)
17. Swan, M.: Blockchain: Blueprint for a New Economy, 1st edn. O’Reilly Media Inc., Newton (2015)
18. Wang, B., Chen, S., Yao, L., Liu, B., Xu, X., Zhu, L.: A simulation approach for studying behavior and quality of blockchain networks. In: Chen, S., Wang, H., Zhang, L.-J. (eds.) ICBC 2018. LNCS, vol. 10974, pp. 18–31. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94478-4_2
19. Parizi, R.M., Amritraj, Dehghantanha, A.: Smart contract programming languages on blockchains: an empirical evaluation of usability and security. In: Chen, S., Wang, H., Zhang, L.J. (eds.) ICBC 2018. LNCS, vol. 10974, pp. 75–91. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94478-4_6
20. Huang, Y., Chen, X., Zou, Q., Luo, X.: A probabilistic neural network-based approach for related software changes detection. In: 2014 21st Asia-Pacific Software Engineering Conference, vol. 1, pp. 279–286, December 2014
21. Kosba, A., Miller, A., Shi, E., Wen, Z., Papamanthou, C.: Hawk: the blockchain model of cryptography and privacy-preserving smart contracts. In: 2016 IEEE Symposium on Security and Privacy (SP), pp. 839–858, May 2016
22. Bartoletti, M., Carta, S., Cimoli, T., Saia, R.: Dissecting Ponzi schemes on Ethereum: identification, analysis, and impact. ArXiv e-prints, March 2017
23. Huang, Y., Jia, N., Shu, J., Hu, X., Chen, X., Zhou, Q.: Does your code need comment? Softw.: Pract. Exp. 50(3), 227–245 (2020). https://onlinelibrary.wiley.com/doi/abs/10.1002/spe.2772
24. Huang, Y., Zheng, Q., Chen, X., Xiong, Y., Liu, Z., Luo, X.: Mining version control system for automatically generating commit comment. In: 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pp. 414–423, November 2017
25. Huang, Y., Chen, X., Liu, Z., Luo, X., Zheng, Z.: Using discriminative feature in software entities for relevance identification of code changes. J. Softw.: Evol. Process 29(7), e1859 (2017). https://onlinelibrary.wiley.com/doi/abs/10.1002/smr.1859
26. Huang, Y., Jia, N., Chen, X., Hong, K., Zheng, Z.: Salient-class location: help developers understand code change in code review. In: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2018, pp. 770–774. ACM, New York (2018). http://doi.acm.org/10.1145/3236024.3264841
27. Huang, Y., Hu, X., Jia, N., Chen, X., Xiong, Y., Zheng, Z.: Learning code context information to predict comment locations. IEEE Trans. Reliab. 1–18 (2019)
28. Jiang, H., Xiao, Y., Wang, W.: Explaining a bag of words with hierarchical conceptual labels. World Wide Web (2020). http://dx.doi.org/10.1007/s11280-019-00752-3
29. Gao, Z., Jayasundara, V., Jiang, L., Xia, X., Lo, D., Grundy, J.: SmartEmbed: a tool for clone and bug detection in smart contracts through structural code embedding. In: 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 394–397. IEEE (2019)
30. Liu, H., Yang, Z., Liu, C., Jiang, Y., Zhao, W., Sun, J.: EClone: detect semantic clones in ethereum via symbolic transaction sketch. In: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 900–903 (2018)

Acknowledgments

This research is supported by the National Natural Science Foundation of China (61902105) and the Characteristic Innovation Project of Guangdong Province Office of Education (2019GKTSCX129).

Author information

Authors and Affiliations

School of Management Science and Engineering, Hebei GEO University, Shijiazhuang, 050031, China
Nan Jia
Department of Basic Courses, Zhaoqing Medical College, Zhaoqing, 526020, China
Haiping Huang
School of Data and Computer Science, Sun Yat-sen University, Guangzhou, 510006, China
Queping Kong

Corresponding author

Correspondence to Haiping Huang.

Editor information

Editors and Affiliations

Sun Yat-sen University, Guangzhou, China
Zibin Zheng
Macau University of Science and Technology, Macau, China
Hong-Ning Dai
Kunming University of Science and Technology, Kunming, China
Xiaodong Fu
Dali University, Dali, China
Benhui Chen

Cite this paper

Jia, N., Kong, Q., Huang, H. (2020). How Similar Are Smart Contracts on the Ethereum?. In: Zheng, Z., Dai, HN., Fu, X., Chen, B. (eds) Blockchain and Trustworthy Systems. BlockSys 2020. Communications in Computer and Information Science, vol 1267. Springer, Singapore. https://doi.org/10.1007/978-981-15-9213-3_32

DOI: https://doi.org/10.1007/978-981-15-9213-3_32
Published: 12 November 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-9212-6
Online ISBN: 978-981-15-9213-3
eBook Packages: Computer Science, Computer Science (R0)
