Short Paper: Storage Reduction of Private Blockchain with Sharding and Community Based Clustering

Published: 03 January 2025 (NSysS '24)

Abstract

Blockchain has gained popularity in the software industry for its transparency, security, and privacy guarantees. Its scalability issues in terms of latency and throughput are being addressed with better consensus and communication protocols. However, the inherent problems of high storage requirements for append-only blocks and high computing needs remain barriers to large-scale adoption of blockchain in networks with low-end devices. In this work, we propose a sharding configuration for the popular Hyperledger Fabric (HLF) blockchain and show the storage reduction achieved by utilizing network community detection. First, our implementation derives channel-based shard characteristics in terms of different parameters. We compare the practical storage rate computed by our implementation with the state of the art. Finally, we show the storage reduction on three datasets applying our novel approach of community-based clustering for blockchain sharding. Overall, our work paves the way for running HLF at a small scale, reducing running costs and adding compatibility in various use cases.

1 Introduction

A blockchain is essentially a linked list of blocks. New blocks are linked to previous ones using a hashing mechanism based on the information in the block. Blocks are built from transactions. Each transaction (TX) is considered immutable because the blocks and the chain of blocks are made immutable by the distributed consensus mechanism. With growing participation in these networks, the required storage is rising at a good pace. The Bitcoin network [18] has grown to 600GB+ and Ethereum [21] to 1.1TB+ as of September 2024. New nodes wanting to join these networks must provision far more capacity with future growth in mind.
Two broad types of techniques are applied in the current literature to address the storage problem. One involves forming clusters of nodes and processing transactions within a single cluster, thus minimizing inter-cluster communication [5], [9]. Some of these techniques create data clusters along with node clusters; such a technique is applied in [1, 13]. The other type of solution approaches the problem by dividing the data in the blockchain, such as [23].
Distributed data implementations in database systems have inspired similar approaches in blockchain. Numerous approaches such as weighted models [6, 14], off-chain storage [7], on-chain techniques [4], and blockchain sharding [24] have been proposed in recent research to improve blockchain performance.
In [23], the authors dynamically divide the blockchain structure into segments. They ensure that each segment is stored in at least one reliable keeper by classifying the nodes using the jury hypothesis. However, their approach does not fit permissioned blockchains because of its PoW overhead. The complexity of achieving a fully working solution is further illustrated by Ethereum's [21] progress on implementing shard chains, called danksharding (still several years away according to them).
It is evident that the high storage problem should be solved not only for public blockchains but also for private ones. Even at moderate TPS, an IBM blockchain based on HLF may produce terabytes of data yearly [12]. This makes the topic all the more relevant for HLF.
As groundwork to overcome the storage problem in HLF, we propose a novel idea: keep only a number of overlapping copies (s) of a ledger within a cluster, and form the cluster based on the network communities formed by the participating nodes. Reduced storage usage is then obtained from this two-level minimization of ledger copies: first inside a cluster, using overlapping ledger copies, and then in the overall network, by applying community-to-cluster mapping.
We have devised a workflow that generates a new overlapping ledger copy after a defined number of blocks (the shard limit, SL) has been added. To do this, we have customized the peer source code of HLF and used the HLF Java SDK together with shell scripts. We then run a load against this custom HLF network, with overlapping copies of the ledger (channel) among the peers, to take storage measurements. The practical size of shards is computed for combinations of shard limit, transaction size, and transactions per block. We then compare the storage rate (% of storage taken compared to all nodes keeping identical ledger copies) computed in practice against related works. We further formulate the equation for community-to-cluster mapping and show results for three network datasets. In the first step, the network is partitioned into clusters based on the detected communities. We applied InfoMap [8] community detection on three datasets to show the number of clusters that can be formed from the size of the largest community and the number of detected communities. The benefit of community-to-cluster mapping is fewer cross-shard transactions. In the second step, inside each cluster, we store overlapping copies of the ledger to reduce storage further; doing this inside a cluster ensures a reduced number of messages in a carefully chosen network. We analyze the characteristics of overlapping shard sizes for different combinations of shard limit, transaction size, and transactions per block. These provide insight into the feasibility of operating our approach on resource-constrained devices, e.g., mobile and IoT devices. In the rest of the paper, we discuss related work on HLF and sharding, describe the essential background of HLF and our methodology, present our implementation and experimental setup, and finally present the results.

2 Related Work

HLF [2] is by far the most adopted private blockchain [20]. [20] compares permissioned blockchains to explain the trade-offs in community activity, performance, scalability, privacy, and industry adoption; HLF shows general superiority on most of these aspects. Recent works related to HLF focus either on applications of blockchain, such as [10], or on performance and scalability [1, 5, 22].
[5] proposes three improvements on HLF: AHL (Attested Hyperledger), AHL+, and AHLR (AHL Relay). The authors present a shard formation protocol on top of AHL+ achieving 3k+ TPS. Small trusted logs are maintained inside a trusted execution environment (TEE) to avoid tampering in AHL. Unlike HLF, AHL keeps different logs for different consensus messages. AHL+ uses two separate channels (one in HLF) for consensus messages and request messages to delay overflow and reduce message drops. AHLR only optimizes communication when there is no view change. Storage scaling is not covered.
SharPer [1] describes how a DAG of blocks can form shards over clusters. Here, cross-shard transactions are duplicated among different shards, while all other transactions belong to only one specific shard. However, a storage analysis (understandably, a reduction exists) is not provided by the authors, as their work focuses only on throughput and latency.
In [17], a simulation analysis shows sharding and distribution of ledger copies in a cluster with a fixed number of nodes. The block data are stored with overlap on some nodes in the cluster, so each block is guaranteed to have some copies in a cluster. Data integrity is maintained as long as the number of node failures in a single cluster stays within a threshold. In SharPer and Ring-Overlap, the DAG or overlapped ledger can itself be large and ever growing, which makes it infeasible to operate on mobile or IoT nodes in the long run. With this in mind, we focus our approach on the storage analysis for individual nodes and the overall network with limited-size shards.

3 Background and Methodology

HLF offers an organizational structure and collaboration framework among participants in its modular design. The design allows interaction among an organization's peers and client applications using channels to address real-world scenarios. The details of every component are available in the Hyperledger documentation, and the source code is available on GitHub [2].
As Figure 1 shows, the basic motivation for dividing the ledger into shards is to enable dynamic membership of several peers in a shard, and thus in the whole chain. In HLF, the permission boundary is a channel, which represents a private ledger inside the network with a defined and agreed membership of participants. We use the fact that the joined members have already agreed on the specifics of the channel, so spawning another channel with the same criteria is straightforward. We create a new channel (shard) when the number of blocks crosses a preset value, the shard limit (SL). To show the change in channel membership, we choose a sample network similar to the test network provided by HLF; this network is representative of other scenarios. Figure 1(a) shows our sample network with three peers from one organization, Org1, joining the channel. If the ledger size is S, the total space occupied would be 3S. Figure 1(c) shows our change to the network: we divide the ledger into three channels and join 2 peers to each channel, so that each channel has a copy on two of the joined peers (i.e., overlapping shards, s = 2). Alternatively, the fault tolerance (FT) could be reduced as in Fig. 1(b), at the cost of denying peers the option to join specific shards as needed, unlike in Fig. 1(c).
Figure 1: Sharded vs. Non-sharded
We now move to the storage rate (Rv) calculation, where the original ledger of size S is divided into ns parts, leaving a shard size Sc = S/ns in the ideal case. In an actual implementation, each shard carries some overhead, making Sc > S/ns. The overlapping shard number s implies that we keep s copies in a cluster of m nodes. The general formula for Rv is therefore:
\begin{equation} R_v = \frac{n_s \times s \times S_c}{m \times S} \times 100\% \end{equation}
(1)
For the ideal case,
\begin{equation} R_i = \frac{s}{m} \times 100\% \end{equation}
(2)
Equation 1 is applicable to inner-cluster sharding of HLF. Note that increasing s increases the reliability of the system.
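A minimal Java sketch of Equations 1 and 2, applied to the Fig. 1(c) configuration (ns = 3, s = 2, m = 3), is given below for illustration only; it is not part of our implementation.

// Sketch: storage rate inside a cluster, Eq. (1) and Eq. (2).
public class StorageRate {
    // Rv = (ns * s * Sc) / (m * S) * 100
    static double actualRate(int ns, int s, double shardSizeMb, int m, double ledgerSizeMb) {
        return (ns * s * shardSizeMb) / (m * ledgerSizeMb) * 100.0;
    }
    // Ri = s / m * 100, the ideal case where Sc = S / ns exactly
    static double idealRate(int s, int m) {
        return (double) s / m * 100.0;
    }
    public static void main(String[] args) {
        double S = 1000.0;                                   // arbitrary ledger size in MB; Rv depends only on Sc/S
        System.out.println(idealRate(2, 3));                 // ~66.67% (Eq. 2, s = 2, m = 3)
        System.out.println(actualRate(3, 2, S / 3, 3, S));   // ~66.67% when Sc carries no overhead
    }
}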
It is common to set a specific cluster size and to form clusters based on geographic attributes. We add novelty to this cluster formation by utilizing network community detection. Communities have more intra-community links than inter-community links, yielding predictable cross-cluster transactions and better ledger partitioning; this is described in Section 3.3. As in multi-channel Fabric [2, 3], channels serve as shards with efficient processing of intra-shard transactions. Cross-shard transactions, on the other hand, require either a trusted channel among the participants or a 2PC-like atomic commit protocol [3]. In 2PC, an additional cost of 4n messages and 2n + 2 forced log writes is incurred for n participants. This overhead can be minimized using our idea of community-to-cluster mapping in suitable networks, as discussed in Section 3.3.1.

3.1 Flowchart and Key Functions

Figure 2: Flowchart of Sharding
Figure 3: Shard creation flow
In Fig. 2, we outline the basic steps for sharding HLF, keeping the state of the blockchain configuration in mind. The major tasks in our experiment are described below.
Shard Creation: Shard creation is performed when the block limit is crossed for a shard. In this case, the participating peers need to rejoin a new shard with the same configuration, as shown in Fig. 3(a). Initial shards are created at network startup. Fig. 3(b) gives a generic view of shard creation over time. We used the HLF Java SDK with a REST API endpoint in our implementation to achieve the shard generation functionality. Since we limit the scope to storage analysis at this point, we skipped a rigorous implementation including consensus on shard creation and a 2PC protocol for multiple channels. A minimal sketch of the trigger logic is given below.
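In this sketch, createChannel, joinPeers, and deployChaincode are hypothetical placeholders for the channel creation, join, and deployment calls we issue through the HLF Java SDK and shell scripts; only the control flow mirrors Fig. 3.

// Sketch: create a new shard channel once the current shard reaches the shard limit (SL).
public final class ShardManager {
    private final int shardLimit;           // SL: maximum blocks per shard channel
    private int currentShardIndex = 0;      // suffix of the active shard channel
    private long blocksInCurrentShard = 0;  // blocks observed on the active channel

    public ShardManager(int shardLimit) { this.shardLimit = shardLimit; }

    // Called whenever a new block is committed on the active shard channel.
    public synchronized String onBlockCommitted() throws Exception {
        blocksInCurrentShard++;
        if (blocksInCurrentShard >= shardLimit) {
            currentShardIndex++;
            String newShard = "shard" + currentShardIndex;  // e.g. shard1, shard2, ...
            createChannel(newShard);        // same channel configuration as the previous shard
            joinPeers(newShard);            // the same s peers of the cluster rejoin
            deployChaincode(newShard);      // see "Chaincode Deployment in New Shard Channel"
            blocksInCurrentShard = 0;
        }
        return "shard" + currentShardIndex; // channel new transactions should target
    }

    private void createChannel(String name) throws Exception { /* hypothetical: HLF Java SDK / peer CLI */ }
    private void joinPeers(String name) throws Exception { /* hypothetical: channel join for each peer */ }
    private void deployChaincode(String name) throws Exception { /* hypothetical: scripted deployment */ }
}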
Chaincode Deployment in New Shard Channel: The chaincode must be programmatically deployed to new shard channels for transaction processing. Adding a chaincode to a channel is a multi-step process in which the participating peers interact to make it ready to execute. Starting from HLF v2.0, there are two options to deploy chaincode for a channel: (1) embedded chaincode and (2) chaincode as an external service. For this experiment we used the embedded approach for simplicity, implemented using several shell scripts and the HLF SDK.
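A hedged sketch of driving such a scripted deployment from Java follows; deployChaincode.sh is a hypothetical wrapper around the standard HLF 2.x chaincode lifecycle steps (package, install, approveformyorg, commit), not a script shipped with HLF.

import java.io.File;

// Sketch: invoke a (hypothetical) shell script that runs the chaincode lifecycle
// against the newly created shard channel.
public class ChaincodeDeployer {
    public static void deploy(String channelName) throws Exception {
        Process p = new ProcessBuilder("bash", "deployChaincode.sh", channelName)
                .directory(new File("."))   // run from the network's working directory
                .inheritIO()                // stream the script output to our console
                .start();
        if (p.waitFor() != 0) {
            throw new IllegalStateException("chaincode deployment failed for " + channelName);
        }
    }
}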
Configuration Change: The configuration change parameters are listed but not yet implemented. The realistic scenario is to experience cluster and network configuration changes (such as the number of peers, organizations, chaincode, FT, etc.) along with changes to shard size and transaction size. Constraints on these parameters can be set to maintain a balanced trade-off between system reliability and FT while achieving the desired storage reduction and compatibility for target devices.

3.2 Algorithm for Transaction, Shard Generation

The full workflow for shard generation and configuration change is described in Section 3.1. We limit the scope of our experimentation to the transaction workflow of Algorithm 1 only; a sketch is given below.
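Algorithm 1 itself is not reproduced here; the sketch below captures its intent under the assumption of a hypothetical submitWriteAsset wrapper around the Java SDK invocation, together with the ShardManager placeholder sketched in Section 3.1.

import java.util.Random;
import java.util.UUID;

// Sketch of the experiment driver (Algorithm 1 style): submit writes and roll over shards.
public class LoadDriver {
    public static void run(ShardManager shards, int totalTx, int txSizeBytes, int txPerBlock)
            throws Exception {
        byte[] payload = new byte[txSizeBytes];        // random binary payload, as in Section 4.1
        new Random().nextBytes(payload);
        String channel = "shard0";
        for (int i = 0; i < totalTx; i++) {
            String key = UUID.randomUUID().toString(); // random alphanumeric key
            submitWriteAsset(channel, key, payload);   // hypothetical SDK wrapper for WriteAsset
            if ((i + 1) % txPerBlock == 0) {           // a block is cut every txPerBlock transactions
                channel = shards.onBlockCommitted();   // may create and switch to a new shard channel
            }
        }
    }

    private static void submitWriteAsset(String channel, String key, byte[] value) {
        // hypothetical: invoke the WriteAsset chaincode on the given shard channel via the Java SDK
    }
}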

3.3 Community Based Clustering

To the best of our knowledge, we are the first to present the idea of community-to-network-cluster mapping for blockchain. We discuss storage reduction measures for three datasets: the email-Eu-core temporal network [19], AS-733 [16], and the Bitcoin (BTC) OTC network [15]. The resulting community graphs obtained with the InfoMap [8] algorithm are presented in Fig. 4; each node is a community and edges represent inter-community links. As depicted, the applicability of a community-to-network-cluster mapping is promising.
Figure 4: Community Visualization using InfoMap

3.3.1 Integrating HLF Sharding with Community Based Clustering.

In this section, we point out the feasibility and approach of integrating the community-based cluster (Fig. 4) with the channel-based sharding of HLF (Fig. 1(c) and Fig. 3). We periodically create a new channel of the desired size. The overhead of a 2PC protocol for multi-channel sharding [3] is reduced for a cluster with fewer nodes: the new overhead is 4Cmax messages and 2Cmax + 2 forced log writes instead of 4n and 2n + 2, respectively, where Cmax is the node count of the largest community. To avoid the overhead of cross-cluster transactions, this technique should be applied to networks known from domain knowledge to have uniform communities and little inter-community interaction.
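To illustrate the saving, the following sketch compares the 2PC cost with and without the community restriction, using the AS-733 node count and largest community later reported in Table 2.

// Sketch: 2PC cost for a cross-shard transaction (Section 3.3.1).
// Without community clustering the cost scales with all n participants;
// with it, only the largest community (Cmax nodes) is involved.
public class TwoPcCost {
    static long messages(long participants) { return 4 * participants; }
    static long forcedLogWrites(long participants) { return 2 * participants + 2; }

    public static void main(String[] args) {
        long n = 6474, cMax = 3649;   // AS-733: total nodes and largest community (Table 2)
        System.out.printf("all nodes:         %d messages, %d forced log writes%n",
                messages(n), forcedLogWrites(n));
        System.out.printf("largest community: %d messages, %d forced log writes%n",
                messages(cMax), forcedLogWrites(cMax));
    }
}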

3.3.2 Measuring Storage.

Following our novel idea, we calculate the storage reduction in the overall network. The largest community in a graph takes the largest share of ledger copies compared to the smaller communities. Hence, we simplify our calculation by normalizing the number of communities to nc = N/Cmax. The storage rate for the network can then be measured using the equation below:
\begin{equation} R_{vn} = \frac{1}{n_c} \times 100 \% \end{equation}
(3)
Strictly, a fraction accounting for the maximum number of inter-community links (L_inter-max) should be added to Equation 3: at most L_inter-max shard channels will have copies on nodes from different communities. This fraction should not be a major factor in Equation 3, however, if we assume a sparse graph and a temporal, probabilistic view of inter-community link formation.
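A minimal sketch of Equation 3, using the node counts and largest-community sizes reported later in Table 2, follows.

// Sketch: network-level storage rate, Eq. (3): nc = N / Cmax, Rvn = 100 / nc.
public class NetworkStorageRate {
    static double rate(int nodes, int largestCommunity) {
        double nc = (double) nodes / largestCommunity;   // normalized number of communities
        return 100.0 / nc;
    }
    public static void main(String[] args) {
        System.out.println(rate(986, 105));    // EU Email    -> ~10.65%
        System.out.println(rate(6474, 3649));  // AS-733      -> ~56%
        System.out.println(rate(5881, 490));   // Bitcoin OTC -> ~8.33%
    }
}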

4 Experimental Evaluation

In this section, we describe the experiments, shard structure, experimental setup, and chaincode used in this work, followed by the results.

4.1 Experiments

First, we ran the load with SLs of 500 and 1000 blocks for each sharded channel and took a normalized storage measurement over 10K transactions for comparison with the non-sharded measurement. For this experiment, the number of transactions per block (TPB) was set to 10, and the payloads were 1KB, 5KB, 20KB, and 40KB of binary data with random alphanumeric keys. Second, we initiated transactions for combinations of SL (10, 50, 100, 200, 500), TX size (1KB, 5KB, 20KB, 40KB), and TPB (1, 5, 10, 15, 20, 50) to calculate and observe the variance in storage consumption and latency.

4.1.1 Shard Structure.

As shown in Figure 1, three peers join the non-sharded channel, while the three sharded channels each contain a combination of 2 peers out of the 3, making up the initial cluster configuration. When a shard is full, a new shard is created with the same members as the old shard, thus maintaining the chronological structure over the progression of the network, as visualized in Fig. 3.

4.1.2 Experimental Setup.

We carried out this experiment in a containerized environment using Docker on an AWS EC2 t3.large instance, with the version 2.2 LTS Docker images of HLF. The networks shown in Fig. 1 were created following the HLF sample networks.

4.1.3 Chaincode Description.

We used the simple asset chaincode (Java) with write functionality provided in the HLF samples [11]. The chaincode method signature is as follows:
WriteAsset(key, value)
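For reference, a write-only contract of this shape can be expressed with the fabric-chaincode-java contract API; the sketch below is illustrative and not necessarily identical to the sample chaincode in [11].

import org.hyperledger.fabric.contract.Context;
import org.hyperledger.fabric.contract.ContractInterface;
import org.hyperledger.fabric.contract.annotation.Contract;
import org.hyperledger.fabric.contract.annotation.Default;
import org.hyperledger.fabric.contract.annotation.Transaction;

// Sketch of a minimal write-only asset contract (not the exact sample code from [11]).
@Contract(name = "basic")
@Default
public final class SimpleAssetContract implements ContractInterface {

    @Transaction()
    public void WriteAsset(final Context ctx, final String key, final String value) {
        // Store the value under the given key in the channel's world state (LevelDB here).
        ctx.getStub().putStringState(key, value);
    }
}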

4.1.4 State DB.

Between LevelDB and CouchDB, we chose LevelDB for its lower overhead in the configuration steps. In future work involving performance, both state databases may be analyzed.

4.1.5 Endorsement Policy.

For our experiment we used a very basic endorsement policy in which one of the peers must endorse the transaction, for both the sharded and non-sharded configurations.
Figure 5: Storage Consumption for Networks in Fig. 1
Figure 6: Storage and Latency w.r.t. segment limit, TPB and TX size

4.2 Results

We present the results in Figs. 5-7 and Tables 1-3.

4.2.1 HLF Segment Characteristics Observation.

The amount of storage taken compared to the non-sharded architecture is shown in Fig. 5(a). The storage consumed for a given number of transactions is close for both 500 and 1000 SL at 1 TPB. Trivially, the larger the TX size, the larger the shard size; however, having control of the shard size gives flexibility for application- or even cluster-specific choices. In Fig. 5(b), the storage reduction for varied transaction sizes is presented. We further present the storage measurements and latency variance with respect to SL, TPB, and transaction size in Fig. 6(a) and Fig. 6(b), respectively. For storage reduction, we find the impact of these parameters to be negligible, and the latency varies very similarly for different SL, transaction size, and TPB. This implies that when choosing the SL for a specific network and application, the designer need not worry much about these parameters, at least in the intra-shard scenario; as discussed earlier, 2PC-like cross-shard transactions can be implemented at low cost depending on Cmax.
Analyzing further, we find that for smaller transaction sizes the storage consumption is close to the non-sharded version, as the transaction payload is dominated by the header size. As the transaction size increases, the overhead of the header information becomes negligible. As can be seen from Fig. 5(b), with a 100KB transaction size the percentage of storage reduction (R) is 18.114%, so the storage rate Rc ((100 - R)%) stands at around 81.89% (Ri = 66.67% from Eq. 2; s = 2, m = 3) compared to the non-sharded configuration for the sample configuration in Fig. 1(c).
Fig. 6(a) shows how, by changing the transaction size and SL, we produce a range of shard sizes that may be maintained internally or externally by mobile and IoT devices. Currently, mobile devices manage only wallets, not blocks, and transaction requests are made through hosted APIs rather than executed locally. Another advantage is the simplification of implementing archiving/caching for the HLF blockchain: older shards can be put into an archive while active shards are loaded into faster memory.
We compare the actual storage rate (Rc), obtained from the computed shard size using Equation 1, with Ri. To calculate the reference Rc, we take from our experiment ns = 3, SL = 500, 10 TPB, and a 40KB payload, giving Sc = 480MB. With the same TX size and TPB, S = 1229MB for 10k transactions on the original network without reducing FT. Considering m = 25 and the other TX sizes yields the results in Fig. 7; considering m = 10 gives the numeric comparison in Table 1.
Table 1: HLF Sharding Ideal (Ri) vs. Actual (Rc) Storage Rate (%)
s    2   3   4   5   6   7   8    9
Ri  20  30  40  50  60  70  80   90
Rc  23  35  47  59  70  82  94  105
The practical results show that the actual storage rate Rc is higher than that of the Ring-Overlap simulation. As depicted in Fig. 5(b), this overhead can be ignored for larger transaction sizes.
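The Rc row of Table 1 can be reproduced directly from Equation 1 with the measured values (ns = 3, Sc = 480MB, S = 1229MB, m = 10); a minimal check is sketched below.

// Sketch: reproducing Table 1 from Eq. (1) and Eq. (2) with the measured shard/ledger sizes.
public class TableOneCheck {
    public static void main(String[] args) {
        int ns = 3, m = 10;
        double sc = 480.0, ledger = 1229.0;   // MB, from the 500-SL / 10-TPB / 40KB run
        for (int s = 2; s <= 9; s++) {
            double rc = (ns * s * sc) / (m * ledger) * 100.0;  // Eq. (1)
            double ri = (double) s / m * 100.0;                // Eq. (2)
            System.out.printf("s=%d  Ri=%.0f  Rc=%.0f%n", s, ri, rc);  // e.g. s=2 -> Ri=20, Rc=23
        }
    }
}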
Figure 7: Ring Overlap Simulation vs. Our Experimental Storage Rate

4.2.2 Storage Analysis for Community to Cluster Mapping.

To reduce ledger copies further in a novel way, the network structure, network growth, and interaction properties of social, e-commerce, or scientific networks can be taken into account. For example, if the network grows with homophily to form certain clusters, HLF shards can be formed on these clusters to minimize cross-shard transactions as well as storage. Under this assumption, Table 2 shows the storage rate (Rvn) for the overall network obtained with our methodology using Equation 3 on the datasets introduced in Section 3.3.
Table 2: Storage Rate in Community to Cluster Mapping
Dataset        Nodes   Cmax   Rvn (%)
EU Email         986    105    10.65
AS 733          6474   3649    56.50
Bitcoin OTC     5881    490     8.33
Note that for the AS dataset, where more than 50% of the nodes belong to one large community, the storage reduction can still be around 44%.

4.2.3 Combined Storage Reduction.

The final storage rate (Rf) is obtained with the equation below:
\begin{equation} R_f = \frac{R_c \times R_{vn}}{100} \end{equation}
(4)
Table 3: Combined Final Storage Rate (%)
s            2     3     4     5     6     7     8     9
EU Email   2.5   3.7   5.0   6.2   7.5   8.7  10.0  11.2
AS 733    13.2  19.8  26.5  33.1  39.7  46.3  52.9  59.5
BTC OTC    2.0   2.9   3.9   4.9   5.9   6.8   7.8   8.8
From Table 3, we can observe that even with 6 copies of a shard, the prospective storage reduction is 92.5%, 60.3%, and 94.1% for the EU Email, AS 733, and Bitcoin OTC networks, respectively.
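The Table 3 entries follow from Equations 1, 3, and 4; a minimal check for s = 6 is sketched below.

// Sketch: combined storage rate Rf = Rc * Rvn / 100 (Eq. 4), checked for s = 6.
public class CombinedRate {
    public static void main(String[] args) {
        double rc = (3 * 6 * 480.0) / (10 * 1229.0) * 100.0;   // Eq. (1): ~70.3% for s = 6
        double[] rvn = {10.65, 56.50, 8.33};                    // Table 2: EU Email, AS 733, BTC OTC
        String[] name = {"EU Email", "AS 733", "BTC OTC"};
        for (int i = 0; i < rvn.length; i++) {
            double rf = rc * rvn[i] / 100.0;                    // Eq. (4)
            System.out.printf("%s: Rf=%.1f%%, reduction=%.1f%%%n", name[i], rf, 100 - rf);
        }
    }
}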

5 Conclusions

In this paper, we show a significant reduction in storage for three datasets by incorporating network community characteristics. By implementing basic shard generation with a configurable SL, we demonstrated the characteristics of HLF shard sizes for different parameters. Sharding the ledger enables selective storage and execution of chaincode over a limited portion of the blocks. Overall, this work makes clear the prospect and feasibility of including resource-limited nodes.

References

[1]
Mohammad Javad Amiri, Divyakant Agrawal, and Amr El Abbadi. 2021. Sharper: Sharding permissioned blockchains over network clusters. In Proceedings of the 2021 International Conference on Management of Data. 76–88.
[2]
Elli Androulaki, Artem Barger, Vita Bortnikov, Christian Cachin, Konstantinos Christidis, Angelo De Caro, David Enyeart, Christopher Ferris, Gennady Laventman, Yacov Manevich, et al. 2018. Hyperledger fabric: a distributed operating system for permissioned blockchains. In Proceedings of the thirteenth EuroSys conference. 1–15.
[3]
Elli Androulaki, Christian Cachin, Angelo De Caro, and Eleftherios Kokoris-Kogias. 2018. Channels: Horizontal scaling and confidentiality on permissioned blockchains. In Computer Security: 23rd European Symposium on Research in Computer Security, ESORICS 2018, Barcelona, Spain, September 3-7, 2018, Proceedings, Part I 23. Springer, 111–131.
[4]
Shehar Bano, Mustafa Al-Bassam, and George Danezis. 2017. The road to scalable blockchain designs. USENIX ;login: 42, 4 (2017), 31–36.
[5]
Hung Dang, Tien Tuan Anh Dinh, Dumitrel Loghin, Ee-Chien Chang, Qian Lin, and Beng Chin Ooi. 2019. Towards scaling blockchain systems via sharding. In Proceedings of the 2019 international conference on management of data. 123–140.
[6]
Ali Dorri, Salil S Kanhere, Raja Jurdak, and Praveen Gauravaram. 2019. LSB: A Lightweight Scalable Blockchain for IoT security and anonymity. J. Parallel and Distrib. Comput. 134 (2019), 180–197.
[7]
Jacob Eberhardt and Stefan Tai. 2017. On or off the blockchain? Insights on off-chaining computation and data. In European Conference on Service-Oriented and Cloud Computing. Springer, 3–15.
[8]
Daniel Edler, Anton Holmgren, and Martin Rosvall. 2024. The MapEquation software package. https://mapequation.org.
[9]
Adem Efe Gencer, Robbert van Renesse, and Emin Gün Sirer. 2017. Short paper: Service-oriented sharding for blockchains. In International Conference on Financial Cryptography and Data Security. Springer, 393–401.
[10]
Houshyar Honar Pajooh, Mohammad Rashid, Fakhrul Alam, and Serge Demidenko. 2021. Hyperledger Fabric Blockchain for Securing the Edge Internet of Things. Sensors 21, 2 (2021), 359.
[11]
Hyperledger. 2024. Hyperledger GitHub Repositories. Retrieved 22 November 2024 from https://github.com/hyperledger/
[12]
IBM. 2018. Storage Needs for Blockchain Technology-Point of View. Retrieved 22 November 2024 from https://www.ibm.com/downloads/cas/LA8XBQGR
[13]
Mohammad Javad Amiri, Divyakant Agrawal, and Amr El Abbadi. 2019. SharPer: Sharding Permissioned Blockchains Over Network Clusters. arXiv e-prints (2019), arXiv–1910.
[14]
Sidra Khatoon and Nadeem Javaid. 2019. Blockchain based decentralized scalable identity and access management system for Internet of Things. Technical Report. Working Paper.
[15]
Srijan Kumar, Bryan Hooi, Disha Makhija, Mohit Kumar, Christos Faloutsos, and VS Subrahmanian. 2018. Rev2: Fraudulent user prediction in rating platforms. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 333–341.
[16]
Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. 2005. Graphs over time: densification laws, shrinking diameters and possible explanations. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. 177–187.
[17]
Wenxuan Liu, Donghong Zhang, Chunxiao Mu, Xiangfu Zhao, and Jindong Zhao. 2022. Ring-Overlap: A Storage Scaling Mechanism for Hyperledger Fabric. Applied Sciences 12, 19 (2022), 9568.
[18]
Satoshi Nakamoto. 2008. Bitcoin: A peer-to-peer electronic cash system. Decentralized Business Review (2008), 21260.
[19]
Ashwin Paranjape, Austin R Benson, and Jure Leskovec. 2017. Motifs in temporal networks. In Proceedings of the tenth ACM international conference on web search and data mining. 601–610.
[20]
Julien Polge, Jérémy Robert, and Yves Le Traon. 2021. Permissioned blockchain frameworks in the industry: A comparison. Ict Express 7, 2 (2021), 229–233.
[21]
Gavin Wood et al. 2014. Ethereum: A secure decentralised generalised transaction ledger. Ethereum project yellow paper 151, 2014 (2014), 1–32.
[22]
Xiaoqiong Xu, Gang Sun, Long Luo, Huilong Cao, Hongfang Yu, and Athanasios V Vasilakos. 2021. Latency performance modeling and analysis for hyperledger fabric blockchain network. Information Processing & Management 58, 1 (2021), 102436.
[23]
Yibin Xu and Yangyu Huang. 2020. Segment blockchain: A size reduced storage mechanism for blockchain. IEEE Access 8 (2020), 17434–17441.
[24]
Mahdi Zamani, Mahnush Movahedi, and Mariana Raykova. 2018. Rapidchain: Scaling blockchain via full sharding. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. 931–948.
