Short Paper: Storage Reduction of Private Blockchain with Sharding and Community Based Clustering

Published: 03 January 2025 (NSysS '24)

Abstract

Blockchain has gained popularity in the software industry for its transparency, security, and privacy guarantees. Its scalability issues in terms of latency and throughput are being addressed with better consensus and communication protocols. However, the inherent problems of high storage requirements for append-only blocks and high computing needs remain barriers to large-scale adoption of blockchain in networks with low-end devices. In this work, we propose a sharding configuration for the popular Hyperledger Fabric (HLF) blockchain and show the storage reduction achieved by utilizing network community detection. First, our implementation derives channel-based shard characteristics in terms of different parameters. We compare the practical storage rate computed by our implementation with the state of the art. Finally, we show the storage reduction on three datasets applying our novel approach of community-based clustering for blockchain sharding. Overall, our work paves the way for running HLF at a small scale, reducing running costs and adding compatibility in various use cases.

1 Introduction

A blockchain is essentially a linked list of blocks. New blocks are linked to previous ones using a hashing mechanism based on the information in the block. Blocks are built from transactions. Each transaction (TX) is considered immutable because the blocks and the chain of blocks are made immutable by the distributed consensus mechanism. With growing participation in these networks, the required storage is rising at a good pace. The Bitcoin network [18] has grown to 600GB+ and Ethereum [21] to 1.1TB+ as of September 2024. New nodes wanting to join these networks must provision far more capacity with future growth in mind.
Two broad types of techniques are applied in the current literature to address the storage problem. One involves forming clusters of nodes and processing transactions within a single cluster, thus minimizing inter-cluster communication [5], [9]. Some of these techniques create data clusters along with node clusters; such a technique is applied in [1, 13]. The other type of solution approaches the problem by dividing the data in the blockchain, such as [23].
Distributed data implementations in database systems have inspired similar approaches in blockchain. Numerous approaches such as weighted models [6, 14], off-chain storage [7], on-chain techniques [4], and blockchain sharding [24] have been proposed in recent research to improve blockchain performance.
In [23], the authors dynamically divide the blockchain structure into segments. They ensure that each segment is stored in at least one reliable keeper by classifying the nodes using the jury hypothesis. However, their approach does not fit permissioned blockchains because of its PoW overhead. The complexity of achieving a fully working solution is further illustrated by Ethereum's [21] progress on implementing shard chains, called danksharding (still several years away according to them).
It is evident that the high storage problem should be solved not only for public blockchains but also for private ones. Even at moderate TPS, an IBM blockchain based on HLF may produce terabytes of data yearly [12]. This makes the topic all the more relevant for HLF.
As groundwork to overcome the storage problem in HLF, we propose a novel idea: keep only a number of overlapping copies (s) of a ledger within a cluster, and form the cluster based on the network communities formed by the participating nodes. Reduced storage usage is then obtained from this two-level minimization of ledger copies: first inside a cluster, using overlapping ledger copies, and then in the overall network, by applying community-to-cluster mapping.
We have devised a workflow that generates a new overlapping ledger copy after a defined number of blocks (the shard limit, SL) has been added. To do this, we have customized the peer source code of HLF and used the HLF Java SDK together with shell scripts. We then run a load against this custom HLF network, with overlapping copies of the ledger (channel) among the peers, to take storage measurements. The practical size of shards is computed for combinations of shard limit, transaction size, and transactions per block. We then compare the storage rate (% of storage taken compared to all nodes keeping identical ledger copies) computed in practice against related works. We further formulate the equation for community-to-cluster mapping and show results for three network datasets. In the first step, the network is partitioned into clusters based on the detected communities. We applied InfoMap [8] community detection on three datasets to show the number of clusters that can be formed from the size of the largest community and the number of detected communities. The benefit of community-to-cluster mapping is fewer cross-shard transactions. In the second step, inside each cluster, we store overlapping copies of the ledger to reduce storage further; doing this inside a cluster ensures a reduced number of messages in a carefully chosen network. We analyze the characteristics of overlapping shard sizes for different combinations of shard limit, transaction size, and transactions per block. These provide insight into the feasibility of operating our approach on resource-constrained devices, e.g., mobile and IoT devices. In the rest of the paper, we discuss related work on HLF and sharding, describe the essential background of HLF and our methodology, present our implementation and experimental setup, and finally present the results.

2 Related Work

HLF [2] is by far the most adopted private blockchain [20]. [20] compares permissioned blockchains to explain the trade-offs in community activity, performance, scalability, privacy, and industry adoption; HLF shows general superiority on most of these aspects. Recent works related to HLF focus either on applications of blockchain, such as [10], or on performance and scalability [1, 5, 22].
[5] proposes three improvements on HLF: AHL (Attested Hyperledger), AHL+, and AHLR (AHL Relay). The authors present a shard formation protocol on top of AHL+ achieving 3k+ TPS. Small trusted logs are maintained inside a trusted execution environment (TEE) to avoid tampering in AHL. Unlike HLF, AHL keeps different logs for different consensus messages. AHL+ uses two separate channels (one in HLF) for consensus messages and request messages to delay overflow and reduce message drops. AHLR only optimizes communication when there is no view change. Storage scaling is not covered.
SharPer [1] describes how a DAG of blocks can form shards over clusters. Here, cross-shard transactions are duplicated among different shards, while all other transactions belong to only one specific shard. However, a storage analysis (understandably, a reduction exists) is not provided by the authors, as their work focuses only on throughput and latency.
In [17], a simulation analysis shows sharding and distribution of ledger copies in a cluster with a fixed number of nodes. The block data are stored with overlap on some nodes in the cluster, so each block is guaranteed to have some copies in a cluster. Data integrity is maintained as long as the number of node failures in a single cluster stays within a threshold. In SharPer and Ring-Overlap, the DAG or overlapped ledger can itself be large and ever growing, which makes it infeasible to operate on mobile or IoT nodes in the long run. With this in mind, we focus our approach on the storage analysis for individual nodes and the overall network with limited-size shards.

3 Background and Methodology

HLF offers an organizational structure and collaboration framework among participants in its modular design. The design allows interaction among an organization's peers and client applications using channels to address real-world scenarios. The details of every component are available in the Hyperledger documentation, and the source code is available on GitHub [2].
As Figure 1 shows, the basic motivation for dividing the ledger into shards is to enable dynamic membership of several peers in a shard, and thus in the whole chain. In HLF, the permission boundary is a channel, which represents a private ledger inside the network with a defined and agreed membership of participants. We use the fact that the joined members have already agreed on the specifics of the channel, so spawning another channel with the same criteria is straightforward. We create a new channel (shard) when the number of blocks crosses a preset value, the shard limit (SL). To show the change in channel membership, we choose a sample network similar to the test network provided by HLF; this network is representative of other scenarios. Figure 1(a) shows our sample network with three peers from one organization, Org1, joining the channel. If the ledger size is S, the total space occupied would be 3S. Figure 1(c) shows our change to the network: we divide the ledger into three channels and join 2 peers to each channel, so that each channel has a copy on two of the joined peers (i.e., overlapping shards, s = 2). Alternatively, the fault tolerance (FT) could be reduced as in Fig. 1(b), at the cost of denying peers the option to join specific shards as needed, unlike in Fig. 1(c).
Figure 1: Sharded vs. Non-sharded
We now move to the storage rate (Rv) calculation, where the original ledger of size S is divided into ns parts, leaving a shard size Sc = S/ns in the ideal case. In an actual implementation, each shard carries some overhead, making Sc > S/ns. The overlapping shard number s implies that we keep s copies in a cluster of m nodes. The general formula for Rv is therefore:
\begin{equation} R_v = \frac{n_s \times s \times S_c}{m \times S} \times 100\% \end{equation}
(1)
For the ideal case,
\begin{equation} R_i = \frac{s}{m} \times 100\% \end{equation}
(2)
Equation 1 is applicable to inner-cluster sharding of HLF. Note that increasing s increases the reliability of the system.
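A minimal Java sketch of Equations 1 and 2, applied to the Fig. 1(c) configuration (ns = 3, s = 2, m = 3), is given below for illustration only; it is not part of our implementation.

// Sketch: storage rate inside a cluster, Eq. (1) and Eq. (2).
public class StorageRate {
    // Rv = (ns * s * Sc) / (m * S) * 100
    static double actualRate(int ns, int s, double shardSizeMb, int m, double ledgerSizeMb) {
        return (ns * s * shardSizeMb) / (m * ledgerSizeMb) * 100.0;
    }
    // Ri = s / m * 100, the ideal case where Sc = S / ns exactly
    static double idealRate(int s, int m) {
        return (double) s / m * 100.0;
    }
    public static void main(String[] args) {
        double S = 1000.0;                                   // arbitrary ledger size in MB; Rv depends only on Sc/S
        System.out.println(idealRate(2, 3));                 // ~66.67% (Eq. 2, s = 2, m = 3)
        System.out.println(actualRate(3, 2, S / 3, 3, S));   // ~66.67% when Sc carries no overhead
    }
}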
It is common to set a specific cluster size and to form clusters based on geographic attributes. We add novelty to this cluster formation by utilizing network community detection. Communities have more intra-community links than inter-community links, yielding predictable cross-cluster transactions and better ledger partitioning; this is described in Section 3.3. As in multi-channel Fabric [2, 3], channels serve as shards with efficient processing of intra-shard transactions. Cross-shard transactions, on the other hand, require either a trusted channel among the participants or a 2PC-like atomic commit protocol [3]. In 2PC, an additional cost of 4n messages and 2n + 2 forced log writes is incurred for n participants. This overhead can be minimized using our idea of community-to-cluster mapping in suitable networks, as discussed in Section 3.3.1.

3.1 Flowchart and Key Functions

Figure 2: Flowchart of Sharding
Figure 3: Shard creation flow
In Fig. 2, we outline the basic steps for sharding HLF, keeping the state of the blockchain configuration in mind. The major tasks in our experiment are described below.
Shard Creation: Shard creation is performed when the block limit is crossed for a shard. In this case, the participating peers need to rejoin a new shard with the same configuration, as shown in Fig. 3(a). Initial shards are created at network startup. Fig. 3(b) gives a generic view of shard creation over time. We used the HLF Java SDK with a REST API endpoint in our implementation to achieve the shard generation functionality. Since we limit the scope to storage analysis at this point, we skipped a rigorous implementation including consensus on shard creation and a 2PC protocol for multiple channels. A minimal sketch of the trigger logic is given below.
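In this sketch, createChannel, joinPeers, and deployChaincode are hypothetical placeholders for the channel creation, join, and deployment calls we issue through the HLF Java SDK and shell scripts; only the control flow mirrors Fig. 3.

// Sketch: create a new shard channel once the current shard reaches the shard limit (SL).
public final class ShardManager {
    private final int shardLimit;           // SL: maximum blocks per shard channel
    private int currentShardIndex = 0;      // suffix of the active shard channel
    private long blocksInCurrentShard = 0;  // blocks observed on the active channel

    public ShardManager(int shardLimit) { this.shardLimit = shardLimit; }

    // Called whenever a new block is committed on the active shard channel.
    public synchronized String onBlockCommitted() throws Exception {
        blocksInCurrentShard++;
        if (blocksInCurrentShard >= shardLimit) {
            currentShardIndex++;
            String newShard = "shard" + currentShardIndex;  // e.g. shard1, shard2, ...
            createChannel(newShard);        // same channel configuration as the previous shard
            joinPeers(newShard);            // the same s peers of the cluster rejoin
            deployChaincode(newShard);      // see "Chaincode Deployment in New Shard Channel"
            blocksInCurrentShard = 0;
        }
        return "shard" + currentShardIndex; // channel new transactions should target
    }

    private void createChannel(String name) throws Exception { /* hypothetical: HLF Java SDK / peer CLI */ }
    private void joinPeers(String name) throws Exception { /* hypothetical: channel join for each peer */ }
    private void deployChaincode(String name) throws Exception { /* hypothetical: scripted deployment */ }
}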
Chaincode Deployment in New Shard Channel: The chaincode must be programmatically deployed to new shard channels for transaction processing. Adding a chaincode to a channel is a multi-step process in which the participating peers interact to make it ready to execute. Starting from HLF v2.0, there are two options to deploy chaincode for a channel: (1) embedded chaincode and (2) chaincode as an external service. For this experiment we used the embedded approach for simplicity, implemented using several shell scripts and the HLF SDK.
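A hedged sketch of driving such a scripted deployment from Java follows; deployChaincode.sh is a hypothetical wrapper around the standard HLF 2.x chaincode lifecycle steps (package, install, approveformyorg, commit), not a script shipped with HLF.

import java.io.File;

// Sketch: invoke a (hypothetical) shell script that runs the chaincode lifecycle
// against the newly created shard channel.
public class ChaincodeDeployer {
    public static void deploy(String channelName) throws Exception {
        Process p = new ProcessBuilder("bash", "deployChaincode.sh", channelName)
                .directory(new File("."))   // run from the network's working directory
                .inheritIO()                // stream the script output to our console
                .start();
        if (p.waitFor() != 0) {
            throw new IllegalStateException("chaincode deployment failed for " + channelName);
        }
    }
}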
Configuration Change: The configuration change parameters are listed but not yet implemented. The realistic scenario is to experience cluster and network configuration changes (such as the number of peers, organizations, chaincode, FT, etc.) along with changes to shard size and transaction size. Constraints on these parameters can be set to maintain a balanced trade-off between system reliability and FT while achieving the desired storage reduction and compatibility for target devices.

3.2 Algorithm for Transaction, Shard Generation

The full workflow for shard generation and configuration change is described in Section 3.1. We limit the scope of our experimentation to the transaction workflow of Algorithm 1 only; a sketch is given below.
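Algorithm 1 itself is not reproduced here; the sketch below captures its intent under the assumption of a hypothetical submitWriteAsset wrapper around the Java SDK invocation, together with the ShardManager placeholder sketched in Section 3.1.

import java.util.Random;
import java.util.UUID;

// Sketch of the experiment driver (Algorithm 1 style): submit writes and roll over shards.
public class LoadDriver {
    public static void run(ShardManager shards, int totalTx, int txSizeBytes, int txPerBlock)
            throws Exception {
        byte[] payload = new byte[txSizeBytes];        // random binary payload, as in Section 4.1
        new Random().nextBytes(payload);
        String channel = "shard0";
        for (int i = 0; i < totalTx; i++) {
            String key = UUID.randomUUID().toString(); // random alphanumeric key
            submitWriteAsset(channel, key, payload);   // hypothetical SDK wrapper for WriteAsset
            if ((i + 1) % txPerBlock == 0) {           // a block is cut every txPerBlock transactions
                channel = shards.onBlockCommitted();   // may create and switch to a new shard channel
            }
        }
    }

    private static void submitWriteAsset(String channel, String key, byte[] value) {
        // hypothetical: invoke the WriteAsset chaincode on the given shard channel via the Java SDK
    }
}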

3.3 Community Based Clustering

To the best of our knowledge, we are the first to present the idea of community-to-network-cluster mapping for blockchain. We discuss storage reduction measures for three datasets: the email-Eu-core temporal network [19], AS-733 [16], and the Bitcoin (BTC) OTC network [15]. The resulting community graphs obtained with the InfoMap [8] algorithm are presented in Fig. 4; each node is a community and edges represent inter-community links. As depicted, the applicability of a community-to-network-cluster mapping is promising.
Figure 4: Community Visualization using InfoMap

3.3.1 Integrating HLF Sharding with Community Based Clustering.

In this section, we point out the feasibility and approach of integrating the community-based cluster (Fig. 4) with the channel-based sharding of HLF (Fig. 1(c) and Fig. 3). We periodically create a new channel of the desired size. The overhead of a 2PC protocol for multi-channel sharding [3] is reduced for a cluster with fewer nodes: the new overhead is 4Cmax messages and 2Cmax + 2 forced log writes instead of 4n and 2n + 2, respectively, where Cmax is the node count of the largest community. To avoid the overhead of cross-cluster transactions, this technique should be applied to networks known from domain knowledge to have uniform communities and little inter-community interaction.
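To illustrate the saving, the following sketch compares the 2PC cost with and without the community restriction, using the AS-733 node count and largest community later reported in Table 2.

// Sketch: 2PC cost for a cross-shard transaction (Section 3.3.1).
// Without community clustering the cost scales with all n participants;
// with it, only the largest community (Cmax nodes) is involved.
public class TwoPcCost {
    static long messages(long participants) { return 4 * participants; }
    static long forcedLogWrites(long participants) { return 2 * participants + 2; }

    public static void main(String[] args) {
        long n = 6474, cMax = 3649;   // AS-733: total nodes and largest community (Table 2)
        System.out.printf("all nodes:         %d messages, %d forced log writes%n",
                messages(n), forcedLogWrites(n));
        System.out.printf("largest community: %d messages, %d forced log writes%n",
                messages(cMax), forcedLogWrites(cMax));
    }
}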

3.3.2 Measuring Storage.

Following our novel idea, we calculate the storage reduction in the overall network. The largest community in a graph takes the largest share of ledger copies compared to the smaller communities. Hence, we simplify our calculation by normalizing the number of communities to nc = N/Cmax. The storage rate for the network can then be measured using the equation below:
\begin{equation} R_{vn} = \frac{1}{n_c} \times 100 \% \end{equation}
(3)
Strictly, a fraction accounting for the maximum number of inter-community links (L_inter-max) should be added to Equation 3: at most L_inter-max shard channels will have copies on nodes from different communities. This fraction should not be a major factor in Equation 3, however, if we assume a sparse graph and a temporal, probabilistic view of inter-community link formation.
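A minimal sketch of Equation 3, using the node counts and largest-community sizes reported later in Table 2, follows.

// Sketch: network-level storage rate, Eq. (3): nc = N / Cmax, Rvn = 100 / nc.
public class NetworkStorageRate {
    static double rate(int nodes, int largestCommunity) {
        double nc = (double) nodes / largestCommunity;   // normalized number of communities
        return 100.0 / nc;
    }
    public static void main(String[] args) {
        System.out.println(rate(986, 105));    // EU Email    -> ~10.65%
        System.out.println(rate(6474, 3649));  // AS-733      -> ~56%
        System.out.println(rate(5881, 490));   // Bitcoin OTC -> ~8.33%
    }
}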

4 Experimental Evaluation

In this section, we describe the experiments, shard structure, experimental setup, and chaincode used in this work, followed by the results.

4.1 Experiments

First, we ran the load with SLs of 500 and 1000 blocks for each sharded channel and took a normalized storage measurement over 10K transactions for comparison with the non-sharded measurement. For this experiment, the number of transactions per block (TPB) was set to 10, and the payloads were 1KB, 5KB, 20KB, and 40KB of binary data with random alphanumeric keys. Second, we initiated transactions for combinations of SL (10, 50, 100, 200, 500), TX size (1KB, 5KB, 20KB, 40KB), and TPB (1, 5, 10, 15, 20, 50) to calculate and observe the variance in storage consumption and latency.

4.1.1 Shard Structure.

As shown in Figure 1, three peers join the non-sharded channel, while the three sharded channels each contain a combination of 2 peers out of the 3, making up the initial cluster configuration. When a shard is full, a new shard is created with the same members as the old shard, thus maintaining the chronological structure over the progression of the network, as visualized in Fig. 3.

4.1.2 Experimental Setup.

We carried out this experiment in a containerized environment using Docker on an AWS EC2 t3.large instance, with the version 2.2 LTS Docker images of HLF. The networks shown in Fig. 1 were created following the HLF sample networks.

4.1.3 Chaincode Description.

We used the simple asset chaincode (Java) with write functionality provided in the HLF samples [11]. The chaincode method signature is as follows:
WriteAsset(key, value)
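For reference, a write-only contract of this shape can be expressed with the fabric-chaincode-java contract API; the sketch below is illustrative and not necessarily identical to the sample chaincode in [11].

import org.hyperledger.fabric.contract.Context;
import org.hyperledger.fabric.contract.ContractInterface;
import org.hyperledger.fabric.contract.annotation.Contract;
import org.hyperledger.fabric.contract.annotation.Default;
import org.hyperledger.fabric.contract.annotation.Transaction;

// Sketch of a minimal write-only asset contract (not the exact sample code from [11]).
@Contract(name = "basic")
@Default
public final class SimpleAssetContract implements ContractInterface {

    @Transaction()
    public void WriteAsset(final Context ctx, final String key, final String value) {
        // Store the value under the given key in the channel's world state (LevelDB here).
        ctx.getStub().putStringState(key, value);
    }
}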

4.1.4 State DB.

Between LevelDB and CouchDB, we chose LevelDB for its lower overhead in the configuration steps. In future work involving performance, both state databases may be analyzed.

4.1.5 Endorsement Policy.

For our experiment we used a very basic endorsement policy in which one of the peers must endorse the transaction, for both the sharded and non-sharded configurations.
Figure 5: Storage Consumption for Networks in Fig. 1
Figure 6: Storage and Latency w.r.t. segment limit, TPB and TX size

4.2 Results

We present the results in Figs. 5-7 and Tables 1-3.

4.2.1 HLF Segment Characteristics Observation.

The amount of storage taken compared to the non-sharded architecture is shown in Fig. 5(a). The storage consumed for a given number of transactions is close for both 500 and 1000 SL at 1 TPB. Trivially, the larger the TX size, the larger the shard size; however, having control of the shard size gives flexibility for application- or even cluster-specific choices. In Fig. 5(b), the storage reduction for varied transaction sizes is presented. We further present the storage measurements and latency variance with respect to SL, TPB, and transaction size in Fig. 6(a) and Fig. 6(b), respectively. For storage reduction, we find the impact of these parameters to be negligible, and the latency varies very similarly for different SL, transaction size, and TPB. This implies that when choosing the SL for a specific network and application, the designer need not worry much about these parameters, at least in the intra-shard scenario; as discussed earlier, 2PC-like cross-shard transactions can be implemented at low cost depending on Cmax.
Analyzing further, we find that for smaller transaction sizes the storage consumption is close to the non-sharded version, as the transaction payload is dominated by the header size. As the transaction size increases, the overhead of the header information becomes negligible. As can be seen from Fig. 5(b), with a 100KB transaction size the percentage of storage reduction (R) is 18.114%, so the storage rate Rc ((100 - R)%) stands at around 81.89% (Ri = 66.67% from Eq. 2; s = 2, m = 3) compared to the non-sharded configuration for the sample configuration in Fig. 1(c).
Fig. 6(a) shows how, by changing the transaction size and SL, we produce a range of shard sizes that may be maintained internally or externally by mobile and IoT devices. Currently, mobile devices manage only wallets, not blocks, and transaction requests are made through hosted APIs rather than executed locally. Another advantage is the simplification of implementing archiving/caching for the HLF blockchain: older shards can be put into an archive while active shards are loaded into faster memory.
We compare the actual storage rate (Rc), obtained from the computed shard size using Equation 1, with Ri. To calculate the reference Rc, we take from our experiment ns = 3, SL = 500, 10 TPB, and a 40KB payload, giving Sc = 480MB. With the same TX size and TPB, S = 1229MB for 10k transactions on the original network without reducing FT. Considering m = 25 and the other TX sizes yields the results in Fig. 7; considering m = 10 gives the numeric comparison in Table 1.
Table 1: HLF Sharding Ideal (Ri) vs. Actual (Rc) Storage Rate (%)
s    2   3   4   5   6   7   8    9
Ri  20  30  40  50  60  70  80   90
Rc  23  35  47  59  70  82  94  105
The practical results show that the actual storage rate Rc is higher than that of the Ring-Overlap simulation. As depicted in Fig. 5(b), this overhead can be ignored for larger transaction sizes.
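The Rc row of Table 1 can be reproduced directly from Equation 1 with the measured values (ns = 3, Sc = 480MB, S = 1229MB, m = 10); a minimal check is sketched below.

// Sketch: reproducing Table 1 from Eq. (1) and Eq. (2) with the measured shard/ledger sizes.
public class TableOneCheck {
    public static void main(String[] args) {
        int ns = 3, m = 10;
        double sc = 480.0, ledger = 1229.0;   // MB, from the 500-SL / 10-TPB / 40KB run
        for (int s = 2; s <= 9; s++) {
            double rc = (ns * s * sc) / (m * ledger) * 100.0;  // Eq. (1)
            double ri = (double) s / m * 100.0;                // Eq. (2)
            System.out.printf("s=%d  Ri=%.0f  Rc=%.0f%n", s, ri, rc);  // e.g. s=2 -> Ri=20, Rc=23
        }
    }
}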
Figure 7: Ring Overlap Simulation vs. Our Experimental Storage Rate

4.2.2 Storage Analysis for Community to Cluster Mapping.

To reduce ledger copies further in a novel way, the network structure, network growth, and interaction properties of social, e-commerce, or scientific networks can be taken into account. For example, if the network grows with homophily to form certain clusters, HLF shards can be formed on these clusters to minimize cross-shard transactions as well as storage. Under this assumption, Table 2 shows the storage rate (Rvn) for the overall network obtained with our methodology using Equation 3 on the datasets introduced in Section 3.3.
Table 2: Storage Rate in Community to Cluster Mapping
Dataset        Nodes   Cmax   Rvn (%)
EU Email         986    105    10.65
AS 733          6474   3649    56.50
Bitcoin OTC     5881    490     8.33
Note that for the AS dataset, where more than 50% of the nodes belong to one large community, the storage reduction can still be around 44%.

4.2.3 Combined Storage Reduction.

The final storage rate (Rf) is obtained with the equation below:
\begin{equation} R_f = \frac{R_c \times R_{vn}}{100} \end{equation}
(4)
Table 3: Combined Final Storage Rate (%)
s            2     3     4     5     6     7     8     9
EU Email   2.5   3.7   5.0   6.2   7.5   8.7  10.0  11.2
AS 733    13.2  19.8  26.5  33.1  39.7  46.3  52.9  59.5
BTC OTC    2.0   2.9   3.9   4.9   5.9   6.8   7.8   8.8
From Table 3, we can observe that even with 6 copies of a shard, the prospective storage reduction is 92.5%, 60.3%, and 94.1% for the EU Email, AS 733, and Bitcoin OTC networks, respectively.
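The Table 3 entries follow from Equations 1, 3, and 4; a minimal check for s = 6 is sketched below.

// Sketch: combined storage rate Rf = Rc * Rvn / 100 (Eq. 4), checked for s = 6.
public class CombinedRate {
    public static void main(String[] args) {
        double rc = (3 * 6 * 480.0) / (10 * 1229.0) * 100.0;   // Eq. (1): ~70.3% for s = 6
        double[] rvn = {10.65, 56.50, 8.33};                    // Table 2: EU Email, AS 733, BTC OTC
        String[] name = {"EU Email", "AS 733", "BTC OTC"};
        for (int i = 0; i < rvn.length; i++) {
            double rf = rc * rvn[i] / 100.0;                    // Eq. (4)
            System.out.printf("%s: Rf=%.1f%%, reduction=%.1f%%%n", name[i], rf, 100 - rf);
        }
    }
}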

5 Conclusions

In this paper, we show a significant reduction in storage for three datasets by incorporating network community characteristics. By implementing basic shard generation with a configurable SL, we demonstrated the characteristics of HLF shard sizes for different parameters. Sharding the ledger enables selective storage and execution of chaincode over a limited portion of the blocks. Overall, this work makes clear the prospect and feasibility of including resource-limited nodes.

References

[1]
Mohammad Javad Amiri, Divyakant Agrawal, and Amr El Abbadi. 2021. Sharper: Sharding permissioned blockchains over network clusters. In Proceedings of the 2021 International Conference on Management of Data. 76–88.
[2]
Elli Androulaki, Artem Barger, Vita Bortnikov, Christian Cachin, Konstantinos Christidis, Angelo De Caro, David Enyeart, Christopher Ferris, Gennady Laventman, Yacov Manevich, et al. 2018. Hyperledger fabric: a distributed operating system for permissioned blockchains. In Proceedings of the thirteenth EuroSys conference. 1–15.
[3]
Elli Androulaki, Christian Cachin, Angelo De Caro, and Eleftherios Kokoris-Kogias. 2018. Channels: Horizontal scaling and confidentiality on permissioned blockchains. In Computer Security: 23rd European Symposium on Research in Computer Security, ESORICS 2018, Barcelona, Spain, September 3-7, 2018, Proceedings, Part I 23. Springer, 111–131.
[4]
Shehar Bano, Mustafa Al-Bassam, and George Danezis. 2017. The road to scalable blockchain designs. USENIX ;login: 42, 4 (2017), 31–36.
[5]
Hung Dang, Tien Tuan Anh Dinh, Dumitrel Loghin, Ee-Chien Chang, Qian Lin, and Beng Chin Ooi. 2019. Towards scaling blockchain systems via sharding. In Proceedings of the 2019 international conference on management of data. 123–140.
[6]
Ali Dorri, Salil S Kanhere, Raja Jurdak, and Praveen Gauravaram. 2019. LSB: A Lightweight Scalable Blockchain for IoT security and anonymity. J. Parallel and Distrib. Comput. 134 (2019), 180–197.
[7]
Jacob Eberhardt and Stefan Tai. 2017. On or off the blockchain? Insights on off-chaining computation and data. In European Conference on Service-Oriented and Cloud Computing. Springer, 3–15.
[8]
Daniel Edler, Anton Holmgren, and Martin Rosvall. 2024. The MapEquation software package. https://mapequation.org.
[9]
Adem Efe Gencer, Robbert van Renesse, and Emin Gün Sirer. 2017. Short paper: Service-oriented sharding for blockchains. In International Conference on Financial Cryptography and Data Security. Springer, 393–401.
[10]
Houshyar Honar Pajooh, Mohammad Rashid, Fakhrul Alam, and Serge Demidenko. 2021. Hyperledger Fabric Blockchain for Securing the Edge Internet of Things. Sensors 21, 2 (2021), 359.
[11]
Hyperledger. 2024. Hyperledger GitHub Repositories. Retrieved 22 November 2024 from https://github.com/hyperledger/
[12]
IBM. 2018. Storage Needs for Blockchain Technology-Point of View. Retrieved 22 November 2024 from https://www.ibm.com/downloads/cas/LA8XBQGR
[13]
Mohammad Javad Amiri, Divyakant Agrawal, and Amr El Abbadi. 2019. SharPer: Sharding Permissioned Blockchains Over Network Clusters. arXiv e-prints (2019), arXiv–1910.
[14]
Sidra Khatoon and Nadeem Javaid. 2019. Blockchain based decentralized scalable identity and access management system for Internet of Things. Technical Report. Working Paper.
[15]
Srijan Kumar, Bryan Hooi, Disha Makhija, Mohit Kumar, Christos Faloutsos, and VS Subrahmanian. 2018. Rev2: Fraudulent user prediction in rating platforms. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 333–341.
[16]
Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. 2005. Graphs over time: densification laws, shrinking diameters and possible explanations. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. 177–187.
[17]
Wenxuan Liu, Donghong Zhang, Chunxiao Mu, Xiangfu Zhao, and Jindong Zhao. 2022. Ring-Overlap: A Storage Scaling Mechanism for Hyperledger Fabric. Applied Sciences 12, 19 (2022), 9568.
[18]
Satoshi Nakamoto. 2008. Bitcoin: A peer-to-peer electronic cash system. Decentralized Business Review (2008), 21260.
[19]
Ashwin Paranjape, Austin R Benson, and Jure Leskovec. 2017. Motifs in temporal networks. In Proceedings of the tenth ACM international conference on web search and data mining. 601–610.
[20]
Julien Polge, Jérémy Robert, and Yves Le Traon. 2021. Permissioned blockchain frameworks in the industry: A comparison. Ict Express 7, 2 (2021), 229–233.
[21]
Gavin Wood et al. 2014. Ethereum: A secure decentralised generalised transaction ledger. Ethereum project yellow paper 151, 2014 (2014), 1–32.
[22]
Xiaoqiong Xu, Gang Sun, Long Luo, Huilong Cao, Hongfang Yu, and Athanasios V Vasilakos. 2021. Latency performance modeling and analysis for hyperledger fabric blockchain network. Information Processing & Management 58, 1 (2021), 102436.
[23]
Yibin Xu and Yangyu Huang. 2020. Segment blockchain: A size reduced storage mechanism for blockchain. IEEE Access 8 (2020), 17434–17441.
[24]
Mahdi Zamani, Mahnush Movahedi, and Mariana Raykova. 2018. Rapidchain: Scaling blockchain via full sharding. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. 931–948.
