1 Introduction

In recent years, research institutions, enterprises and governments have attached great importance and in-depth study to Internet of Things (IoTs). As an essential part of IoTs, wireless sensor networks (WSNs) can collect the real-time environmental or physical data for transferring to the remote monitors or the Internet. Recently, WSNs have been used widely in practical applications, e.g. environmental monitoring, military, national defense, traffic management, medical and health, manufacturing, and disaster prevention [1]. A WSN consists of a large number of battery-driven sensor nodes, which have low computing, storage and communication power [2]. In most of the cases, the sensor nodes die because of energy depletion. After a certain percentage of sensor nodes die, the WSN becomes impractical. However, the sensor nodes are usually deployed in harsh or hostile environments. Therefore, it is almost impossible to recharge or replace the battery of the sensor nodes [3]. Thus, the manner of balancing energy consumption becomes critical in WSNs.

In WSNs, the majority of energy consumption is due to data transmission [4]. By reducing the amount of transmitted data and optimizing the data transmission path, clustering provides a practical solution to improve energy efficiency and extend network life cycle [5].The main idea of clustering is to select cluster heads (CHs) through a distributed or centralized method [6]. Each CH belongs to a cluster. Beside the CH, each cluster also contains multiple cluster members (CMs). The CMs are responsible for collecting and sending information to the CH. Then, the CH aggregates the received data and forwards the data to the sink node or the base station [7]. In clustering algorithms, the sensor nodes only need to communicate with the CHs or the sink node. And through data aggregation some redundant or incorrect data can be removed by CHs. Thus, clustering improves the efficiency of data collection by saving the energy dissipation of the network.

For WSNs load balancing is also an important metric in the clustering along with energy efficiency and network life cycle. Some clustering algorithms [8,9,10] add CMs to their nearest CHs. This kind of allocation methods may lead to unbalanced loading in which some CHs consume a high amount of energy because of connecting with a large number of CMs. In this case, the energy of the whole network cannot be consumed equably and some nodes might die quickly. In addition, the CH with too many CMs also needs to wait for a long period of time to collect all of the messages from the CMs. This condition can cause latency and packets loss [11].

In this paper, we present a distributed and load balancing clustering algorithm DLBCA for WSNs to improve energy efficiency and solve load unbalancing problem. The contributions of our approach are as follows:

  • Compared to some centralized clustering algorithms [12,13,14], the sensor nodes identify the CHs and the clustering structure locally, without relying on the sink node. Thus, the sink node does not consume high computing capabilities to determine the CHs and their corresponding CMs.

  • The residual energy of each sensor node is considered. In the set-up phase of each round, the nodes with higher residual energy have higher probabilities to become CHs. In this way, the energy of the nodes can be consumed evenly.

  • The location of the nodes is also considered when determining the CHs. If the distance from a node to its neighboring nodes is shorter than others, it has priority to be selected as a CH. In this way, the consumed energy of each round can be reduced.

  • To make the number of assigned CMs of each CH balanced, three matrixes (DD, Flag and FlagDis) are designed to allocate the CMs for the selected CHs. The matrixes store the distances and relationships between sensor nodes. By calculating the Flag and FlagDis, we can get the appropriate and load-balanced clustering structure. In this way, the sensor nodes can consume the energy evenly. In addition, latency and packet loss rate can also be reduced.

The remainder of the paper is structured as follows: Sect. 2 introduces the previous work of other experts. Sect. 3 presents some preliminaries used in this paper. Sect. 4 gives the detail description of the proposed algorithm. Sect. 5 illustrates the experimental results. Section 6 presents the conclusion.

2 Related work

Clustering algorithms can be divided into centralized clustering and distribute clustering. For centralized clustering, the structure is being calculated in one or multiple central nodes. Many researches have been dedicated into centralized clustering [15, 16]. Heinzelman [17] has proposed one of the most representative centralized clustering protocols. Before the calculation of CHs, each node sends its current location and residual energy to the sink node. After collecting the information about all of the sensor nodes in the network, the sink node uses simulated annealing to determine the CHs and then informs the sensor nodes. Darabkh et al. [18] have proposed a centralized density and threshold-based CH replacement protocol (C-DTB-CHR). Based on this protocol, a C-DTB-CHR with adaptive data distribution protocol (C-DTB-CHR-ADD) is presented. The protocols aim to minimize the operations in the clustering, prevent some nodes to die in advance, and reduce the communication distances. Sasikumar et al. [14] have proposed a centralized K-Means based centralized clustering protocol, which applies the K-Means algorithm to divide the sensor nodes into multiple clusters. After determining the clusters, the sink node selects the most appropriate node as the CH for each cluster. For centralized clustering, using heuristic algorithms (e.g. genetic algorithm, particle swarm optimization, and artificial immune algorithm) is a good solution for clustering. To improve the network life cycle and energy efficiency, many heuristic algorithms based clustering protocols [19,20,21,22] have been proposed. Kuila et al. [20] have presented a genetic algorithm based load balanced clustering algorithm (GALBCA). This clustering algorithm encodes all of the ordinary sensors into one chromosome and sets the gene of each node via the randomly selected legal gateway node of each node. Kim et al. [20] have applied the particle swarm optimization to determine an energy efficient clustering routing scheme. The fitness function of the clustering scheme is based on the ratio of the estimated life-cycle and the average transmission distance.

The centralized clustering protocols have disadvantages of latency and packets loss. Therefore, most of the studies have focused on the research about distributed clustering [23,24,25,26]. Heinzelman et al. [8] have proposed the most famous hierarchical clustering protocol denoted as low-energy adaptive clustering hierarchy (LEACH). The protocol divides the network time into multiple rounds. In each round, each node judges whether it can become a CH though a probability. The selected CHs broadcast their information and inform the neighboring nodes to become their members. In this way, the sensor nodes can act as CHs in turns and the network life cycle can be prolonged. Younis et al. [27] have presented an hybrid energy-efficient distributed clustering (HEED) algorithm. The probability of CHs selection is related to the residual energy of sensor nodes. HEED divides the sensor nodes into several levels based on the residual energy. The nodes in higher levels have higher probabilities to become CHs. Smaragdakis et al. [28] have proposed a stable election protocol (SEP) for heterogeneous WSNs. The protocol sets different weighted probabilities for advance nodes and normal nodes. The advance nodes are more likely to be selected as a CH. Thus, it extends the life cycle of the normal nodes. Mehmood et al. [29] have designed a type of candidate CHs named VH (Vice Cluster Head). In a cluster, the node with highest residual energy is selected as the CH. The node with second highest residual energy is selected as the VH, which is used to act as the backup of CH and is always in the sleepy state. Once the residual energy of CH is less than a certain threshold, the VH is awakened and replaces the previous CH. Batra et al. [9] have proposed a CH selection algorithm based on the messages of MAC (Medium Access Control) layer. According to the messages, the sensor nodes can be sorted by their residual energy. The nodes with higher energy have larger probabilities to become CHs. Yang et al. [30] have proposed a clustering scheme with multi-level energy heterogeneity. The scheme divides the sensor field into multiple equal-sized rectangle units. To balance energy consumption among the CHs in different units, the number of clusters in each unit is calculated. Then the optimal number of units can be found by minimizing the total energy consumption.

3 Preliminaries

This section describes the network model and introduces some prior knowledge, which includes a brief introduction to LEACH and the three matrixes used in the proposed DLBCA algorithm.

3.1 Network model

In this paper, we consider a two-level WSN, where n sensor nodes are randomly distributed. These nodes are responsible for monitoring the information of the network area. And we have the following assumptions:

  • In the two-level WSN, the higher level nodes are CHs and the lower level nodes are CMs. Each CM sends the same length message to its CH.

  • The sink node is located at the center of the network and has a constant energy supply. The sensor nodes can send their collected information to the sink node directly.

  • There are two types of sensor nodes: advance nodes and normal nodes. Both of them are equipped with limited energy supply.

  • Each node can run data aggregation algorithms and the energy for aggregation is \(E_{DA}\)(nJ/bit).

Energy plays a very important role during establishment and running of the network. This paper adopts the first order radio model [8] to analysis the energy consumption of sensor nodes. The amount of energy consumption in transmitting l-bit packet from node i to node j can be represented by

$$\begin{aligned} \begin{aligned} E_{TX}\left( l,d_{ij}\right)&=E_{elec}*l+E_{amp}\left( d_{ij}\right) *l\\&={\left\{ \begin{array}{ll} E_{elec}*l+\varepsilon _{fs}*l*d_{ij}^{2}, &{} \text { if } d_{ij}<d_{0} \\ E_{elec}*l+\varepsilon _{mp}*l*d_{ij}^{4}, &{} \text { if } d_{ij}\ge d_{0} \end{array}\right. } \end{aligned} \end{aligned}$$
(1)

where \(E_{elec}\) is the energy required for driving and controlling electronic components. \(E_{amp}(d_{ij})\) represents the energy consumption of signal amplification when transmitting 1-bit data. \(\varepsilon _{fs}\) and \(\varepsilon _{mp}\) are the factors for free space model and multipath model, respectively. \(d_{ij}\) is the Euclidean distance between node i and node j. And \(d_{0}\) is the distance threshold, which is calculated as

$$\begin{aligned} d_{0}=\sqrt{\varepsilon _{fs}/\varepsilon _{mp}}. \end{aligned}$$
(2)

Meanwhile, the node consumes the following amount of energy in receiving l-bit packet:

$$\begin{aligned} E_{RX}(l)=E_{elec}*l. \end{aligned}$$
(3)

Assume that the proportion of advance node is \(\alpha\), the initial energy of normal node is \(E_{0}\), the initial energy of advance node is \(\beta\) times more than that of normal energy. Thus, we can get the total initial energy:

$$\begin{aligned} E_{total}=E_{nomal}+E_{advance}=n(1-\alpha )E_{0}+n\alpha *E_{0}(1+\beta )=nE_{0}(1+n\beta ). \end{aligned}$$
(4)

3.2 Brief introduction to LEACH

The proposed DLBCA algorithm is based on the “round” idea of the LEACH algorithm. As shown in Fig. 1, the running time of network is divided into multiple equal rounds. Each round contains a set-up phase and a steady-state phase. The steady-state phase is divided into multiple frames and the time of this phase is much longer than that of the set-up phase.

Fig. 1
figure 1

Flow of LEACH protocol

In the set-up phase, each sensor node generates a random number in [0, 1]. If the random number is less than the threshold T(n), then the corresponding node is selected as the CH. The calculation method of T(n) is formulated as

$$\begin{aligned} T_{n}={\left\{ \begin{array}{ll} \frac{p}{1-p*(r\;mod\;\frac{1}{p}))} &{} \text { if } n\in G\\ 0 &{} \text { otherwise } \end{array}\right. } \end{aligned}$$
(5)

where p denotes the expected proportion of CHs (when p equals 5%, the maximum network life cycle can be reached [31]). r is the current round number. G denotes the set of nodes that have not been selected as CHs in previous 1/p rounds. After the CHs have been selected, they broadcast their identity. The normal nodes choose a CH with strongest signal (closest CH) and response a join-request message. When the network enters into the steady-state phase, the member nodes send the information to their CHs during their own slots. Then the CHs aggregate the collected information through some effective data aggregation algorithms and send the results to the sink node. At the end of the current round, the network enters into a new round.

Although LEACH is one of the most representative clustering algorithms, there are some limitations. First, the CHs are selected randomly, thus, if the nodes with lower residual energy are selected as CHs, they might die quickly. Second, the unbalanced loads of CHs may cause some CHs have too many CMs and consume a large amount of energy. In this study, we improve the CHs selection method where the nodes with higher residual energy will have higher probabilities to become CHs. In addition, a load-balanced CMs assignment method is proposed. In this way, the energy of the nodes can be consumed evenly.

3.3 Three matrixes

In this paper, we first introduce the node distance matrix ND, which stores the distance between each sensor node. Table 1 illustrates an example of the ND matrix, which indicates that there are 9 nodes in the network. ND(i, j) is the distance between node i and node j. In the given example, ND(1, 2)=7.35 indicates that the distance between node 1 and node 2 is 7.35m. In addition, the distance between a same node is set \(\infty\), i.e., ND(1, 1) = \(\infty\).

Table 1 Node distance matrix (ND)

Node relationship matrix (Flag) stores the relationship between each sensor node. If Flag(i,j) = 1, then it indicates node i and node j can communicate with each other. Table 2 illustrates an example of Flag matrix whose size is 9*9. There are 9 nodes in the network. In the Flag matrix, if Flag(i,j) = 1,\((i ,j =1,\ldots ,9)\), then we can learn that node i is the CH of node j. From the given Flag example, it is obvious that the network is divided into 3 clusters and the identities of CHs are 1, 3 and 8, respectively. Fig. 2 illustrates the ownership relationships between the CHs and CMs of Table 2. The relationship distance matrix (FlagDis) stores the distances between the sensor nodes, which can communicate with each other. As shown in Table 3, FlagDis(1,4) = 2.93 denotes that the distance between the CH 1 and the CM 4 is 2.93 m.

Table 2 Node relationship matrix (Flag)
Fig. 2
figure 2

Ownership relationships between nodes of Table 2

Table 3 Relationship distance matrix (FlagDis)

4 Proposed algorithm

This section presents the detail design of the proposed DLBCA algorithm. The computing method of the optimal number of CHs is firstly introduced. We then describe the CHs selection method, the process of assigning CMs, and further optimization of clusters. Finally, the overall description of DLBCA is discussed.

4.1 Optimal number of CHs

Assume that n sensor nodes are randomly and evenly distributed in a \(M*M\) \({\rm m}^2\) wireless monitoring area (\(0<M\le 100m\)). The geographic coordinate of sink node is (M/2, M/2). Then we can get

$$\begin{aligned} \forall CH_{i},0<d_{CH_{i\_to\_sink}}\le \frac{\sqrt{2}}{2}M \end{aligned}$$
(6)

where \(d_{CH_{i\_to\_sink}}\) denotes the distance from the CH i to the sink node. Since \(0<M\le 100m\), then \(d_{CH_{i\_to\_sink}}\le 50 \sqrt{2}\). From (2) the distance threshold \(d_{0}\) is determined by \(\varepsilon _{fs}\) and \(\varepsilon _{mp}\), whose values are 10 pJ/bit/\(m^4\) and 0.0013 pJ/bit/\(m^4\) [8], respectively. Then we can get \(d_{0}\)=87.7m, thus \(d_{CH_{i\_to\_sink}}\le 50 \sqrt{2}\le 87.7\). Based on (1), the energy consumption of sending a l-bits packet from the CH i to the sink node is calculated as

$$\begin{aligned} E_{CH_{i\_to\_sink}}=lE_{elec}+l\varepsilon _{fs}d_{CH_{i\_to\_sink}}^{2}. \end{aligned}$$
(7)

Denote k as the optimal number of CHs, then the average number of sensor nodes in each cluster is n/k. Let \(E_{con\_CH}\) denotes the energy consumption of a CH. \(E_{con\_CH}\) consists of three parts: (a) energy consumption of receiving a l-bits packet from \(n/k-1\) CMs; (b) energy consumption of aggregating n/k packets; (c) send the aggregated l-bits packet to the sink node. Thus, \(E_{con\_CH}\) can be calculated as

$$\begin{aligned} E_{con\_CH}=lE_{elec}(n/k-1))+lE_{DA}n/k +\left( lE_{elec}+l\varepsilon _{fs}d_{CH_{i\_to\_sink}}^{2}\right) . \end{aligned}$$
(8)

Let \(E_{con\_CM}\) denotes the energy consumption of sending a l-bits packet from each CM to the corresponding CH and \(d_{CM\_to\_CH}\) denotes the distance between CM and CH. Since the proposed DLBCA algorithm assigns the near sensor nodes to CHs as their CMs, thus, \(d_{CM\_to\_CH}<d_{0}\). Therefore, the computing method of \(E_{con\_CM}\) is shown as follows:

$$\begin{aligned} E_{con\_CM}=lE_{elec}+l\varepsilon _{fs}d_{CH_{CM\_to\_CH}}^{2}. \end{aligned}$$
(9)

We define a rectangular coordinate whose origin is the CH. Assume the coordinate of a random distributed sensor node is (X,Y) and the distribution density function is \(\rho (x,y)\). Then we can get the following mathematic expectation of the square of \(d_{CM\_to\_CH}\):

$$\begin{aligned} E\left[ d_{CM\_to\_CH}^{2}\right] =\iint _{}^{}\left( x^{2}+y^{2}\right) \rho (x,y)dxdy. \end{aligned}$$
(10)

After transforming the rectangular coordinate into polar coordinates, (10) can be represented as follows:

$$\begin{aligned} E\left[ d_{CM\_to\_CH}^{2}\right] =\iint _{}^{}r^{2}\rho (r,\theta )rdrd\theta . \end{aligned}$$
(11)

Since the area of the network is M and the number of clusters is k, then the area of each cluster is \(M^{2}/k\) and \(\rho (r,\theta )\) is a constant \(k/M^2\) . Assume the cluster is a circle area and the radius of the area is \(M/\sqrt{\varPi K}\). Then (11) can be simplified as

$$\begin{aligned} E\left[ d_{CM\_to\_CH}^{2}\right] =\int _{\theta =0}^{2\varPi }\int _{r=0}^{M/\sqrt{\varPi k}}\frac{k}{M^{2}}r^{3}drd\theta =M^{2}/2k\varPi . \end{aligned}$$
(12)

Therefore, \(E_{con\_CM}\) can be calculated as

$$\begin{aligned} E_{con\_CM}=lE_{elec}+l\varepsilon _{fs}\frac{M^{2}}{2K\varPi }. \end{aligned}$$
(13)

The energy consumption of a cluster is given as

$$\begin{aligned} E_{con\_cluster}=E_{con\_CH}+\frac{n}{k}E_{con\_CM}. \end{aligned}$$
(14)

Based on the above discussion, the total energy consumption of the network can be calculated as

$$\begin{aligned} \begin{aligned} E_{con\_total}&=k*E_{con\_cluster}\\&=2nlE_{elec}+nlE_{DA}+kl\varepsilon _{fs}d_{CH_{i\_to\_sink}}^{2}+nl\varepsilon _{fs}\frac{M^{2}}{2k\varPi }. \end{aligned} \end{aligned}$$
(15)

To minimize the total energy consumption, we take a derivative of \(E_{con\_total}\) with respect to k and equate it to 0. Finally, we can get the optimal number of CHs:

$$\begin{aligned} k=\frac{\sqrt{n}M}{\sqrt{2\varPi d_{CH_{i\_to\_sink}}^{2}}}. \end{aligned}$$
(16)

4.2 CHs selection method

Generally the sensor nodes with more residual energy are more competitive to become a CH. From (1) we can see that the distance is the main factor affecting the energy consumption. Thus, if the total distance between a sensor node and its neighboring nodes is short, then the node is competitive to be selected as a CH. Therefore, the proposed DLBCA algorithm considers residual energy and distance together.

In the network, all the sensor nodes are time synchronous and they have the same duration of round, thus they enter the steady-state phase at the same time. Assume the time of the start of \(\mu\)-th round is \(t_{\mu }\), then we have the following CHs selection steps of the \(t_{\mu }\) round:

  • In our approach, the selected CHs should broadcast their identities. Denote the length of the broadcasted message and the number of the CHs as \(l_{contrl\_packet}\) and k, respectively. Then the total time of broadcasts for all of the CHs can be calculated as

    $$\begin{aligned} t_{total\_adv}=\left( l_{contrl\_packet}+4\right) *(k*4+1). \end{aligned}$$
    (17)
  • In each sensor node, there is a counter \(n_{CH\_heard}\), which is initialed 0 at the start of each set-up phase. \(n_{CH\_heard}\) is incremented by 1 once a node receives the broadcast. Thus, \(n_{CH\_heard}\) can represent the number of current CHs.

  • Each node i \((i=1,2,\ldots ,n)\) computes \(RE_{i}(t_{\mu })\), which is related to the residual energy of node i. \(RE_{i}(t_{\mu })\) is calculated by

    $$\begin{aligned} RE_{i}\left( t_{\mu }\right) =\frac{1}{E_{residual_{i}}\left( t_{\mu }\right) } \end{aligned}$$
    (18)

    where \(E_{residual_{i}}(t_{\mu })\) denotes the residual energy of node i at time \(t_{\mu }\).

  • Based on the node distance matrix ND, each node i calculates the distances from node i to its nearest \(\left\lfloor n/k \right\rfloor -1\) nodes (\(d_{i1}\), \(d_{i2}\), \(\ldots\), \(d_{i(\left\lfloor n/k \right\rfloor -1)}\)). Then, the parameter \(D_{i}\) can be calculated by

    $$\begin{aligned} D_{i}=d_{i\_to\_sink}+\sum _{j=1}^{\left\lfloor n/k \right\rfloor -1}d_{ij} \end{aligned}$$
    (19)

    where \(d_{i\_to\_sink}\) denotes the distance from node i to the sink node.

  • Each node i normalizes \(RE_{i}(t_{\mu })\) and \(D_{i}\), and the calculation method is shown as below:

    $$\begin{aligned} RE_{i}\left( t_{\mu }\right)= & {} \frac{RE_{i}\left( t_{\mu }\right) -RE_{min}}{RE_{max}-RE_{min}}, \end{aligned}$$
    (20)
    $$\begin{aligned} D_{i}= & {} \frac{D_{i}-D_{min}}{D_{max}-D_{min}}. \end{aligned}$$
    (21)

    \(RE_{max}\) and \(RE_{min}\) denotes the maximum and minimum of \(RE_{i}(t_{\mu })\), respectively. \(D_{max}\) and \(D_{min}\) denotes the maximum and minimum of \(D_{i}\), respectively. Then, a weight \(\lambda\) is introduced to combine \(RE_{i}(t_{\mu })\) and \(D_{i}\):

    $$\begin{aligned} \theta _{i}\left( t_{\mu }\right) =\lambda *RE_{i}\left( t_{\mu }\right) +(1-\lambda )*D_{i}. \end{aligned}$$
    (22)
  • Each node i calculates its broadcast time of the \(\mu\)-th round:

    $$\begin{aligned} t_{adv_{i}}\left( t_{\mu }\right) =t_{\mu }+t_{total\_adv}*\theta _{i}\left( t_{\mu }\right) . \end{aligned}$$
    (23)
  • When the time of the network is \(t_{adv_{i}}(t_{\mu })\), if \(n_{CH\_heard}\) is less than k, the node i broadcast that it has been selected as a CH. If \(n_{CH\_heard}\) is greater than k, then the limit of the current CHs has been reached. In this circumstance, the node i becomes a CM and determines its optimal CH, and sends a join-request to its CH.

4.3 Process of assigning CMs

Figure 3 illustrates an example of load unbalance. The network contains two clusters: Cluster1 and Cluster2. They have 8 and 4 sensor nodes, respectively. Obviously CH 1 consumes much more energy than CH 2. Thus, when allocating the CMs, the situation shown in Fig. 3 should be avoided. Since \(d_{1}\) is shorter than \(d_{2}\), the CM 3 is assigned to the CH 1. However, \(d_{2}\) is similar to \(d_{1}\). Hence, assigning the node 3 to the CH 2 can reduce the burden of the CH 1 without improving the energy consumption.

Fig. 3
figure 3

An example of load unbalance

During the operation of the network, some nodes may die after bursting the initial energy. From (16), we can learn that the dead nodes can influence the number of optimal CHs. Meanwhile, the dead nodes can also influence the allocation of CMs. Thus, in the set-up phase of each round, the dead nodes need to be recognized firstly. After the selection of CHs, all the living nodes need to run the allocation of CMs. The CHs firstly calculate the identity ID of their own CMs and wait for the request from the CMs (e.g. the CMs of the CH i is 2, 5, 6 and 9). During this process, the CMs calculate their own CHs and send requests. If a CH does not receive the request of certain CMs (e.g. the CH i only receives the request of the CMs 2, 6 and 9), then it considers that the corresponding CMs (e.g. the CM 5) are dead. Afterward, the sink node collects the dead nodes IDs from the CHs, and then broadcast all the IDs in the start of next round. The process of assigning the CMs is shown as follows:

1.:

Calculate the number of CMs of each cluster. Set the number of dead nodes \(n_{dead}\), then each sensor node i \((i=1,2,\ldots ,n-n_{dead})\) can get the following number of CMs of each cluster:

$$\begin{aligned} n_{CM}=\left\lfloor (n-n_{dead})/k \right\rfloor -1. \end{aligned}$$
(24)
2.:

Set up a node relationship matrix (Flag) and a relationship distance matrix (FlagDis) through the node distance matrix ND. Firstly, the size of Flag and FlagDis are n rows and n columns where n denotes the number of sensor nodes in the network, and the values in the two matrixes are initialized to zero. In current time, all of the nodes have received the IDs of the CHs after the selection of CHs. If the node i is a CH, choose the smallest \(n_{CM}\) values from the i-th row of the ND matrix, and update the matrixes: Flag and FlagDis. Figure 4 illustrates an example of how FlagDis is updated after Step2. In Fig. 4, the nodes 1, 3 and 8 (the first, third and eighth rows) are CHs. From (24) we can learn \(n_{CM}\) is 2. Then from the ND matrix, choose the smallest 2 values of the first, third and eighth rows, respectively. For instance, the smallest value of the first row of ND matrix is ND(1,6)=0.40, then we set the FlagDis matrix: FlagDis(1,6)=0.04, and set the Flag matrix: Flag(1,6)=1.

3.:

Mark the dead nodes and the CHs. In the Flag and the FlagDis matrix, set the values of the columns and the rows of the dead nodes to 0, and set the values of the columns of the CHs to 0. Let \(n_{residual}\) denotes the number of nonzero values of the columns of CHs. After this step, \(n_{residual}\) of some CHs may be less than \(n_{CM}\). Thus, we need to choose the smallest \(n_{CM}-n_{residual}\) values from the candidate CMs columns of the corresponding rows of the ND matrix, and set the corresponding cells of Flag and FlagDis to 1 and the same values.

4.:

In the Flag matrix, sum the columns that represent the living non-CH nodes, and set up a Judge0 array and a Judge1 array. If the sum of a column is equal to 0, add the column into Judge0 ordinally; if the sum of a column is greater than 1, add the column into Judge1 ordinally. Thus, Judge0 stores the IDs of the nodes that are not connected with any CHs, and Judge1 stores the IDs of the nodes that connect with at least two CHs.

5.:

Empty the Judge0 and the Judge1, the process is illustrated in Algorithm1. Firstly, the algorithm judges whether Judge1 is not empty (Line 2). If Judge1 is not empty, return all IDs of the CHs that connect with the first node of Judge1, and add them into the array Multi_CH (Line 3). If the number of the elements in Multi_CH is greater than 1 (Line 4), find the farthest CH to the first node of Judge1, and use MAX_CH to denote the ID of the CH (Line 5). Set Flag(MAX_CH, Judge1(1)) and FlagDis(MAX_CH, Judge1(1)) to 0 (Lines 6-7). After that, if Judge0 is not empty(Line 8), select the nearest node to MAX_CH from Judge0 and use MIN_CM to denote the nearest node (Lines 9-10). Use MIN_CM to replace the deleted member node of MAX_CH, and update Flag, FlagDis and Judge0 (Lines 11-13). Next, update MAX_CH (Line 15). After empty MAX_CH, delete the first element of Judge1 (Line 17). After empty Judge1, judge whether Judge0 is not empty (Line 19). If it is not empty, select the nearest CH to the first node of Judge0 and use MIN_CH to denote the ID of the CH (Line 20), and then update Flag, FlagDis and Judge0 (Lines 21-23).

figure e
Fig. 4
figure 4

An example of how FlagDis is updated after Step2

4.4 Further optimization of clusters

After assigning the CMs for the CHs preliminarily, the distances between some CMs and CHs may be too long. Thus, the clustering structure needs to be further optimized and the optimization process is illustrated as follows:

1.:

Calculate the maximum of FlagDis and denote as FlagDis(\(CH_{a}\),\(CM_{b}\)). Through this step, we can find the connection, which consumes the highest energy. Then, Set Flag(\(CH_{a}\),\(CM_{b}\)) and FlagDis(\(CH_{a}\),\(CM_{b}\)) to 0.

2.:

Find the nearest CH to the node \(CM_{b}\) from the ND matrix, and denote as \(CH_{c}\). Set Flag(\(CH_{c}\),\(CM_{b}\)) and FlagDis(\(CH_{c}\),\(CM_{b}\)) to 1 and ND(\(CH_{c}\),\(CM_{b}\)), respectively.

3.:

In Step1 \(CH_{a}\) has been reduced a CM node and in Step2 \(CH_{c}\) has been increased a CM node. Thus, to guarantee the balance of the loads, we need to select a node among the CMs of \(CH_{c}\) and assign it to \(CH_{a}\). Select the farthest node to the \(CH_{c}\) from FlagDis and denote the node \(CM_{d}\). If it meets ND(\(CH_{a}\),\(CM_{d}\))+ND(\(CH_{c}\),\(CM_{b}\))<ND(\(CH_{a}\),\(CM_{b}\))+ND(\(CH_{c}\),\(CM_{d}\)), go to Step4. Otherwise, go to Step5.

4.:

Set Flag(\(CH_{a}\),\(CM_{d}\)) and Flag(\(CH_{c}\),\(CM_{d}\)) to 1 and 0, respectively. Set FlagDis(\(CH_{a}\),\(CM_{d}\)) and FlagDis(\(CH_{c}\),\(CM_{d}\)) to ND(\(CH_{a}\),\(CM_{d}\)) and 0, respectively. Then, go to Step 6.

5.:

Go back to Step3 and continue to select the farthest node to \(CH_{c}\) except \(CM_{d}\). If the algorithm has not gone into Step4 when all of the member nodes of \(CH_{c}\) are gone through, return Flag and FlagDis to the values of the end of Step1. Then go back to Step2, select the nearest CH to \(CM_{b}\) except \(CH_{c}\). If the algorithm has not gone into Step4 when all of the CHs of \(CM_{b}\) are gone through, return Flag and FlagDis to the values of the end of Step1. Then, go to Step6.

6.:

Go back to Step1 and loop at least k times and until the maximum of FlagDis is less than the distance threshold \(d_{0}\).

4.5 Overall description of DLBCA

Figure 5 illustrates the overall process of the proposed DLBCA. When the network goes into the first round (\(\mu =1\)), each sensor node i (\(i=1,2,\ldots ,n\)) initiates \(n_{CH\_heard}\) to 0 and calculates the time \(t_{adv_{i}}(t_{\mu })\) through (16). \(n_{CH\_heard}\) denotes the number of CHs that the node i has heard and \(t_{adv_{i}}(t_{\mu })\) denotes the time that the node i needs to broadcast its CH identity. When the network time is \(t_{adv_{i}}(t_{\mu })\), the node i checks the value of \(n_{CH\_heard}\) (During the total broadcast procedure, once the node receives a broadcast, \(n_{CH\_heard}\) is increased by 1.). If \(n_{CH\_heard}<k\), the node i broadcasts that it is selected as the CH and continues to collect the broadcasted messages. Then after the broadcast procedure, the CH i determines the clustering structure (see Sects. 4.3 and 4.4) and waits the request from its CMs. Otherwise if \(n_{CH\_heard}\ge k\), the node i is selected as a CM and calculates its own CH, and then send a request to the CH. After collecting the request of the CMs, the CHs decide the CMs which did not send the request to be dead nodes, and keep down the IDs of the dead nodes (\(ID_{dead}\)).

Fig. 5
figure 5

Overall process of the proposed DLBCA

Then the network goes into the steady-state phase, the CMs collect the physical or environmental information, and send the information to their CHs. The CHs aggregate the received data and send the data together with \(ID_{dead}\) to the sink node. When the first round ends, \(\mu =\mu +1\). If all of the sensor nodes are dead, i.e. \(n-n_{dead}=0\), then the algorithm DLBCA terminates. Otherwise, the network goes into the set-up phase of the next round. The sink node broadcasts the new IDs of the dead nodes of previous round. Then each sensor node replaces the value of n in (16) with \(n-n_{dead}=0\) and updates the number of optimal number of CHs. After that, the sensor nodes initiate \(n_{CH\_heard}\) to 0 again and repeat the steps of the previous round.

5 Experimental result

We use MATLAB R2014a and C programming language to simulate the proposed DLBCA and compare it to the clustering protocols LEACH [8] and LEACH-MAC [9]. Metrics of comparison contains weight \(\lambda\), number of CHs, variance of loads of CHs, network life cycle, and energy consumption. Table 4 illustrates the parameters used in the experiment. Meanwhile, the experiments are take in four \(100*100\,{\rm m}^2\) scenarios (WSN#1, WSN#2, WSN#3 and WSN#4) where 100 sensor nodes are randomly distributed. They have 5%, 10%, 15% and 20% advance nodes, respectively. In addition, in the proposed DLBCA the sensor nodes calculate the number of CHs of each round through (16). In this equation, \(d_{CH_{i\_to\_sink}}\) is calculated as the average distance of each living sensor to the sink node.

Table 4 Parameters used in the experiment

5.1 Comparison of weight \(\lambda\)

The proposed DLBCA determines the CHs based on \(\theta _{i}(t_{\mu })\) in (23). From (22) we can learn that the value of \(\theta _{i}(t_{\mu })\) depends on the weight \(\lambda\). Thus, we need to select the suitable \(\lambda\) through comparison experiment. Since the target of DLBCA is to prolong the network life cycle, we select \(\lambda\) that corresponding to longer network life cycle. The definition of network life-cycle usually has three modes: FND (First Node Death), HNA (Half Nodes Alive) and LND (Last Node Death). Table 5 illustrates the network life cycle of different \(\lambda\) values under scenarios WSN#1, WSN#2, WSN#3 and WSN#4. Figure 6 shows the number of living nodes of different \(\lambda\) values under the scenarios.

Table 5 Network life cycle of different \(\lambda\) values under scenarios WSN#1, WSN#2, WSN#3 and WSN#4
Fig. 6
figure 6

Comparison in terms of the number of living nodes of different \(\lambda\) values under different scenarios

From Table 5 and Fig.  6 we can get the following conclusions. (a) The first node dead time for \(\lambda =0.6\), \(\lambda =0.7\), \(\lambda =0.8\), \(\lambda =0.9\) and \(\lambda =1\) could be extended. However, once a dead node appears, the residual nodes in the network will be depleted quickly. (b) The HNA and LND of \(\lambda =0\) are high, but dead nodes appear from the 162th round. (c) When compared with \(\lambda =0.4\) and \(\lambda =0.5\), the FND and HNA of \(\lambda =0.1\), \(\lambda =0.2\) and \(\lambda =0.3\) are low. (d) The last node dead time of \(\lambda =0.1\), \(\lambda =0.2\) and \(\lambda =0.3\) are extended. However, only several living nodes exist at about 1250th round when \(\lambda =0.1\), \(\lambda =0.2\) and \(\lambda =0.3\). The reason for the high LND of \(\lambda =0.1\), \(\lambda =0.2\) and \(\lambda =0.3\) mainly because the energy heterogeneity of advance nodes has been underused, which causes all the living nodes of this period are advance nodes. But only several nodes alive are not crucial for the whole network. Therefore, based on the above analyses, \(\lambda =0.4\) and \(\lambda =0.5\) can get the optimal results. After compare the experimental results of \(\lambda =0.4\) and \(\lambda =0.5\), we can find the FND, HNA, and LND of \(\lambda =0.5\) are better. Thus, the weight \(\lambda\) of the proposed DLBCA is set as 0.5 in later comparison experiments.

5.2 Number of CHs

The number of CHs is related to the energy consumption in the clustering protocols of WSNs. If the number of CHs is too high, the aggregation energy may be higher than the saved energy in some nodes because they have very few loads. If the number of CHs is too few, the nodes will consume a large amount of energy due to high loads. Thus, suitable number of CHs can improve the energy efficiency of network. Figure 7 illustrates the number of CHs of DLBCA, LEACH, and LEACH-MAC under different scenarios (WSN#1, WSN#2, WSN#3 and WSN#4). From Fig. 7, we can see that changes in the number of CHs of DLBCA are less. When there are no dead nodes, the number of CHs of DLBCA maintains a stable value of 11. And since LEACH selects the CHs randomly, the fluctuation of the number of CHs is most obvious. When all the nodes are alive, its number of CHs changes from 1 to 21 dramatically. In addition, LEACH-MAC is better than LEACH, its number of CHs maintains at 5 in most times. When its initial number of CHs is less than 5, the final number of CHs cannot remain stable.

Fig. 7
figure 7

Comparison of DLBCA with LEACH and LEACH-MAC in terms of number of CHs under different scenarios

5.3 Comparison of load balancing

Figures 8, 9 and 10 gives the comparison of clustering distribution of DLBCA, LEACH, and LEACH-MAC. The filled circles denote the CHs, the hollow circles denote the CMs, and the blue lines denote the communication relationships between two nodes. Fig. 8a–c show the clustering distribution of DLBCA in some rounds. From the experimental results, we can see the distribution of CHs of DLBCA is more even and the numbers of the clusters are more balanced when compared to LEACH and LEACH-MAC. For instance, in Fig. 8a the network has 11 clusters. The cluster with maximum load has 13 sensors, and the cluster with minimum load has 7 sensors. The differences of the numbers of loads are very small. In addition, there is no single node in the network. Figure 9a–c show the clustering distribution of LEACH in some rounds. The protocol selects CHs randomly, so the differences of the number of CHs are big in different rounds. Meanwhile, the CMs choose the nearest CH that has the strongest broadcast. This results in unbalanced loads in the CHs. For instance, in Fig. 9a the network only contains two CHs, and one of which has up to 74 CMs. This circumstance leads to acute energy consumption. By contrast, in Fig. 9b there are 17 CHs. Too many CHs leads to some single nodes, so the data aggregation function of CHs has been underused. In Fig. 9c the loads of CHs are very imbalanced. The cluster with the most loads has 27 CMs, and the cluster with the fewest loads just has 0 CMs. Fig. 10a–c show the clustering distribution of LEACH-MAC in some rounds. Same to LEACH, LEACH-MAC also has the load unbalance problem since the CMs choose the nearest CHs. For instance, in Fig. 10b the cluster with the most loads has up to 42 CMs, while the cluster with the fewest loads just has 1 CMs. Meanwhile, LEACH-MAC selects the initial CHs randomly. Hence, it cannot avoid the circumstance that the number of CHs is very low. For instance, in Fig. 10c the network only has three clusters, one of which has up to 67 sensors.

Fig. 8
figure 8

Clustering distribution of DLBCA

Fig. 9
figure 9

Clustering distribution of LEACH

Fig. 10
figure 10

Clustering distribution of LEACH-MAC

Figure 11 illustrates the variance of the loads of CHs of DLBCA, LEACH, and LEACH-MAC under different scenarios. From the figures, we can see that DLBCA is most load-balancing. Its variances of loads of CHs are lowest in all rounds and the variances keep less than 25. The load balancing of LEACH-MAC is worse and its mean value of the variances is about 50. Meanwhile, LEACH has the worst load balancing. The maximum of the variances is more than 300, and the mean value is up to about 150.

Fig. 11
figure 11

Comparison of DLBCA with LEACH and LEACH-MAC in terms of load balancing under different scenarios

5.4 Network life cycle and energy consumption

Network life cycle and energy consumption are two important metrics in testing the performance of clustering algorithms of WSNs. Figure 12 illustrates the network life cycle of DLBCA, LEACH, and LEACH-MAC under different scenarios (WSN#1, WSN#2, WSN#3 and WSN#4). From Fig. 12 we can see that the proposed DLBCA obviously performs better than the others. LEACH has the shortest network life cycle. For LEACH, the first node dead time in WSN#1 and WSN#4 are about 700th round and 650th round, respectively. Meanwhile, in all the scenarios, half nodes of LEACH die in about 750th round. However, no dead nodes appear for LEACH-MAC and DLBCA in about 750th round. As it can be learnt from Fig. 12, the first node dead time of LEACH-MAC is about 50-100 rounds earlier than DLBCA. Thus, LEACH-MAC performs worse than DLBCA.

Fig. 12
figure 12

Comparison of DLBCA with LEACH and LEACH-MAC in terms of network life cycle under different scenarios

Figure 13 illustrates the comparison of the amount of energy consumption of DLBCA, LEACH, and LEACH-MAC under different scenarios. From Fig. 13, we can see that DLBCA performs best in energy efficiency. The energy consumption of LEACH-MAC in each round is higher than DLBCA. The energy consumption of LEACH is the highest. Thus, the energy efficiency of LEACH is the worst.

Fig. 13
figure 13

Comparison of DLBCA with LEACH and LEACH-MAC in terms of energy consumption under different scenarios

6 Conclusion

This paper presents a distributed and load balancing hierarchical clustering algorithm. Each sensor node determines its own role and calculates the clustering structure without depending on the central node. When the network goes into the set-up phase, the sensors calculate the broadcast time \(t_{adv_{i}}(t_{\mu })\) based on the residual energy and distance to other sensors. When the time is \(t_{adv_{i}}(t_{\mu })\) and the current number of CHs is less than the specified number, the node broadcasts a message that it is being selected as a CH. In addition, this paper presents three matrixes: ND, Flag and FlagDis. The matrixes are designed to store the relationships between the sensor nodes and the distances between the nodes that have relationships. By means of the matrixes, the sensor nodes can determine the clustering structure and assign the appropriate CMs for CHs. Hence, the numbers of the assigned CMs in the clusters are balanced. Experimental results show that the proposed method is load balanced and energy efficient, and has longer network life cycle compared to LEACH and LEACH-MAC.