Abstract
This paper investigates distributed cooperative learning (DCL) problems over networks, in which each node has access only to its own data, generated uniformly by an unknown pattern (map or function), and all nodes cooperatively learn the pattern by exchanging local information with their neighboring nodes. Such problems cannot be solved by traditional centralized algorithms. To solve them, two novel DCL algorithms using wavelet neural networks are proposed: a continuous-time DCL (CT-DCL) algorithm and a discrete-time DCL (DT-DCL) algorithm. Combining the characteristics of neural networks with the properties of wavelet approximation, wavelet series are used to approximate the unknown pattern, and the DCL algorithms train the optimal weight coefficient matrix of the wavelet series. Moreover, the convergence of the proposed algorithms is guaranteed by the Lyapunov method. Compared with existing distributed optimization strategies such as distributed average consensus (DAC) and the alternating direction method of multipliers (ADMM), the DT-DCL algorithm requires less communication and training time than the ADMM strategy, and it achieves higher accuracy than the DAC strategy when the network contains a large number of nodes. In addition, with a proper step size, the CT-DCL algorithm is more accurate than the DT-DCL algorithm if training time is not a concern. Several illustrative examples demonstrate the effectiveness and advantages of the proposed algorithms.
References
Chen W, Hua S, Ge SS (2014) Consensus-based distributed cooperative learning control for a group of discrete-time nonlinear multi-agent systems using neural networks. Automatica 50(9):2254–2268
Boyd S, Parikh N, Chu E et al (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3(1):1–122
Li J, Lin X, Rui X et al (2014) A distributed approach toward discriminative distance metric learning. IEEE Trans Neural Netw Learn Syst 26(9):2111–2122
Tekin C, van der Schaar M (2015) Distributed online learning via cooperative contextual bandits. IEEE Trans Signal Process 63(14):3700–3714
Chen T, Wang C, Hill DJ (2014) Rapid oscillation fault detection and isolation for distributed systems via deterministic learning. IEEE Trans Neural Netw Learn Syst 25(6):1187–1199
Mertikopoulos P, Belmega EV, Moustakas AL et al (2012) Distributed learning policies for power allocation in multiple access channels. IEEE J Sel Areas Commun 30(1):96–106
Predd JB, Kulkarni SR, Poor HV (2006) Distributed learning in wireless sensor networks. IEEE Signal Process Mag 23(4):56–69
Scardapane S, Wang D, Panella M (2016) A decentralized training algorithm for echo state networks in distributed big data applications. Neural Netw 78:65–74
Georgopoulos L, Hasler M (2014) Distributed machine learning in networks by consensus. Neurocomputing 124(2):2–12
Chen J, Sayed AH (2011) Diffusion adaptation strategies for distributed optimization and learning over networks. IEEE Trans Signal Process 60(8):4289–4305
Chen W, Hua S, Zhang H (2015) Consensus-based distributed cooperative learning from closed-loop neural control systems. IEEE Trans Neural Netw Learn Syst 26(2):331–345
Ai W, Chen W, Xie J (2016) A zero-gradient-sum algorithm for distributed cooperative learning using a feedforward neural network with random weights. Inf Sci 373:404–418
Ai W, Chen W, Xie J (2016) Distributed learning for feedforward neural networks with random weights using an event-triggered communication scheme. Neurocomputing 224:184–194
Olfati-Saber R, Fax JA, Murray RM (2007) Consensus and cooperation in networked multi-agent systems. Proc IEEE 95(1):215–233
Xiao L, Boyd S, Kim SJ (2007) Distributed average consensus with least-mean-square deviation. J Parallel Distrib Comput 67(1):33–46
Aysal T, Coates M, Rabbat M (2008) Distributed average consensus with dithered quantization. IEEE Trans Signal Process 56(10):4905–4918
Lim C, Lee S, Choi JH et al (2014) Efficient implementation of statistical model-based voice activity detection using Taylor series approximation. IEICE Trans Fundam Electron Commun Comput Sci 97(3):865–868
Sharapudinov II (2014) Approximation of functions in variable-exponent Lebesgue and Sobolev spaces by finite Fourier–Haar series. Russ Acad Sci Sb Math 205(205):145–160
Huang GB, Saratchandran P, Sundararajan N (2005) A generalized growing and pruning RBF (GGAP-RBF) neural network for function approximation. IEEE Trans Neural Netw 16(1):57–67
Yang C, Jiang K et al (2017) Neural control of bimanual robots with guaranteed global stability and motion precision. IEEE Trans Ind Inf 13(3):1162–1171
Yang C, Wang X et al (2017) Neural-learning based telerobot control with guaranteed performance. IEEE Trans Cybern (in press). doi:10.1109/TCYB.2016.2573837
Cui R, Yang C et al (2017) Adaptive neural network control of AUVs with control input nonlinearities using reinforcement learning. IEEE Trans Syst Man Cybern Syst 47(6):1019–1029
Wu S, Er MJ (2000) Dynamic fuzzy neural networks—a novel approach to function approximation. IEEE Trans Syst Man Cybern B Cybern 30(2):358–364
Ferrari S, Stengel RF (2005) Smooth function approximation using neural networks. IEEE Trans Neural Netw 16(1):24–38
Yang C, Yi Z, Zuo L (2008) Function approximation based on twin support vector machines. In: IEEE conference on cybernetics and intelligent systems, pp 259–264
Zhang Q, Benveniste A (1991) Approximation by nonlinear wavelet networks. In: International conference on acoustics, speech and signal processing. ICASSP-91. pp 3417–3420
Zhang Q, Benveniste A (1992) Wavelet networks. IEEE Trans Neural Netw 3(6):889–898
Delyon B, Juditsky A, Benveniste A (1995) Accuracy analysis for wavelet approximations. IEEE Trans Neural Netw 6(2):332–348
Zainuddin Z, Pauline O (2011) Modified wavelet neural network in function approximation and its application in prediction of time-series pollution data. Appl Soft Comput 11(8):4866–4874
Zainuddin Z, Ong P (2013) Design of wavelet neural networks based on symmetry fuzzy C-means for function approximation. Neural Comput Appl 23(1):247–259
Zainuddin Z, Ong P (2016) Optimization of wavelet neural networks with the firefly algorithm for approximation problems. Neural Comput Appl 28(7):1715–1728
Hou M, Han X, Gan Y (2009) Constructive approximation to real function by wavelet neural networks. Neural Comput Appl 18(8):883–889
Oysal Y, Yilmaz S (2010) An adaptive wavelet network for function learning. Neural Comput Appl 19(3):383–392
Xu J, Yan R (2011) Adaptive learning control for finite interval tracking based on constructive function approximation and wavelet. IEEE Trans Neural Netw 22(6):893–905
Alexandridis AK, Zapranis AD (2013) Wavelet neural networks: a practical guide. Neural Netw 42(1):1–27
Yang C, Wang X et al (2016) Teleoperation control based on combination of wave variable and neural networks. IEEE Trans Syst Man Cybern Syst (in press). doi:10.1109/TSMC.2016.2615061
Chen S, Zhao H, Zhang S et al (2013) Study of ultra-wideband fuze signal processing method based on wavelet transform. IET Radar Sonar Navig 8(3):167–172
Courroux S, Chevobbe S, Darouich M et al (2013) Use of wavelet for image processing in smart cameras with low hardware resources. J Syst Archit 59(10):826–832
Pavez E, Silva JF (2012) Analysis and design of wavelet-packet cepstral coefficients for automatic speech recognition. Speech Commun 54(6):814–835
Siddiqi MH, Lee SW, Khan AM (2014) Weed image classification using wavelet transform, stepwise linear discriminant analysis and support vector machines for an automatic spray control system. J Inf Sci Eng 30(4):1227–1244
Yan R, Gao RX, Chen X (2014) Wavelets for fault diagnosis of rotary machines: a review with applications. Signal Process 96(5):1–15
Ganjefar S, Tofighi M (2015) Single-hidden-layer fuzzy recurrent wavelet neural network: applications to function approximation and system identification. Inf Sci 294:269–285
Bazaraa MS, Goode JJ (1973) On symmetric duality in nonlinear programming. Oper Res 21(1):1–9
Lu J, Tang CY (2011) Zero-gradient-sum algorithms for distributed convex optimization: the continuous-time case. IEEE Trans Autom Control 57(9):5474–5479
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant Numbers: 61503292, 61673308, 61673014), which made it possible to undertake this research.
Ethics declarations
Conflict of interest
We declare that we have no conflicts of interest.
Appendices
Appendix A
Proof of Theorem 1
For convenience in analyzing the convergence of the CT-DCL algorithm, we rewrite algorithm (20) in matrix form as

\((H^TH+\sigma \otimes I_{ml})\dot{W}(t)=-\gamma ({\mathcal {L}}\otimes I_{ml})W(t), \qquad (33)\)

where \(W(t)=[W_1^T(t),W_2^T(t),\ldots ,W_N^T(t)]^T\in {\mathbf{R}}^{mlN\times 1}\), \(H=diag \{H_1,H_2,\ldots ,H_N\}\in {\mathbf{R}}^{mN_iN\times mlN}\), \(\sigma =diag\{\sigma _1,\sigma _2,\ldots ,\sigma _N\}\in {\mathbf{R}}^{N\times N}\), \(Y=[Y_1^T,Y_2^T,\ldots ,Y_N^T]^T\in {\mathbf{R}}^{mN_iN\times 1}\), and \({\mathcal {L}}\) is the Laplacian matrix of the graph \({\mathcal {G}}\).
Proof
For CT-DCL algorithm (20), the following Lyapunov function candidate is constructed:
where \(W^*,W_i \in {\mathbf{R}}^{ml\times 1}\), \(V:{\mathbf{R}}^{ml}\rightarrow {\mathbf{R}}\).
It is easy to verify that
In addition, the following inequality holds (see “Appendix C” for the proof):
Now we are in a position to give the main convergence result for algorithm (20). Consider Lyapunov function candidate (34). Then, along the solutions of (20), we have
Since graph \({\mathcal {G}}\) is undirected and connected, we deduce from CT-DCL algorithm (20) that \(\sum _{i=1}^{N}(H_i^TH_i+\sigma _iI_{ml})\dot{W_i}(t)=\gamma \sum _{i=1}^{N}\sum \nolimits _{j\in {\mathcal {N}}_{i}}a_{ij}\left( W_j(t)-W_i(t)\right) =0\). Then,
According to inequality (36), we have \(W(t)^T({\mathcal {L}}\otimes I_{ml})W(t)\ge \frac{2\lambda _2}{\overline{\varTheta }} V(W)\). Therefore,
Integrating both sides of (39) from 0 to t leads to
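In the notation above, this step can be summarized as follows (a sketch only: it combines \(\dot{V}(W(t))=-\gamma W(t)^T({\mathcal {L}}\otimes I_{ml})W(t)\) from the preceding computation with inequality (36), then integrates):

```latex
\dot{V}\bigl(W(t)\bigr)
  = -\gamma\,W(t)^{T}\bigl(\mathcal{L}\otimes I_{ml}\bigr)W(t)
  \le -\frac{2\gamma\lambda_{2}}{\overline{\varTheta}}\,V\bigl(W(t)\bigr)
\;\Longrightarrow\;
V\bigl(W(t)\bigr)\le V\bigl(W(0)\bigr)\,
  e^{-\frac{2\gamma\lambda_{2}}{\overline{\varTheta}}\,t}.
```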
According to inequality (35), we have
From this it follows that \(W_i(t)\) converges exponentially to \(W^*\) as \(t\rightarrow \infty\). This implies that problem (18), \(W^*=\arg \min \nolimits _{W_i}G(W_i)\), is solved. Thus, the goal \(\lim \nolimits _{t\rightarrow \infty }W_i(t)=W^*,\forall i\in {\mathcal {V}}\), is achieved by CT-DCL algorithm (20), and \(\tilde{f_i}^*(x)=Q_i(x)W^*\) is the estimate of the original function \(f_i(x)\).
The proof is completed. \(\square\)
Appendix B
Proof of Theorem 2
We also rewrite DT-DCL algorithm (23) in matrix form as

\((H^TH+\sigma \otimes I_{ml})\left( W(k+1)-W(k)\right) =-\gamma ({\mathcal {L}}\otimes I_{ml})W(k),\)

where \(W(k),H,\sigma\) and Y are defined as in the matrix form (33) of the CT-DCL algorithm.
Proof
For DT-DCL algorithm (23), the discrete-time counterpart of Lyapunov function candidate (34) is as follows:
The following two inequalities still hold in the discrete-time case:
Now we are in a position to give the main convergence result for algorithm (23). Consider Lyapunov function candidate (43), whose difference is given by
In the discrete-time case, we deduce from DT-DCL algorithm (23) that \(\sum\nolimits _{i=1}^{N}(H_i^TH_i+\sigma _iI_{ml})\left( {W_i}(k+1)-{W_i}(k)\right) =\gamma \sum\nolimits _{i=1}^{N}\sum \nolimits _{j\in {\mathcal {N}}_{i}}a_{ij}\left( W_j(k)-W_i(k)\right) =0\). Then,
By adding and subtracting terms, we have
Due to (42), we obtain
where \(\bar{\lambda }=\lambda _{min}(H^TH+\sigma \otimes I_{ml})\).
Since \({\mathcal {L}}^2\le \eta {\mathcal {L}}\), where \(\eta =\lambda _{max}({\mathcal {L}})\), it follows that
If \(\gamma\) is chosen such that \(0<\gamma <\frac{\bar{\lambda }}{\eta }\), then \(\{V\left( {W}(k)\right) \}_{k=0}^{\infty }\) is nonnegative and nonincreasing.
Due to (45), we obtain
Then,
where \(\varepsilon =1-\frac{2 \gamma \lambda _2}{\overline{\varTheta }}(1-\frac{\eta \gamma }{\bar{\lambda }})\).
Since \(0<\gamma <\frac{\bar{\lambda }}{\eta }\), we have \(0<1-\frac{\eta \gamma }{\bar{\lambda }}<1\). Hence, to guarantee \(0<\frac{2 \gamma \lambda _2}{\overline{\varTheta }}(1-\frac{\eta \gamma }{\bar{\lambda }})<1\), it suffices to choose \(\gamma\) such that \(0<\gamma <\frac{\,\overline{\varTheta }\,}{\,2\lambda _2\,}\).
Therefore, based on the above analysis, if \(\gamma\) is chosen such that \(0<\gamma <\min \{\frac{\,\overline{\varTheta }\,}{\,2\lambda _2\,},\frac{\bar{\lambda }}{\eta }\}\), then \(0<\varepsilon <1\) and \(\{V\left( {W}(k)\right) \}_{k=0}^{\infty }\) is nonnegative and nonincreasing.
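The contraction established here can be summarized compactly (a sketch using \(\varepsilon\) as defined above):

```latex
V\bigl(W(k+1)\bigr)\le \varepsilon\,V\bigl(W(k)\bigr),
\qquad 0<\varepsilon<1
\;\Longrightarrow\;
V\bigl(W(k)\bigr)\le \varepsilon^{k}\,V\bigl(W(0)\bigr)
\xrightarrow[k\rightarrow\infty]{}0.
```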
Due to (52), we obtain
According to inequality (44), we have
Since \(0<\varepsilon <1\), the goal \(\lim \nolimits _{k\rightarrow \infty }W_i(k)=W^*\), \(\forall i\in {\mathcal {V}}\), is achieved by DT-DCL algorithm (23), and \(\tilde{f_i}^*(x)=Q_i(x)W^*\) is the estimate of the original function \(f_i(x)\).
The proof is completed. \(\square\)
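As a numerical sanity check of Theorem 2, the DT-DCL iteration can be simulated on toy data. The sketch below is illustrative only: the graph, dimensions, data, and step size \(\gamma\) are hypothetical choices, not values from the paper. Each node starts at the minimizer of its local cost \(g_i(W_i)=\frac{1}{2}\left( \parallel Y_i-H_iW_i\parallel ^2 +\sigma \parallel W_i\parallel ^2\right)\) and runs the per-node update from the proof above.

```python
import numpy as np

# Toy setup: 3 nodes on a path graph 1-2-3 (all values are illustrative).
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])          # adjacency weights a_ij
rng = np.random.default_rng(0)
ml, sigma, gamma = 2, 0.1, 0.05       # weight dimension, regularizer, step size
H = [rng.standard_normal((4, ml)) for _ in range(3)]   # local regressors H_i
Y = [rng.standard_normal(4) for _ in range(3)]         # local targets Y_i

# Hessians of the local costs g_i: H_i^T H_i + sigma * I
M = [Hi.T @ Hi + sigma * np.eye(ml) for Hi in H]

# Zero-gradient-sum initialization: each node starts at its local minimizer,
# so that sum_i grad g_i(W_i(0)) = 0.
W = [np.linalg.solve(M[i], H[i].T @ Y[i]) for i in range(3)]

# DT-DCL update: (H_i^T H_i + sigma I)(W_i(k+1) - W_i(k))
#                  = gamma * sum_j a_ij (W_j(k) - W_i(k))
for _ in range(20000):
    W = [W[i] + gamma * np.linalg.solve(
             M[i], sum(A[i, j] * (W[j] - W[i]) for j in range(3)))
         for i in range(3)]

# Centralized solution of min_W sum_i g_i(W)
W_star = np.linalg.solve(sum(M), sum(H[i].T @ Y[i] for i in range(3)))
print(max(np.linalg.norm(W[i] - W_star) for i in range(3)))  # near zero
```

All nodes reach consensus at the centralized minimizer \(W^*\), consistent with the zero-gradient-sum mechanism of [44]: the update preserves \(\sum _{i=1}^{N}\nabla g_i(W_i(k))=0\) while driving the nodes to consensus.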
Appendix C
Proof of the inequality (36)
Proof
From optimization problem (18), we have \(g_i(W_i)=\frac{1}{2}\left( \parallel Y_i-H_iW_i\parallel ^2 +\sigma _i \parallel W_i\parallel ^2\right)\), then
Thus, the following inequality holds [44]:
where \(\tilde{\mathcal {L}}\in {\mathbf{R}}^{N\times N}\) is the Laplacian matrix of the complete graph \(\tilde{{\mathcal {G}}}\).
For the undirected and connected graph \({\mathcal {G}}\), the following inequality holds [44]:
Substituting (57) into (56), we have
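The combination of (56) and (57) can be sketched as follows (a summary consistent with the target bound (36); the intermediate factor \(\frac{\lambda _2}{N}\) relating \({\mathcal {L}}\) to the complete-graph Laplacian \(\tilde{{\mathcal {L}}}\) is the standard bound from [44]):

```latex
W^{T}\bigl(\mathcal{L}\otimes I_{ml}\bigr)W
\;\ge\; \frac{\lambda_{2}}{N}\,
W^{T}\bigl(\tilde{\mathcal{L}}\otimes I_{ml}\bigr)W
\;\ge\; \frac{2\lambda_{2}}{\overline{\varTheta}}\,V(W).
```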
The proof is completed. \(\square\)
Xie, J., Chen, W. & Dai, H. Distributed cooperative learning algorithms using wavelet neural network. Neural Comput & Applic 31, 1007–1021 (2019). https://doi.org/10.1007/s00521-017-3134-1