Deep Unfolded Extended Conjugate Gradient Method for Massive MIMO Processing with Application to Reciprocity Calibration


Abstract

In this paper, we deep-unfold the standard iterative conjugate gradient (CG) algorithm for solving a linear system of equations. Instead of being adjusted by known rules, the parameters are learned via backpropagation to yield optimal results. The proposed unfolded CG (UCG) is further extended by substituting a matrix parameter for a scalar one, which increases the degrees of freedom per layer. Once training is complete, the UCG is shown to require far fewer layers than the number of iterations needed by the standard iterative CG. It is also shown to be very robust to noise and to outperform the standard CG in the low signal-to-noise ratio (SNR) region. A key merit of the proposed approach is that no explicit training data is dedicated to the learning phase: the optimization relies on the residual error, which is not explicitly expressed as a function of the desired data. As an example, the proposed UCG is applied to the reciprocity calibration problem encountered in massive MIMO (multiple-input multiple-output) systems.
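For context, the following is a minimal NumPy sketch of the standard iterative CG baseline that the paper unfolds; it solves \( Ax = b \) for a symmetric positive-definite \( A \). The function name, iteration budget, and tolerance are illustrative, not taken from the paper.

```python
import numpy as np

def conjugate_gradient(A, b, n_iter=100, tol=1e-10):
    """Standard iterative CG for A x = b with A symmetric positive definite."""
    x = np.zeros_like(b)
    r = b - A @ x                 # initial residual
    p = r.copy()                  # initial search direction
    rs_old = r @ r
    for _ in range(n_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)        # rule-based scalar step size
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p    # rule-based direction update
        rs_old = rs_new
    return x
```

In exact arithmetic, CG converges in at most as many iterations as the matrix dimension; the UCG replaces the rule-based scalars above with learned parameters (including a learned matrix per layer) so that far fewer layers suffice.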



Acknowledgments

This work was funded by the Natural Sciences and Engineering Research Council of Canada, Prompt, and NUTAQ Innovation, with equipment provided by CMC Microsystems. The authors also thank the Chaire de recherche sur les signaux et l'intelligence des systèmes haute performance for technical support.

Author information

Corresponding author

Correspondence to Daniel Massicotte.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1

The reader should refer to the dependence tree shown in Fig. 1 and to Eqs. (19), (20), and (21) to follow the derivations below. The explicit forms of the gradient of the loss function \( e \) with respect to \( \alpha_k \) and \( \lambda_k \) are

$$ \overline{\frac{\partial e}{\partial {\alpha}_k}}=2{r}_{n+1}^T\overline{\frac{\partial {r}_{n+1}}{\partial {\alpha}_k}} $$
(A.1)
$$ \overline{\frac{\partial e}{\partial {\lambda}_k}}=2\sum \limits_{z=1}^{2M}{r}_{n+1}^{(z)}\overline{\frac{\partial {r}_{n+1}^{(z)}}{\partial {\lambda}_k}} $$
(A.2)

where \( \overline{\frac{\partial \left(\cdot \right)}{\partial \left(\cdot \right)}} \) denotes the total derivative and \( \overline{\frac{\partial {r}_{n+1}^{(z)}}{\partial {\lambda}_k}} \) is the total derivative of the zth element of \( r_{n+1} \) with respect to \( \lambda_k \), where \( k \in [1:n] \) in Eq. (A.1) and \( k \in [1:n-1] \) in Eq. (A.2). The formulas for the total derivatives of \( r_{n+1} \) with respect to \( \alpha_k \) and \( \lambda_k \) use

$$ \overline{\frac{\partial {r}_{n+1}}{\partial {\alpha}_n}}=\frac{\partial {r}_{n+1}}{\partial {\alpha}_n} $$
(A.3)

where the total derivative of \( r_{n+1} \) with respect to \( \alpha_n \) corresponds to the only possible path in Fig. 1 from \( r_{n+1} \) to \( \alpha_n \). Before going further, note that the allowed paths are those that follow the direction of the arrows in Fig. 1. Therefore, we have

$$ \overline{\frac{\partial {r}_{n+1}}{\partial {\alpha}_k}}=\overline{\frac{\partial {r}_{n+1}}{\partial {r}_{k+1}}}\frac{\partial {r}_{k+1}}{\partial {\alpha}_k} $$
(A.4)

where the chain rule is applied between the total derivative of \( r_{n+1} \) with respect to \( r_{k+1} \), which accounts for every possible path in Fig. 1 between \( r_{n+1} \) and \( r_{k+1} \), and the unique direct-path derivative between \( r_{k+1} \) and \( \alpha_k \), with \( k \in [1:n-1] \). In a similar way, we derive

$$ \overline{\frac{\partial {r}_{n+1}^{(z)}}{\partial {\lambda}_k}}\left(i,j\right)=\overline{\frac{\partial {r}_{n+1}}{\partial {p}_{k+1}}}\left(i,z\right)\frac{\partial {p}_{k+1}}{\partial {\lambda}_k}\left(i,j\right) $$
(A.5)

where the two indices in parentheses next to each term denote the row and column of the resulting matrix, with \( i, j, z \in [1:2M] \) and \( k \in [1:n-1] \). The left-hand side of Eq. (A.5) shows that there is a distinct derivative of each element of the vector \( r_{n+1} \) with respect to each \( (i,j) \) element of the matrix \( \lambda_k \). The form of Eq. (A.5) differs slightly from that of Eq. (A.4) because \( \lambda_k \) is a matrix rather than a scalar. We then continue with the total derivative of \( r_{n+1} \) with respect to \( p_n \)

$$ \overline{\frac{\partial {r}_{n+1}}{\partial {p}_n}}=\frac{\partial {r}_{n+1}}{\partial {p}_n} $$
(A.6)

which is simply the direct path in Fig. 1 between \( r_{n+1} \) and \( p_n \). The general form of the derivative of \( r_{n+1} \) with respect to \( p_k \), with \( k \in [2:n-1] \), is slightly more involved

$$ \overline{\frac{\partial {r}_{n+1}}{\partial {p}_k}}=\overline{\frac{\partial {r}_{n+1}}{\partial {p}_{k+1}}}\left(\frac{\partial {p}_{k+1}}{\partial {p}_k}+\frac{\partial {p}_{k+1}}{\partial {s}_k}\frac{\partial {s}_k}{\partial {r}_{k+1}}\frac{\partial {r}_{k+1}}{\partial {p}_k}\right)+\overline{\frac{\partial {r}_{n+1}}{\partial {r}_{k+1}}}\frac{\partial {r}_{k+1}}{\partial {p}_k} $$
(A.7)

where the chain rule is first applied between the total derivative of \( r_{n+1} \) with respect to \( p_{k+1} \), which accounts for every possible path in Fig. 1 between these two nodes, and the direct-path derivative between \( p_{k+1} \) and \( p_k \). Similarly, the chain rule is applied between the total derivative of \( r_{n+1} \) with respect to \( p_{k+1} \) and the direct-path derivatives going through \( p_{k+1} \), \( s_k \), \( r_{k+1} \), and \( p_k \), in that order. Finally, the same process is applied between the total derivative of \( r_{n+1} \) with respect to \( r_{k+1} \) and the unique direct-path derivative between \( r_{k+1} \) and \( p_k \). At this point, the only unknown we are left with is the total derivative of \( r_{n+1} \) with respect to any other \( r \). We begin with

$$ \overline{\frac{\partial {r}_{n+1}}{\partial {r}_n}}=\frac{\partial {r}_{n+1}}{\partial {r}_n}+\frac{\partial {r}_{n+1}}{\partial {p}_n}\frac{\partial {p}_n}{\partial {s}_{n-1}}\frac{\partial {s}_{n-1}}{\partial {r}_n} $$
(A.8)

where the first possible path is the direct one between \( r_{n+1} \) and \( r_n \), and the second goes through \( r_{n+1} \), \( p_n \), \( s_{n-1} \), and \( r_n \), in that order. We then continue with

$$ {\displaystyle \begin{array}{c}\overline{\frac{\partial {r}_{n+1}}{\partial {r}_{n-1}}}=\overline{\frac{\partial {r}_{n+1}}{\partial {r}_n}}\left(\frac{\partial {r}_n}{\partial {r}_{n-1}}+\frac{\partial {r}_n}{\partial {p}_{n-1}}\frac{\partial {p}_{n-1}}{\partial {s}_{n-2}}\frac{\partial {s}_{n-2}}{\partial {r}_{n-1}}\right)\\ {}\kern2.36em +\frac{\partial {r}_{n+1}}{\partial {p}_n}\frac{\partial {p}_n}{\partial {p}_{n-1}}\frac{\partial {p}_{n-1}}{\partial {s}_{n-2}}\frac{\partial {s}_{n-2}}{\partial {r}_{n-1}}\end{array}} $$
(A.9)

where the three possible paths between \( r_{n+1} \) and \( r_{n-1} \) are considered. Finally, we have the general case for \( k \in [2:n-2] \)

$$ {\displaystyle \begin{array}{c}\overline{\frac{\partial {r}_{n+1}}{\partial {r}_k}}=\overline{\frac{\partial {r}_{n+1}}{\partial {r}_{k+1}}}\left(\frac{\partial {r}_{k+1}}{\partial {r}_k}+\frac{\partial {r}_{k+1}}{\partial {p}_k}\frac{\partial {p}_k}{\partial {s}_{k-1}}\frac{\partial {s}_{k-1}}{\partial {r}_k}\right)\\ {}\kern2.24em +\frac{\partial {r}_{n+1}}{\partial {p}_n}\left(\prod \limits_{i=n}^{k+1}\frac{\partial {p}_i}{\partial {p}_{i-1}}\right)\frac{\partial {p}_k}{\partial {s}_{k-1}}\frac{\partial {s}_{k-1}}{\partial {r}_k}\\ {}\kern2.24em +\sum \limits_{j=1}^{n-k-1}\left(\overline{\frac{\partial {r}_{n+1}}{\partial {r}_{n-j+1}}}\frac{\partial {r}_{n-j+1}}{\partial {p}_{n-j}}\prod \limits_{z=n-j}^{k+1}\left(\frac{\partial {p}_z}{\partial {p}_{z-1}}\right)\right)\frac{\partial {p}_k}{\partial {s}_{k-1}}\frac{\partial {s}_{k-1}}{\partial {r}_k}\end{array}} $$
(A.10)

where all possible paths from \( r_{n+1} \) to \( r_k \) are selected. By differentiating Eqs. (19), (20), and (21), every symbolic derivative can be replaced by its explicit value

$$ \Big\{{\displaystyle \begin{array}{c}\frac{\partial {r}_{k+1}}{\partial {r}_k}=I,\frac{\partial {r}_{k+1}}{\partial {p}_k}=-{\alpha}_kA\\ {}\frac{\partial {r}_{k+1}}{\partial {\alpha}_k}=-A{p}_k,\frac{\partial {s}_{k-1}}{\partial {r}_k}=A\\ {}\frac{\partial {p}_{k+1}}{\partial {p}_k}={\lambda}_k,\frac{\partial {p}_{k+1}}{\partial {s}_k}=I\\ {}\frac{\partial {p}_{k+1}}{\partial {\lambda}_k}=\left(\begin{array}{ccc}{p}_k^{(1)}& \dots & {p}_k^{(2M)}\\ {}\vdots & \vdots & \vdots \\ {}{p}_k^{(1)}& \cdots & {p}_k^{(2M)}\end{array}\right)\in {\mathbb{R}}^{2M\times 2M}\end{array}} $$
(A.11)

where \( I \) is the \( 2M \times 2M \) identity matrix and \( \frac{\partial {p}_{k+1}}{\partial {\lambda}_k} \) is a matrix whose ith column has a constant value corresponding to the ith element of \( p_k \).
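Taken together, the partial derivatives in Eq. (A.11) pin down the layer recursion up to constants: \( r_{k+1} = r_k - \alpha_k A p_k \), \( s_k = A r_{k+1} \), and \( p_{k+1} = \lambda_k p_k + s_k \). The NumPy sketch below implements a forward pass and the residual loss under that reconstruction; since Eqs. (19), (20), and (21) are not reproduced in this appendix, treat it as an assumption-laden illustration rather than the authors' exact implementation.

```python
import numpy as np

def ucg_forward(A, b, alphas, lambdas):
    """Forward pass through n unfolded CG layers, returning the final
    residual r_{n+1} and the loss e = ||r_{n+1}||^2.

    Update rules reconstructed from Eq. (A.11) (an assumption):
        r_{k+1} = r_k - alpha_k * A p_k     # since dr_{k+1}/dp_k = -alpha_k A
        s_k     = A r_{k+1}                 # since ds_k/dr_{k+1} = A
        p_{k+1} = lambda_k p_k + s_k        # since dp_{k+1}/dp_k = lambda_k
    """
    n = len(alphas)              # n scalar steps; len(lambdas) == n - 1
    r = b.copy()                 # initial residual, assuming x_0 = 0
    p = r.copy()                 # initial search direction
    for k in range(n):
        r = r - alphas[k] * (A @ p)
        if k < n - 1:            # lambda_k is defined for k in [1 : n-1]
            p = lambdas[k] @ p + A @ r
    return r, float(r @ r)
```

Note that the loss depends only on \( A \), \( b \), and the parameters, which is why no labeled training data is needed: backpropagating \( e \) through this forward pass yields exactly the totals assembled in Eqs. (A.1)-(A.10).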

To obtain better and faster results during the backpropagation process, the adaptation steps for \( \alpha_k \) and \( \lambda_k \) can be set to different values, and a momentum technique is applied to \( \lambda_k \):

$$ {\lambda}_k\left(t+1\right)={\lambda}_k(t)-{\mu}_1\overline{\frac{\partial e}{\partial {\lambda}_k(t)}}+\beta \left({\lambda}_k(t)-{\lambda}_k\left(t-1\right)\right) $$
(A.12)
$$ {\alpha}_k\left(t+1\right)={\alpha}_k(t)-{\mu}_2\overline{\frac{\partial e}{\partial {\alpha}_k(t)}} $$
(A.13)

where \( \mu_1 \) and \( \mu_2 \) are the fixed adaptation steps, \( \beta \in [0,1] \) is the momentum coefficient, and \( t \) denotes the current iteration (epoch) of the backpropagation process.
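As a concrete illustration of Eqs. (A.12) and (A.13), a minimal update routine could look as follows. The function and argument names are hypothetical, and values for \( \mu_1 \), \( \mu_2 \), and \( \beta \) must be chosen by the user; the paper does not prescribe them here.

```python
def update_parameters(alphas, lambdas, grad_alphas, grad_lambdas,
                      lambdas_prev, mu1, mu2, beta):
    """One backpropagation step: a momentum update for each lambda_k
    (Eq. (A.12)) and a plain gradient step for each alpha_k (Eq. (A.13))."""
    new_lambdas = [lam - mu1 * g + beta * (lam - prev)               # Eq. (A.12)
                   for lam, g, prev in zip(lambdas, grad_lambdas, lambdas_prev)]
    new_alphas = [a - mu2 * g for a, g in zip(alphas, grad_alphas)]  # Eq. (A.13)
    return new_alphas, new_lambdas
```

Keeping the previous iterate `lambdas_prev` is what realizes the momentum term \( \beta(\lambda_k(t) - \lambda_k(t-1)) \) of Eq. (A.12).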


Cite this article

Sirois, S., Ahmed Ouameur, M. & Massicotte, D. Deep Unfolded Extended Conjugate Gradient Method for Massive MIMO Processing with Application to Reciprocity Calibration. J Sign Process Syst 93, 965–975 (2021). https://doi.org/10.1007/s11265-020-01631-1
