1 Introduction

With fast-growing demand of wireless communications, the worldwide telecommunication organization 3rd generation partnership project (3GPP) has begun the standardization process for the next generation system, known as 5G. To achieve the goal of 1000 times increase in mobile data rate [1], many new technologies are proposed, including ultra-dense network, device-to-device, millimeter-wave (mmWave), massive MIMO, and non-orthogonal modulation multiple access [57, 21, 24, 2830, 33, 40, 41]. A fundamental requirement in wireless communication is spectrum. Conventionally, wireless communication is operated below 10 GHz for its good radio propagation properties. However, it is expected that the spectrum will be used up in just a few years. This pushes system designer to consider higher frequency spectrum, i.e., mmWave bands, where larger bandwidth allocation is available [26]. Since the pathloss in mmWave transmission is severe, reliable communication requires beamforming with large-scale antenna arrays. Many challenging issues have been raised regarding the implementation of digital beamforming in large-scale antenna array systems such as cost and energy consumption. Hybrid beamforming is then developed to alleviate the problem [13, 1517, 36]. It is known that beamforming with large-scale antenna arrays is much easier to be implemented in base-stations (BSs) than that in user elements (UEs). To conduct beamforming, either with digital or hybrid arrays, we need to know angle-of-departure (AoD) or angle-of-arrival (AoA). The estimation is also much easier to be conducted in BSs than UEs. When UE is mobile, the direction of beamforming has to change dynamically, known as beam tracking. Beam tracking is known to be a great challenge for UE. For these reasons, in this paper, we consider the single-input-multiple-output (SIMO) OFDM system, in which beamforming and AoA estimation are conducted only in BSs.

Subspace-based methods such as multiple-signal-classification (MUSIC) and estimation-of-signal-parameters-via-rotational-invariance-techniques (ESPRIT) algorithm are well-known in AoA estimation [31]. By the proper arrangement of received signal vectors, signal subspace could be extracted with signature vectors; each signature vector serves a function of AoAs, AoDs, and path delays. Since the received signal is obtained by the convolution of the transmit signal and the propagation channel, the measurement vectors have to be concatenated to a high-dimension vector. This operation greatly increases the computational complexity of the subspace methods since they involve complex matrix operations such as singular-value-decomposition. Note that the original MUSIC and ESPRIT in [31] are blind meaning that no training data are required. In [8, 18, 20, 25, 32, 39], the MUSIC/ESPRIT method has been extended to deal with the joint estimation of AoA, AoD, and path delays. However, training data are needed to conduct channel estimation first, and the estimated channel is input to MUSIC/ESPRIT instead of the received data. Thus, these extended methods are not blind and they still suffer from the problem of high-dimension vector. In the case of AoA estimation, the dimension would be the number of receiving antennas times the considered channel length. For example, let the number of receiving antennas be 16, and the considered channel length be 10. The computational complexity for eigen-value decomposition is O(1603), which is too high for practical implementation. For mmWave channel, the number of paths is generally limited. In [3], the propagation channel at 28 and 73 GHz in some area of New York was measured. The maximum number of channel paths is found to be four. A Poisson distribution with mean being two is recommended to model the number of the paths. This is equivalent to saying that AoAs and AoDs are in a sparse domain if both are quantized to certain levels. The work in [4] utilizes the sparsity and applies the compressive sensing (CS) technique to estimate AoA and AoD. However, it only considers the flat-fading channel, meaning that the bandwidth is small and all the channel paths are not distinguishable. Also, using the CS method, Zhang et al. [43] develops a scheme that can estimate AoA, AoD, and path delays. However, it requires to construct a matrix with a high dimension and the computational complexity is still high. The high dimension is also due to the convolution of transmitted training signal and channel-impulse- response (CIR).

In this paper, we propose low-complexity methods for the joint estimation of AoAs and the CIR. As the usual assumption in most literatures for channel estimation in mmWave, we assume that each scatterer contributes one single path parameterized by its delay and AoA. We further assume that the transmission bandwidth is large, making all channel paths distinguishable. Therefore, there is one AoA associated with a channel path. As mentioned, the CIR is assumed to be sparse in mmWave transmission and the CIR estimation can be regarded as a CS problem [42]. After each path is estimated, the associated AoA can be estimated. There are a couple of features that make our method different from the existing ones. First, we consider the OFDM system with reference signal defined in the frequency-domain for channel estimation, known as pilots. This is the same as current wireless systems, e.g. LTE. Second, we decouple the estimation of CIR and AoA: the CS method is applied for the CIR estimation and the maximum-likelihood (ML) estimation is applied for AoA estimation. This decoupling can effectively lower the computational complexity. Third, we propose improved matching-pursuit-based CS methods for channel estimation. Conventional application of the CS method to channel estimation only considers the single receive-antenna scenario. The proposed matching-pursuit-based methods take the multiple-antenna into consideration. The proposed joint method effectively improves the performance of the AoA and channel estimation while the computational complexity remains low. In ML AoA estimation, there exists a hidden parameter in the likelihood function. The expectation-maximization (EM) algorithm [19, 23] is well known for solving this kind of ML problem. The EM algorithm, being iterative, consists of the expectation (E) and the maximization (M) steps in each iteration. The E step aims to take the expectation of the joint log-likelihood with respect to the hidden parameter, using the current estimate of unknown parameters. After the E step, the hidden parameter in the log-likelihood function is substituted by its posterior statistics. The following M step is then to maximize the resultant function with respect to the unknown parameters. In our AoA estimation, the EM algorithm will be used. In [22], the EM algorithm is also used for the same purpose. However, our method differs from [22] in the derivation of posteriori statistics in the E step and the AoA search in the M step.

The rest of the paper is organized as follows. In Section 2, the system model is defined, including the transmit signal, the propagation channel, and the received signal. In Section 3, the proposed joint AoA and channel estimation method is described in details. Performance analysis is also conducted which includes the derivation of the mean-square-error (MSE) of the channel gain estimate and the Cramer-Rao lower bound (CRLB) for the proposed AoA estimation. In Section 4, simulation results are reported to evaluate the performance of the proposed methods. Finally, conclusions are drawn in Section 5.

2 System Model

Let M be the number of the received antennas, L be the number of paths in the CIR, and a m, l = exp(j(m − 1)ϕ l ) be the array response for path-l signal at receive antenna m. Here, ϕ l = πsin(φ l ), and φ l is the AoA in radian. Then, the time-domain CIR for the receive antennas, denoted by c(n), can be written as:

$$\begin{array}{@{}rcl@{}} \mathbf{c}(n) &=& \sum\limits_{l=1}^{L} \alpha_{l}\mathbf{a}_{l}\delta(n-k_{l}) \end{array} $$
(1)

where a l = [a 1, l , a 2, l ,⋯ , a M, l ]T is the spatial signature, α l is the complex channel gain, and k l is the delay of path-l signal, respectively. Without loss of generality, we assume that k 1 < k 2 < ⋯< k L . Let I be the CP size and assume that k L < I. A CIR matrix can then be constructed as

$$\begin{array}{@{}rcl@{}} \mathbf{H}= \left[ \mathbf{h}(0),\mathbf{h}(1),\cdots,\mathbf{h}(I-1) \right]. \end{array} $$
(2)

where h(k l ) = α l a l . The mth row of H represents the CIR of the m receive antenna. As we can see, due to the sparsity of the channel, H will only have L non-zero columns in which AoA information is embedded, and the indices of the L non-zero components in each row of H should be the same. Therefore, the CIRs of all receive antennas should have identical indices of the L non-zero components. Figure 1 is an illustration for our channel model. In Fig. 1, there are four paths with path indices k 1, k 2, k 3, and k 4, and AoAs ϕ 1, ϕ 2, ϕ 3, and ϕ 4.

Figure 1
figure 1

Illustration of channel model.

Thanks to CP, the convolution between the transmitted signal and the propagation channel becomes a circular convolution, facilitating signal detection in the frequency domain. Let N be the size of the discrete-Fourier-transform (DFT), F be the N×N DFT matrix, \(\tilde {\mathbf {x}}=[\tilde {x}_{1},\tilde {x}_{2},\cdots ,\tilde {x}_{N}]^{T}\) be the OFDM transmitted signal vector, and h m be the transpose of the mth row of H. It is simple to see that h m is the CIR observed at the mth antenna, and the corresponding frequency-domain channel can be obtained as \(\tilde {\mathbf {h}}_{m}=\mathbf {F}[\mathbf {h}_{m}^{T}, \mathbf {0}^{T}]^{T}\) where 0 is a (NI) × 1 zero vector. Then, the frequency-domain received signal at the mth antenna, denoted by \(\tilde {\mathbf {r}}_{m}\), can be formulated as:

$$\begin{array}{@{}rcl@{}} \tilde{\mathbf{r}}_{m} =\tilde{\mathbf{X}}\tilde{\mathbf{h}}_{m} + \tilde{\mathbf{w}}_{m} \end{array} $$
(3)

where \(\tilde {\mathbf {X}}\) is a diagonal matrix with \(\tilde {\mathbf {X}}\) as its diagonal components, and \(\tilde {\mathbf {w}}_{m}\) is the additive white Gaussian noise observed at the mth antenna. Each component of \(\tilde {\mathbf {w}}_{m}\) is assume to have a zero mean and variance of \({\sigma ^{2}_{w}}\). The (k, l)th component of F is equal to \(\exp (-j2\pi (k-1)(l-1)/N)/\sqrt N\). Thus, \(\tilde {\mathbf {h}}_{m}\) can be easily estimated from Eq. 3 by

$$\begin{array}{@{}rcl@{}} \check{\mathbf{h}}_{m}={\tilde{\mathbf{X}}}^{-1}\tilde{\mathbf{r}}_{m} \end{array} $$
(4)

where \(\check {\mathbf {h}}_{m}\) is the estimate of \(\tilde {\mathbf {h}}_{m}\). This is referred to as the zero-forcing (ZF) channel estimator.

Now, we describe the relationship between the receive signal in Eq. 3 and CIR h m . First note that the equation \(\tilde {\mathbf {h}}_{m}=\mathbf {F}[\mathbf {h}_{m}^{T}, \mathbf {0}^{T}]^{T}\) can be rewritten as \(\tilde {\mathbf {h}}_{m}=\mathbf {F}_{I}\mathbf {h}_{m}\), where F I is the submatrix containing the first I columns of F. Substituting it into (3), we can rewrite the receive signal vector as:

$$\begin{array}{@{}rcl@{}} \tilde{\mathbf{r}}_{m} =\tilde{\mathbf{X}}\mathbf{F}_{I}\mathbf{h}_{m} + \tilde{\mathbf{w}}_{m}. \end{array} $$
(5)

Denote \(\mathbf {f}^{I}_{r}\) as the rth row of F I . Then \(\tilde {\mathbf {X}}\mathbf {F}_{I}\) can be represented by:

$$\begin{array}{@{}rcl@{}} \tilde{\mathbf{X}}\mathbf{F}_{I} = \left[ \begin{array}{c} \tilde{x}_{1}\mathbf{f}^{I}_{1} \\ {\vdots} \\ \tilde{x}_{N}\mathbf{f}^{I}_{N} \end{array} \right]. \end{array} $$
(6)

In a pilot-assisted OFDM systems, however, the reference signal used for channel estimation is only transmitted in some specific subcarriers, known as pilots. Let I p = {i 1,…, i P } be the subcarrier indices of the P pilots, \(\tilde {\mathbf {x}}^{s}=[\tilde {x}_{i_{1}},\tilde {x}_{i_{2}},\cdots ,\tilde {x}_{i_{P}}]^{T}\) be the transmitted pilot vector, \(\tilde {\mathbf {X}}^{s}_{m}\) be a diagonal matrix with \(\tilde {\mathbf {x}}^{s}\) as its diagonal components, and F s be the submatrix of F I by:

$$\begin{array}{@{}rcl@{}} \mathbf{F}^{s} = \left[ \begin{array}{c} \mathbf{f}^{I}_{i_{1}} \\ {\vdots} \\ \mathbf{f}^{I}_{i_{P}} \end{array} \right] \end{array} $$
(7)

Then, Eq. 6 can be reduced in the same manner:

$$\begin{array}{@{}rcl@{}} \tilde{\mathbf{X}}^{s}\mathbf{F}^{s} = \left[ \begin{array}{c} \tilde{x}_{i_{1}}\mathbf{f}^{I}_{i_{1}} \\ {\vdots} \\ \tilde{x}_{i_{P}}\mathbf{f}^{I}_{i_{P}} \end{array} \right] \end{array} $$
(8)

Replacing \(\tilde {\mathbf {X}}\mathbf {F}_{I}\) in Eq. 5 with \(\tilde {\mathbf {X}}^{s}\mathbf {F}^{s}\) in Eq. 8, we can rewrite the received pilot signal in the pilot-assisted OFDM system as:

$$\begin{array}{@{}rcl@{}} \tilde{\mathbf{r}}^{s}_{m} = \tilde{\mathbf{X}}^{s}\mathbf{F}^{s}\mathbf{h}_{m} + \tilde{\mathbf{w}}^{s}_{m} \end{array} $$
(9)

where \(\tilde {\mathbf {r}}^{s}_{m}\) and \(\tilde {\mathbf {w}}^{s}_{m}\) are the subvectors of \(\tilde {\mathbf {r}}_{m}\) and \(\tilde {\mathbf {w}}_{m}\) with components corresponding to pilot subcarriers reserved, respectively.

As shown in Eq. 9, we can use the observed signal \(\tilde {\mathbf {r}}^{s}_{m}\) on pilot subcarriers to conduct the estimation of the time-domain CIR, h m . The advantage of the time-domain channel estimation is that the number of parameters to estimate can be much smaller than P, the number of pilots. This is due to the fact that the CIR is generally sparse, meaning that L is much smaller than I. For typical OFDM systems, I is also much smaller than P. To take the advantage of the sparsity, however, we have to rely on the CS technique which will be described in the following section. The other advantage with time-domain processing is that AoA’s can be jointly estimated.

3 Problem Formulation and Proposed Algorithms

In this section, we firstly present the proposed CIR estimation methods. From the model in Eq. 1, we see that there are L paths and corresponding AoAs. For all receive antennas, the delays of the L paths are all the same. Thus, the CIR can be jointly estimated with the receive signal vector in the antenna array. Using the CS technique, we are able to exploit the sparse property of the channel, and significantly enhance the estimation performance. Once the CIR is estimated, a channel gain vector can be constructed for each path and the AoA for the path can then be estimated by the ML method.

3.1 CIR Estimation

Let \(\tilde {\mathbf {F}}=\tilde {\mathbf {X}}^{s}\mathbf {F}^{s}\) in Eq. 9. Also, drop the superscript s in Eq. 9 for notational simplicity. Then, Eq. 9 can be rewritten as:

$$\begin{array}{@{}rcl@{}} \tilde{\mathbf{r}}_{m} &=& \tilde{\mathbf{F}}\mathbf{h}_{m}+\tilde{\mathbf{w}}_{m}. \end{array} $$
(10)

If PI, the CIR can be easily obtained by a least-squares (LS) method as:

$$\begin{array}{@{}rcl@{}} \hat{\mathbf{h}}_{m} = (\tilde{\mathbf{F}}^{H}\tilde{\mathbf{F}})^{-1}\tilde{\mathbf{F}}^{H}\tilde{\mathbf{r}}_{m} \end{array} $$
(11)

where \(\hat {\mathbf {h}}_{m}\) is the estimate of h m . However, note that there are only L non-zero components in h m and Eq. 11 will result in IL noisy estimated paths. Apparently, the LS method is not optimal. The CS technique is developed to solve the problem; it can recover the L non-zero components of h m . It uses the 0-norm criterion in the optimization [35] to find the sparest solution. However, it is known that this is a NP-hard problem and the solution is difficult to derive. A remedy to this problem is using a relaxed 1-norm as the optimization criterion. There exist many algorithms solving the 1-norm problem, such as basis-pursuit (BP) [10] and least-absolute-shrinkage-and-selection-operator (LASSO) [37]. An alternative approach is the matching-pursuit (MP) based algorithm, including MP [27], orthogonal MP (OMP) [38], subspace-pursuit (SP) [42]. These methods have been proposed to reconstruct the sparse signal with low computational complexity. In [11], SP was successfully applied to the channel estimation problem in OFDM systems. It has been shown that SP outperforms MP and OMP. For this reason, we only consider SP in this work.

Let \(\tilde {\mathbf {F}}=[\tilde {\mathbf {f}}_{1},\tilde {\mathbf {f}}_{2},\cdots ,\tilde {\mathbf {f}}_{I} ]\) where \(\tilde {\mathbf {f}}_{i}\) is the ith column of \(\tilde {\mathbf {F}}\), and \(\mathbf {h}_{m}=[{h_{m}^{1}},{h_{m}^{2}},\) \(\cdots ,{h_{m}^{I}}]^{T}\) where \({h_{m}^{i}}\) is the ith component of h m . Then, we can write (10) as:

$$\begin{array}{@{}rcl@{}} \tilde{\mathbf{r}}_{m} &=& {h_{m}^{1}}\tilde{\mathbf{f}}_{1}+{h_{m}^{2}}\tilde{\mathbf{f}}_{2}+\cdots+{h_{m}^{I}}\tilde{\mathbf{f}}_{I}+\tilde{\mathbf{w}}_{m}. \end{array} $$
(12)

As we can see from Eq. 12, \(\tilde {\mathbf {r}}_{m}\) is expanded by the basis vectors \(\tilde {\mathbf {f}}_{i}\)’s. The matrix \(\tilde {\mathbf {F}}\) is called the measurement matrix in CS theory and it represents the linear mapping between the observed data \(\tilde {\mathbf {r}}_{m}\) and the sparse signal h m . The idea of MP-based method is to match \(\tilde {\mathbf {r}}_{m}\) with \(\tilde {\mathbf {f}}_{i}\) such that \({h_{m}^{i}}\) can be determined to be present or not (zero or non-zero path). MP-based algorithms are iterative. Both MP and OMP search each path successively at each iteration. Instead of that, SP searches the subspace spanned by a set of candidate paths. At each iteration, half of the old candidates are removed and half of the new candidates are added. We now briefly describe the operation of the SP method. Supposed that at the end of the (i − 1)th iteration, L s candidates have been reserved, and their indices form a set \(R^{i-1}=\{k_{1},k_{2},\cdots ,k_{L_{s}}\}\) where k l is the column index of \(\tilde {\mathbf {F}}\). Also, a residual receive vector, denoted by \(\tilde {\mathbf {s}}_{m}^{i-1}\), has been calculated. Then, at the ith iteration, a matching is first conducted as \(\tilde {\mathbf {p}}_{m}^{i}= \tilde {\mathbf {F}}^{H} \tilde {\mathbf {s}}_{m}^{i-1}\) where \(\tilde {\mathbf {p}}_{m}^{i}\) is the matched result. The indices of L s largest components in \(\tilde {\mathbf {p}}_{m}^{i}\) are then selected to form a new set Q i. Together with the reserved L s candidates, we now have 2L s candidates. Note that some indices may be repeated selected. Let S i = R i − 1Q i, and we can conduct a LS estimate with \(\tilde {\mathbf {F}}_{S^{i}}\) which is defined as the submatrix of \(\tilde {\mathbf {F}}\) constructed by the columns with the indices in S i. Thus, we have a CIR estimation, denoted by \(\hat {\mathbf {h}}^{S^{i}}_{m}\), as:

$$\begin{array}{@{}rcl@{}} \hat{\mathbf{h}}^{S^{i}}_{m} = (\tilde{\mathbf{F}}^{H}_{S^{i}}\tilde{\mathbf{F}}_{S^{i}})^{-1}\tilde{\mathbf{F}}^{H}_{S^{i}}\tilde{\mathbf{r}}_{m}. \end{array} $$
(13)

Finally, the largest L s components of \(\hat {\mathbf {h}}^{S^{i}}_{m}\) are then selected and their indices are used to form R i. Let \(\tilde {\mathbf {F}}_{R^{i}}\) be the submatrix of \(\tilde {\mathbf {F}}\) constructed by the columns with the indices in R i. The CIR estimate at the ith iteration is then calculated by \(\hat {\mathbf {h}}^{R^{i}}_{m}=(\tilde {\mathbf {F}}_{R^{i}}^{H}\tilde {\mathbf {F}}_{R^{i}})^{-1}\tilde {\mathbf {F}}_{R^{i}}^{H}\tilde {\mathbf {r}}_{m}\). Also, the vector spanned by selected channel taps is subtracted from \(\tilde {\mathbf {r}}_{m}\) to obtain the residual vector, \(\tilde {\mathbf {s}}^{i}_{m}\), as:

$$\begin{array}{@{}rcl@{}} \tilde{\mathbf{s}}_{m}^{i} = \tilde{\mathbf{r}}_{m}-\tilde{\mathbf{F}}_{R^{i}}(\tilde{\mathbf{F}}_{R^{i}}^{H} \tilde{\mathbf{F}}_{R^{i}})^{-1}\tilde{\mathbf{F}}_{R^{i}}^{H}\tilde{\mathbf{r}}_{m}. \end{array} $$
(14)

The iteration continues and stops when the magnitude of the residual vector remains unchanged, i.e.,

$$\begin{array}{@{}rcl@{}} \parallel \tilde{\mathbf{s}}_{m}^{i} \parallel_{2} = \parallel \tilde{\mathbf{s}}_{m}^{i-1} \parallel_{2}. \end{array} $$
(15)

The complete SP algorithm is summarized in Algorithm 1. To recover the sparse CIR \(\hat {\mathbf {h}}_{m}\) from noisy observations stably, the measurement matrix \(\tilde {\mathbf {F}}\) needs to be designed with the restricted isometry property (RIP) [11] [9, 12]. The related topics are out of scope of this paper. In the following paragraph, we describe the extension of the SP algorithm to the channel estimation in SIMO-OFDM systems.

figure d

In our system, we have M received antennas and M CIRs to estimate. If we use SP to conduct the CIR estimation for each received antenna, the estimated path delays may be different for all channels. Note that from our model in Eq. 1, we see that the path delays shall be the same among all received antennas. Thus, we propose an extended SP algorithm to jointly estimate the path delays. The main idea is to let R i, Q i, and S i be the same for all channels, and let the matching and selection operations in Algorithm 1 are jointly conducted. In Step 9, let the lth component of \(\tilde {\mathbf {p}}^{i}\) be equal to \({\sum }_{m=1}^{M}|\tilde {\mathbf {f}}_{l}^{H} \tilde {\mathbf {s}}_{m}^{i-1}|\). In Step 13, define \({a_{l}^{i}}\) as the summation of the magnitude of all estimated channel gains for path l at the ith iteration, i.e.,

$$\begin{array}{@{}rcl@{}} {a_{l}^{i}}=\sum\limits_{m=1}^{M} | \hat{h}_{m}^{S^{i},l} | \end{array} $$
(16)

where \(\hat {h}_{m}^{S^{i},l}\) is the lth element in \(\hat {\mathbf {h}}^{S^{i}}_{m}\). Then, \({a_{l}^{i}}\) is used as the selection criterion. The stoping criterion is also changed accordingly, i.e.,

$$\begin{array}{@{}rcl@{}} \sum\limits_{m=1}^{M} \parallel \mathbf{s}_{m}^{i} \parallel_{2} = \sum\limits_{m=1}^{M} \parallel \mathbf{s}_{m}^{i-1} \parallel_{2}. \end{array} $$
(17)

The extended SP (ESP) algorithm is summarized in Algorithm 2.

figure e

Now, we analyze the computational complexity of the ESP algorithm. For the ith iteration, the computational requirement mainly comes from two matrix inversions: \((\tilde {\mathbf {F}}^{H}_{S^{i}}\tilde {\mathbf {F}}_{S^{i}})^{-1}\) and \((\tilde {\mathbf {F}}_{R^{i}}^{H}\tilde {\mathbf {F}}_{R^{i}})^{-1}\). Let the size of the indices S i be \(L_{S^{i}}\) and we have \(L_{s}\leq L_{S^{i}}\leq 2L_{s}\). Also, the size of R i is L s . Therefore, the computational complexity of ESP is T×(O(\(L_{S^{i}}^{3}\)) + O(\({L_{s}^{3}}\)) ) where T is the number of iterations.

Although the SP algorithm is simple, its performance can be easily affected by noise. When the SNR is low, what usually happens in mmWave, wrong decisions can be frequently made in Step 9 and 13 of Algorithm 2. We propose another ESP algorithm, referred to as enhanced ESP (EESP), to improve performance. Unlike ESP, EESP only removes one and adds one candidate at each iteration. In other words, there is only one element in Q i. Also, SNR is used as an aided criterion for path selection. Supposed that at the end of the (i − 1)th iteration, L s candidates have been selected and their indices forms \(R^{i-1}=\{k_{1},k_{2},\cdots ,k_{L_{s}}\}\). At the ith iteration, the index of one candidate is selected and added to R i − 1 to form S i. Thus, there are L s + 1 candidates in S i. Then, the CIR is estimated as that in Eq. 13. Denote \(\mathbf {h}^{S^{i}}_{m}\) as the subvector of h m at the index set S i, \(\hat {\mathbf {h}}^{S^{i}}_{m}\) as the corresponding estimate, and \(\mathbf {C}_{S^{i}}= E[(\hat {\mathbf {h}}^{S^{i}}_{m}-\mathbf {h}^{S^{i}}_{m})(\hat {\mathbf {h}}^{S^{i}}_{m}-\mathbf {h}^{S^{i}}_{m})^{H}]\) as the covariance matrix of the estimation error \(\hat {\mathbf {h}}^{S^{i}}_{m}-\mathbf {h}^{S^{i}}_{m}\). Then, \(\mathbf {C}_{S^{i}}\) is found to be

$$\begin{array}{@{}rcl@{}} \mathbf{C}_{S^{i}} &=& {\sigma_{w}^{2}}(\tilde{\mathbf{F}}_{S^{i}}^{H}\tilde{\mathbf{F}}_{S^{i}})^{-1}. \end{array} $$
(18)

Note that the covariance matrix is independent of antenna index m. Let the lth diagonal elements of \(\mathbf {C}_{S^{i}}\) be \(\gamma ^{i}_{w,l}\), representing the variance of the estimation error on the lth component of \(\hat {\mathbf {h}}^{S^{i}}_{m}\), \(h^{S^{i},l}_{m}\) be the lth component of \(\mathbf {h}^{S^{i}}_{m}\), and \(\gamma ^{i}_{h,l}={\sum }_{m=1}^{M} |h^{S^{i},l}_{m}|^{2}\) be the summation of the channel power on the lth component of \(\mathbf {h}^{S^{i}}_{m}\) over all receive antenna m. Then, \(\gamma ^{i}_{h,l}\) can be calculated by:

$$\begin{array}{@{}rcl@{}} \gamma^{i}_{h,l} = \sum\limits_{m=1}^{M} |\hat{h}_{m}^{S^{i},l}|^{2} - \gamma^{i}_{w,l}. \end{array} $$
(19)

The SNR metric for path l, denoted by \({\rho _{l}^{i}}\), can be defined by:

$$\begin{array}{@{}rcl@{}} {\rho_{l}^{i}}\triangleq\frac{\gamma^{i}_{h,l}}{\gamma^{i}_{w,l}}. \end{array} $$
(20)

Then, \({\rho _{l}^{i}}\) is used as the selection criterion and the path with the worst SNR is removed and the indices of the rest of the candidates forms R i. When all the possible path indices are examined, the iteration stops. The operations of the proposed EESP algorithm are summarized in Algorithm 3. Now, we analyze the computational complexity of the EESP algorithm. There are IL s + 1 iterations in total. For the ith iteration, the computational requirement mainly comes from one matrix inversion: \((\tilde {\mathbf {F}}^{H}_{S^{i}}\tilde {\mathbf {F}}_{S^{i}})^{-1}\). Since the size of the indices S i is fixed as L s + 1, the required computational complexity is (IL s + 1)×O((L s + 1)3). Although the computational complexity of EESP is lower than that of ESP in each iteration, the required iteration number in EESP is generally greater than that in ESP, i.e., TIL s + 1. Note that the computational complexity of both ESP and EESP is much lower than that of extended MUSIC/ESPRIT methods, which is O((N r L s )3).

figure f

As mentioned, L is assumed to be known. For better performance, we let L s > L in the ESP and EESP algorithms. After the L s paths have been estimated, we can then select L best paths among them with the SNR criterion as those depicted in Eqs. 1820. Let R be the set of the indices corresponding to the L s estimated paths. Denote \(\mathbf {h}^{R}_{m}\) as the subvector of h m at the index set R, \(\hat {\mathbf {h}}^{R}_{m}\) as the corresponding estimate, \(\tilde {\mathbf {F}}_{R}\) as the submatrix of \(\tilde {\mathbf {F}}\) constructed by the columns with the indices in R and \(\mathbf {C}_{R}= E[(\hat {\mathbf {h}}^{R}_{m}-\mathbf {h}^{R}_{m})(\hat {\mathbf {h}}^{R}_{m}-\mathbf {h}^{R}_{m})^{H}]\) as the covariance matrix of the estimation error \(\hat {\mathbf {h}}^{R}_{m}-\mathbf {h}^{R}_{m}\). Then, we have the estimation covariance matrix as:

$$\begin{array}{@{}rcl@{}} \mathbf{C}_{R} &=& {\sigma_{w}^{2}}(\tilde{\mathbf{F}}_{R}^{H}\tilde{\mathbf{F}}_{R})^{-1}. \end{array} $$
(21)

Let the sth diagonal element of C R be γ e, s , representing the variance of the estimation error on the sth component of \(\hat {\mathbf {h}}^{R}_{m}\), and γ h, s be the power of the sth path gain, which can be calculated by:

$$\begin{array}{@{}rcl@{}} \gamma_{h,s} = \sum\limits_{m=1}^{M} |\hat{h}_{m}^{R,s}|^{2} - \gamma_{e,s}. \end{array} $$
(22)

The SNR metric is then defined as ρ s = γ h, s /γ e, s for s∈{1,…, L s }. The final L best paths are then selected according to this SNR metric. Note that the other L s L paths in \(\hat {\mathbf {h}}^{R}_{m}\) are simply nulled and \(\hat {\mathbf {h}}^{R}_{m}\) remains only the selected L paths for m = 1,…, M. Using the estimated CIR, we can then conduct AoA estimation which will be described in the following subsection.

3.2 AoA Estimation

We assume that the path indices have been correctly estimated. Recall that h m is the transpose of the mth row in Eq. 2. Suppose that the k l th path is detected as non-zero and its response has been estimated. Then, we can put \(h_{m}^{k_{l}}\)’s, for m = 1,⋯ , M, into a vector and obtain the estimate of the k l th column of H in Eq. 2, i.e., h(k l ). Then, we can have

$$\begin{array}{@{}rcl@{}} \hat{\mathbf{h}}(k_{l}) &=& [\hat{h}^{k_{l}}_{1},\hat{h}^{k_{l}}_{2},\cdots,\hat{h}^{k_{l}}_{M}]^{T}\\ &=& \mathbf{h}(k_{l})+\mathbf{e}_{l}\\ &=& \alpha_{l}\mathbf{a}_{l}+\mathbf{e}_{l} \end{array} $$
(23)

where e l is the estimation error vector. Here, we assume that the components of e l are uncorrelated and Gaussian distributed. Thus, the covariance matrix of e l can be defined as γ e, l I and γ e, l is the variance of the lth selected path in \(\{\gamma _{e,s}|s=1\thicksim L_{s}\}\). We now use the ML method to estimate the AoA, i.e., ϕ l , from Eq. 23. The ML criterion is given by:

$$\begin{array}{@{}rcl@{}} \{\hat{\phi}_{l}, \hat{\alpha}_{l}\} = \arg \min_{\phi_{l},\alpha_{l}} \| \hat{\mathbf{h}}(k_{l}) - \alpha_{l}\mathbf{a}_{l} \|^{2}. \end{array} $$
(24)

As we can see, the channel gain α l in Eq. 23 represents a hidden parameter for ϕ l estimation. We then resort to the EM method, a well-known iterative method solving the ML problem with hidden-parameters. Denote \({\phi _{l}^{k}}\) as the AoA estimation in the kth iteration, \(\hat {\mathbf {a}}_{l}^{k}\) as the array response with the estimated \(\hat {\phi }_{l}^{k}\), and \(\hat {\alpha }^{k}_{l}\) as the posterior mean of α l , which would be used as its prior mean in the (k+1)th iteration. In the (k+1)th iteration, the E step is defined as the conditional expectation with respect to α l :

$$\begin{array}{@{}rcl@{}} Q(\phi_{l}|\hat{\phi}_{l}^{k}) &=& E_{\alpha_{l}}[\ln{p(\alpha_{l}|\phi_{l})}|\hat{\mathbf{h}}(k_{l});\hat{\phi}_{l}^{k}]\\ &=& -\frac{1}{2\gamma_{e,l}}E_{\alpha_{l}}[\parallel \hat{\mathbf{h}}(k_{l}) - \alpha_{l}\mathbf{a}_{l} \parallel^{2}|\hat{\mathbf{h}}(k_{l});\hat{\phi}_{l}^{k}]. \end{array} $$
(25)

For notational simplicity, denote the posterior mean and the power of α l as \(E_{\alpha _{l}}[\alpha _{l}|\hat {\mathbf {h}}(k_{l});\hat {\phi }_{l}^{k}] \triangleq \hat {\alpha }^{k}_{l}\) and \(E_{\alpha _{l}}[|\alpha _{l}|^{2}|\hat {\mathbf {h}}(k_{l});\hat {\phi }_{l}^{k}] \triangleq \bar {\alpha }^{2}_{l}\). Note that in Eq. 25, the scaling factor \(\frac {1}{2\gamma _{e,l}}\), the power of received signal \(\parallel \hat {\mathbf {h}}(k_{l}) \parallel ^{2}\), and \(E_{\alpha _{l}}[\parallel \alpha _{l}\mathbf {a}_{l} \parallel ^{2}|\hat {\mathbf {h}}(k_{l});\hat {\phi }_{l}^{k}]=M\bar {\alpha }^{2}_{l}\) are irrelevant in the optimization of ϕ l . Therefore we can rewrite (25) as:

$$\begin{array}{@{}rcl@{}} Q(\phi_{l}|\hat{\phi}_{l}^{k}) = 2\Re\{\hat{\alpha}_{l}^{k} \hat{\mathbf{h}}(k_{l})^{H}\mathbf{a}_{l}\}. \end{array} $$
(26)

To derive the posterior mean of α l , we start from the Gaussian model in Eq. 23. Denote the covariance matrix of the noise e l as C e, l = γ e, l I where γ e, l is the variance of the CIR estimate on path k l , being equal to the lth diagonal component of C R in Eq. 21. The posterior mean under the Gaussian model is derived in (10.32) of [14] as:

$$\begin{array}{@{}rcl@{}} \hat{\alpha}^{k}_{l}= \bar{\alpha}_{l}+(\gamma_{\alpha,l}^{-1}+(\hat{\mathbf{a}}^{k}_{l})^{H}\mathbf{C}_{e,l}^{-1}\hat{\mathbf{a}}^{k}_{l})^{-1}(\hat{\mathbf{a}}^{k}_{l})^{H}\mathbf{C}_{e,l}^{-1}(\hat{\mathbf{h}}(k_{l})-\hat{\mathbf{a}}^{k}_{l}\bar{\alpha}_{l}) \end{array} $$
(27)

where \(\bar {\alpha }_{l}\) and γ α, l are the prior mean and prior covariance of α l , respectively. The term \(\bar {\alpha }_{l}\) can be treated as the result from the prevous EM iteration, i.e., \(\bar {\alpha }_{l}=\hat {\alpha }_{l}^{k-1}\). The term γ α, l is approximated by the fixed value γ h, l , the estimated path power in Eq. 22, to avoid another recursion. Substituting C e, l = γ e, l I, \((\hat {\mathbf {a}}^{k}_{l})^{H}\hat {\mathbf {a}}^{k}_{l}=M\), γ α, l = γ h, l and \(\bar {\alpha }_{l}=\hat {\alpha }_{l}^{k-1}\) into Eq. 27, we can derive the posterior mean as the recursive form:

$$\begin{array}{@{}rcl@{}} \hat{\alpha}_{l}^{k} &=& \hat{\alpha}_{l}^{k-1}+(M+\frac{\gamma_{e,l}}{\gamma_{h,l}})^{-1}(\hat{\mathbf{a}}_{l}^{k})^{H}(\hat{\mathbf{h}}(k_{l})-\hat{\alpha}_{l}^{k-1}\hat{\mathbf{a}}_{l}^{k}) \end{array} $$
(28)

Then, the M step, maximizing \(Q(\phi _{l}|\hat {\phi }_{l}^{k})\) with respect to ϕ l , is defined by:

$$\begin{array}{@{}rcl@{}} \hat{\phi}_{l}^{k+1} &=& \arg\max_{\phi_{l}} Q(\phi_{l}|\hat{\phi}_{l}^{k})\\ &=& \arg\max_{\phi_{l}} \Re\{\hat{\alpha}_{l}^{k} \hat{\mathbf{h}}(k_{l})^{H}\mathbf{a}_{l}\}. \end{array} $$
(29)

However, the maximization in Eq. 29 is non-linear and it does not have a closed-form solution. Therefore, we propose using Newton’s method to search for the solution. At the ith iteration of Newton’s method, \(\hat {\phi }^{k}_{l}\) is updated as:

$$\begin{array}{@{}rcl@{}} \hat{\phi}^{k}_{l,i+1} = \hat{\phi}^{k}_{l,i} - \mu(\nabla^{2}_{\hat{\phi}^{k}_{l,i}} Q(\phi_{l}|\hat{\phi}^{k}_{l}))^{-1}\nabla_{\hat{\phi}^{k}_{l,i}} Q(\phi_{l}|\hat{\phi}^{k}_{l}) \end{array} $$
(30)

where μ is the step size to control the converging rate. For stability concern, we set the step size as:

$$\begin{array}{@{}rcl@{}} \mu &=& \left\{\begin{array}{ll} 1 \text{,\ \ if\ } \hat{\rho}_{l} \geq \rho_{t}\\ \kappa \text{,\ \ else} \end{array}\right. \\ \hat{\rho}_{l} &=& \frac{\gamma_{h,l}}{\gamma_{e,l}} \end{array} $$
(31)

where κ is a positive number and κ<1, and ρ t is a SNR threshold. The purpose of the setting is to slow the update speed when the SNR of path l is low, preventing possible divergence. Denote \(\hat {\mathbf {a}}_{l,i}^{k}\) as the array response with the estimated \(\hat {\phi }_{l,i}^{k}\) in the ith iteration. The gradient of \(Q(\phi _{l}|\hat {\phi }^{k}_{l})\) with respect to ϕ l can be found as:

$$\begin{array}{@{}rcl@{}} \nabla_{\hat{\phi}^{k}_{l,i}} Q(\phi_{l}|\hat{\phi}^{k}_{l}) &=& \frac{\partial}{\partial\phi_{l}}(-2\Re\{\hat{\alpha}^{k}_{l}\hat{\mathbf{h}}(k_{l})^{H}\mathbf{a}_{l}\})|_{\phi_{l}=\hat{\phi}^{k}_{l,i}}\\ &=& -2\Re\{\hat{\alpha}^{k}_{l}\hat{\mathbf{h}}(k_{l})^{H}\mathbf{D}\hat{\mathbf{a}}_{l,i}^{k}\} \end{array} $$
(32)

where D is a diagonal matrix with diagonal elements of {0,1,⋯ , M − 1}. The second derivative of \(Q(\phi _{l}|\hat {\phi }^{k}_{l})\) with respect to ϕ l can be derived as:

$$\begin{array}{@{}rcl@{}} \nabla^{2}_{\hat{\phi}^{k}_{l,i}} Q(\phi_{l}|\hat{\phi}^{k}_{l}) &=& \frac{\partial^{2}}{\partial(\phi_{l})^{2}} (-2\Re\{\hat{\alpha}^{k}_{l}\hat{\mathbf{h}}(k_{l})^{H}\mathbf{a}_{l}\})|_{\phi_{l}=\hat{\phi}^{k}_{l,i}}\\ &=& -2\Re\{\hat{\alpha}^{k}_{l}\hat{\mathbf{h}}(k_{l})^{H}\mathbf{D}^{2}\hat{\mathbf{a}}_{l,i}^{k}\}. \end{array} $$
(33)

The effectiveness of the EM algorithm greatly depends on its initial condition. Based on the model in Eq. 23, we propose a simple initial estimation for ϕ l given by:

$$\begin{array}{@{}rcl@{}} \hat{\phi}_{l} = \angle [ \frac{1}{M-1} \sum\limits_{m=2}^{M} (\hat{h}_{m-1}^{k_{l}})^{*}\hat{h}_{m}^{k_{l}} ] \end{array} $$
(34)

where \(\hat {h}_{m}^{k_{l}}\) is the mth component of \(\hat {\mathbf {h}}(k_{l})\) and [a] is the operation that takes the phase of the complex symbol a. This method is referred to as the correlation-based AoA estimation method, and it can serve as a good initial condition.

3.3 Performance Analysis

For the CIR estimation, the variance of the channel-gain estimation has been derived in Eq. 21 provided the path-delays are correctly detected. In this section, we will focus on the derivation of the CRLB of the AoA estimation. Denote the unknown parameter vector of an estimate as θ. The Fisher information matrix (FIM) I F I M associated with a complex-valued Gaussian distribution with mean μ(θ) and covariance matrix Q(θ) is given by [14]:

$$\begin{array}{@{}rcl@{}} [\mathbf{I}_{FIM}]_{ij}\!\! &=&\! 2(\frac{\partial \boldsymbol{\mu}}{\partial \boldsymbol{\theta}_{i}})^{H} \mathbf{Q}^{-1} (\frac{\partial \boldsymbol{\mu}}{\partial \boldsymbol{\theta}_{j}}) \,+\,tr(\mathbf{Q}^{-1} \frac{\partial \mathbf{Q}}{\partial \boldsymbol{\theta_{i}}} \mathbf{Q}^{-1} \frac{\partial \mathbf{Q}}{\partial \boldsymbol{\theta_{j}}}) \end{array} $$
(35)

where [I F I M ] i j is the (i, j)th element in I F I M . In our case, Q is independent of θ. Then, \(\frac {\partial \mathbf {Q}}{\partial \boldsymbol {\theta _{i}}}=0\) and Eq. 35 is reduced to

$$\begin{array}{@{}rcl@{}} [\mathbf{I}_{FIM}]_{ij} &=& 2(\frac{\partial \boldsymbol{\mu}}{\partial \boldsymbol{\theta}_{i}})^{H} \mathbf{Q}^{-1} (\frac{\partial \boldsymbol{\mu}}{\partial \boldsymbol{\theta}_{j}}). \end{array} $$
(36)

The CRLB of estimator \(\hat {\boldsymbol {\theta }}_{i}\) is then \([\mathbf {I}_{FIM}^{-1}]_{ii}\).

In AoA estimation, the estimated parameters are θ = [ϕ l , α l ]T. Using the signal model in Eq. 23, we can obtain μ(θ) = α l a l and Q = γ e, l I. From Eq. 36, we have FIM as:

$$\begin{array}{@{}rcl@{}} [\mathbf{I}_{FIM}]_{i,j} &=& 2\gamma_{e,l}^{-1}(\frac{\partial \boldsymbol{\mu}}{\partial \boldsymbol{\theta}_{i}})^{H}(\frac{\partial \boldsymbol{\mu}}{\partial \boldsymbol{\theta}_{j}}), \end{array} $$
(37)

and the derivatives of μ with respect to ϕ l and α l can be found as:

$$\begin{array}{@{}rcl@{}} \frac{\partial \boldsymbol{\mu}}{\partial \phi_{l}} &=& \alpha_{l}\mathbf{D}\mathbf{a}_{l},\\ \frac{\partial \boldsymbol{\mu}}{\partial \alpha_{l}} &=& \mathbf{a}_{l}. \end{array} $$
(38)

Define A = [α l D a l , a l ], and we can rewrite the FIM as:

$$\begin{array}{@{}rcl@{}} \mathbf{I}_{FIM} = 2\gamma^{-1}_{e,l}\mathbf{A}^{H}\mathbf{A} \end{array} $$
(39)

Denote \(\gamma _{\phi _{l}}\) as the variance of \(\hat {\phi }_{l}\), and the CRLB of \(\hat {\phi }_{l}\) can then be derived as:

$$\begin{array}{@{}rcl@{}} \gamma_{\phi_{l}} \geq \frac{1}{\rho_{l}} \cdot \frac{6}{M(M^{2}-1)} \end{array} $$
(40)

where \(\rho _{l}=\gamma _{e,l}^{2}/|\alpha _{l}|^{2}\) is the SNR of path l. As we can see, the CRLB of \(\hat {\phi }_{l}\) is proportional to the mean-square-error of path l and inverse proportional to M 3.

4 Simulation Results

In this section, we report simulation results evaluating the performance of the proposed algorithms. The mmWave channel model developed in [3, 34] is adopted for simulations. Normalized-mean-square-error (NMSE) is used to evaluate the performance of the proposed channel estimation method. The NMSE of channel estimation is defined as:

$$\begin{array}{@{}rcl@{}} NMSE= \frac{{\sum}_{m=1}^{M} {\sum}_{k=N_{l}}^{N_{u}} |\tilde{{h}}_{m}^{k}-\check{{h}}_{m}^{k}|^{2}}{{\sum}_{m=1}^{M} {\sum}_{n=N_{l}}^{N_{u}} |\tilde{{h}}_{m}^{k}|^{2}} \end{array} $$
(41)

where \(\tilde {{h}}_{m}^{k}\) is the kth component of \(\tilde {\mathbf {h}}_{m}\), \(\check {{h}}_{m}^{k}\) is the kth component of \(\check {\mathbf {h}}_{m}\), and N l and N u are the lowest and highest subcarrier used for data transmission, respectively. Here, \(\check {\mathbf {h}}_{m}\) is the estimated \(\tilde {\mathbf {h}}_{m}\), transformed from the time-domain estimated CIR. For AoA estimation, the mean-square-error (MSE) is used to evaluate the performance, which is defined as:

$$\begin{array}{@{}rcl@{}} MSE_{l}=E[(\phi_{l}-\hat{\phi}_{l})^{2}] \end{array} $$
(42)

where \(\hat {\phi }_{l}\) is the estimate of ϕ l .

4.1 Simulation Environment and Settings

The channel model developed in [3] is used in our simulations. It is indicated that the maximum number of detectable paths is four. Thus, we first consider the scenario that the number of paths is four. In [3], it also recommend that the number of paths can be modeled by a Poisson distribution with mean being two. Figure 2 shows the histogram of 100000 generated Poisson observations. To test the limitation of the proposed algorithms, we then consider another scenario that the number of paths is ten. As we can see from Fig. 2, this corresponds to the worst case in CIR estimation.

Figure 2
figure 2

Histogram of generated Poisson random variables with mean being 2.

The path loss in dB is formulated as a+10blog10(d)+ξ, where a, b,and ξ are carrier-frequency related parameters, and d is the distance between the transmitter and receiver. Here, ξ is a Gaussian random variable with zero-mean and variance σ 2, denoted by \(\xi \thicksim N(0,\sigma ^{2})\), modeling the shadowing effect. Let the carrier frequency be 28 GHz. In this case, a = 72, b = 2.92 and σ = 8.7. The normalized power profile follows an exponential decay function:

$$\begin{array}{@{}rcl@{}} \acute{\gamma}_{l} &=& \exp[\tau_{l}\frac{r_{\tau}-1}{\sigma_{\tau}r_{\tau}}]10^{-0.1Z_{l}}\\ \gamma_{l} &=& \frac{\acute{\gamma}_{l}}{{\sum}_{j=1}^{L} \acute{\gamma}_{j}} \end{array} $$
(43)

where γ l is gain of the lth path, L is the number of paths, σ τ is the root-mean-square (RMS) delay spread, τ l is the delay of the l path, \(Z_{l}\thicksim N(0,\zeta ^{2})\), and r τ is the model parameter which is 2.8 for 28 GHz, and 3.0 for 73 GHz. The path delay τ l is assumed to be exponentially distributed modeled by:

$$\begin{array}{@{}rcl@{}} \tau_{l} = -r_{\tau}\sigma_{\tau}\ln U_{l} \end{array} $$
(44)

where U l is a uniformly distributed random variable, denoted by \(U_{l} \thicksim U[0,1]\). We let the RMS delay spread σ τ be 42.5 ns, which is the maximum observed value in [34]. The AoA of each path is simply generated with a uniform distribution in [0,2π]. To evaluate the performance of AoA estimation, we let the delay τ l be generated once and then fixed, similar to the power delay profiles (PDP) defined in 3GPP [2] for link level simulations. The shadowing effect, ξ, is not modeled either. Also, the channel is assumed to be static during one OFDM symbol.

The setting for the OFDM system is summarized in Table 1. From Table 1, we can find that the sampling rate is 552.96 MHz, the number of subcarriers for data transmission in one OFDM symbol is 1852, the sampling interval is 1.8 ns, the OFDM symbol duration is 4147.2 ns, and the CP length is 460.8 ns. Note that the CP length is assumed to be larger than the maximum path delay.

Table 1 OFDM system parameters.

In our simulations, the path loss is absorbed in the SNR calculation. Assume that the implementation loss is 5 dB, then the noise power is −174+10log10(552.96×106)+5=−81.5731 dBm. The transmit power from BS is assumed to be 30 dBm. Based on above settings, we can then calculate the mean of the SNR as a function of the distance by 30+81.5731−(a+10blog10(d)). Figure 3 shows the mean of the SNR varied from −10 dB to 40 dB. As we can see, when d is 50 meters, the SNR is −10 dB. The low SNR is due to the high propagation loss in the mmWave channel. As mentioned, antenna arrays can be used to increase the SNR, extending the coverage range to a couple of hundred meters. In our simulations, we let the lowest SNR for signal transmission be −10 dB.

Figure 3
figure 3

SNR observed at various distance.

Similar to the LTE system, we place a reference signal every 6 subcarriers. Consequently, there are 309 subcarriers carrying reference signals with QPSK modulated symbols. For the proposed algorithm, the number of iterations in the EM algorithm is set to be 10. The SNR threshold ρ t is set to 5 in the M-step Newton’s method. When the path SNR is larger than the threshold, the number of update is set to be 20. Otherwise, the number of update is 40 with κ = 0.1. The channel estimation only uses the pilots in one OFDM symbol, which means no channel tracking is conducted.

4.2 Simulation Results

4.2.1 4-Path Scenario

The simulated PDP is depicted in Fig. 4. As seen, two sets of parameters need to be estimated. One set is the path delays and the other is the path gains. To test the optimality of the proposed algorithm, we first let the path delays be perfectly known, and only channel gains are to be estimated. In this case, note that both proposed ESP and EESP methods are reduced to the LS method with only L unknowns. Figure 5 shows the NMSE of the proposed channel estimate.

Figure 4
figure 4

PDP of simulated channels (AoAs: 0.266,−0.8742,−1.6163,0.9116).

Figure 5
figure 5

Performance of proposed channel estimate when path delays are perfectly known.

From Eq. 4, we see that the post SNR of the ZF channel estimate is the same as the system SNR since the power of transmitted reference signal is one. As shown in Fig. 5, the LS method provides 18 dB gain over the ZF method. This huge gain comes from the exploitation of the CIR sparsity in the array antennas. For the ZF method, the estimated channel responses cannot be coherently combined to obtain the array gain (12 dB). Even it can, the LS method can still outperform ZF by 6 dB.

Figure 6 shows the MSE of AoA estimate when the LS method is used. In this set of simulations, the random variable Z l in Eq. 43 is not modeled such that the CRLB of the AoA estimation can be evaluated. Here, the paths 1 to 4 are arranged such than their power is in the descending order. As we can see, the MSE curves of the proposed AoA estimation algorithm are almost overlapped with those of the CRLB. There is only one exception that the MSE of path 4 deviates from the CRLB when SNR is −10 dB. This is because in such low SNR condition, the EM algorithm can’t converge to the ML solution. However, since its power is weak, its contribution to the NMSE of the whole CIR estimation is relatively small. Figure 7 shows the comparison between the AoA estimation in the initial condition and the final estimation of the EM algorithm. To make the comparison easier to observe, we show the results of path 1 and path 4, representing the best and the worst paths. This comparison shows that the iterations in the EM algorithm provides about 5 dB gain. The initial estimation, which uses the correlation-based method, introduces correlation to the noise vector. When SNR is very low, it will seriously bias the estimation. As a result, the EM algorithm cannot converge to the optimum solution. From Fig. 7, we see that when SNR is -10dB, the EM algorithm does not provide any gain for the AoA estimation in path 4.

Figure 6
figure 6

Performance of proposed AoA estimate when path delays are perfectly known.

Figure 7
figure 7

Performance of proposed AoA estimation when path delays are perfectly known: improvement of EM iterations.

Then, we exam the performance of delay estimation by the proposed CIR algorithm. The missing probability is used as the performance measure, defined as N m, n /N r where N m, n is the number of runs for which the delay of path m is not correctly estimated and N r is the total number of runs. Figure 8 shows the missing probability for ESP and EESP. As shown, ESP and EESP achieve the similar missing probability. However, its missing probability seems still high for path 2 to 4 in low SNR regions. As we will see, however, this does no harms to the performance of EESP. Figure 9 shows the NMSE of the channel estimate with ESP and EESP when the path delays are not known. Comparing it with Fig. 5, we see that the NMSE yielded by EESP in Fig. 9 and that in Fig. 5 are almost the same. This indicates that only when the power of a path is small, will EESP miss the path and the contribution of the error to the overall NMSE is small. As we can also see, the performance of the estimate with EESP is better than that of ESP. The fact that the miss probability of ESP is similar for ESP and EESP only indicates that these two algorithms have the same probability to include the correct four paths (L = 4) in its twelve estimated paths (L s = 12). However, the other detected 8 paths in ESP may be different from those in EESP. In ESP, the L s paths are estimated by maximizing a correlation criterion. In EESP, however, they are estimated by minimizing a MSE criterion. Therefore, it is the criterion in Eq. 21 that makes the estimation-error variance of EESP smaller than that of ESP. ESP would require one additional LS operation on the selected L paths to perform as good as EESP.

Figure 8
figure 8

Missing probability of ESP and EESP.

Figure 9
figure 9

Performance of proposed channel estimates (ESP and EESP) when path delays are not known.

4.2.2 10-Path Scenario

Figure 10 depicts the simulated PDP of the 10-path scenario. Since the power of path 9 and 10 is too small, we neglect the performance evaluation of these two paths in our simulations. Figures 11 and 12 show the MSE of AOA estimates on path \(1\thicksim 4\) and path \(5\thicksim 8\), respectively. As we can see, for path \(1\thicksim 4\), the MSE curves of the proposed AoA algorithm almost overlap with those of the CRLB. For path \(5\thicksim 7\), their CRLBs are almost the same due to their similar power. The MSE of the proposed algorithm on these paths also overlap with the CRLB except for low SNR region. For path 8, its MSE curve is lower than that of the CRLB due to its small power. In this scenario, reliable AoA information on path \(1\thicksim 4\) can still be obtained. Figure 13 shows the NMSE of the proposed channel estimates. Note that the performance of ZF method is not shown in Fig. 13 since it is the same as that in Fig. 5. As shown, the LS estimate in the 10-path case is somewhat worse than in the 4-path case shown in Fig. 5. In the 10-path case, ESP performs almost the same as EESP. The performance difference between ESP, EESP and LS method is less than 1dB. Figures 14 and 15 show the missing probability for path \(1\thicksim 4\) and path \(5\thicksim 8\), respectively. As shown, ESP and EESP still have the same missing probability on all paths. It indicates that ESP and EESP have the same probability to include the correct ten paths. In this scenario, the indices of the other two paths (L s = 12) in ESP may be different from that in EESP. However, its influence on MSE is relatively small. Therefore, the estimation-error variance on the selected L paths is almost the same for ESP and EESP.

Figure 10
figure 10

PDP of simulated channels (AoAs: −1.7397,−1.4519, 0.8331,−1.1739,1.529,−1.94,−2.7823,1.9082,0.2106,1.738).

Figure 11
figure 11

Performance of proposed AoA estimate when the path delays are perfectly known.

Figure 12
figure 12

Performance of proposed AoA estimate when the path delays are perfectly known.

Figure 13
figure 13

Performance of proposed channel estimates (ESP, EESP, and LS with ideally detected path delays).

Figure 14
figure 14

Missing probability of ESP and EESP on path 1 to path 4.

Figure 15
figure 15

Missing probability of ESP and EESP on path 5 to path 8.

5 Conclusions

In this paper, we propose new matching-pursuit CS based methods for joint AoA and channel estimation in SIMO-OFDM systems. The proposed methods can fully exploit the sparsity of the channels and the estimation performance significantly outperforms the conventional ZF method. With the estimated channels, AoA’s can then be estimated by the ML method. The ML solution is found by the EM algorithm and simulations show that it can reach the Cramer-Rao lower bound (CRLB). The proposed methods required low computational complexity, facilitating its real-world applications. As known, the mmWave channels are highly sparse, and the path loss is severe. The proposed methods can simultaneously estimated AoA and CIR with frequency pilots, and can be used in future 5G systems for beamforming and data detection. While the proposed methods assume the use of digital arrays, they can be also extended to the scenario of hybrid arrays. It is also possible to include some transmitter processing technique such that the AoD can be estimated. Research in these topics is now underway.