# Hierarchical Matched Filter Based on FPGA for Mobile Systems

Dr. Abdul-Karim A-R. Kadhim Nahrain University Baghdad, Iraq Email: ak kadhim@yahoo.com

**Abstract** This paper is concerned with the design of a Field Programmable Gate Array (FPGA) based Hierarchical Matched Filter (HMF) for new-generation cellular mobile radio systems. The matched filtering is considered the most important task of the receiver and should be implemented with minimum hardware. The synchronization code considered in this work is the one used in the W-CDMA system, as described by the 3GPP specifications. The proposed system relies on a hierarchical matched filter built from the transposed structure of the FIR filter. The HMF implementation is verified to perform the required partial correlation. A simplified design for the realization of the filter tap is presented; it is configurable and flexible, which reduces the overall hardware complexity of the system.

Key words: Code search, FPGA, Matched filter, FIR, W-CDMA.

## I. Introduction

W-CDMA (Wideband Code Division Multiple Access) and CDMA2000 systems both use DS-CDMA technology for 3G (Third Generation) and future cellular systems. In these systems, spreading codes are used to differentiate physical channels from the same transmitter, and scrambling codes are used to differentiate transmitters [1].

In synchronous systems such as CDMA2000, the mobile station searches only the code timing for initial code acquisition. In an asynchronous W-CDMA system, on the other hand, each base station uses its own scrambling code, so the mobile station must resolve both timing and code uncertainty for initial code synchronization. The use of 512 complex Gold codes in W-CDMA makes it impractical to search exhaustively over all possible codes and code timings. The problem is simplified by the three-step cell search scheme adopted for W-CDMA by the 3rd Generation Partnership Project (3GPP) specifications [1-3].

Dr. Abdul-Aziz A. Hussain University of Technology Baghdad, Iraq Email: engabd49@yahoo.com

The three-step cell search in W-CDMA uses three channels: the Primary Synchronization Channel (P-SCH), the Secondary Synchronization Channel (S-SCH), and the Common Pilot Channel (CPICH). The P-SCH provides information on slot timing, which consists of 2560 chip timing candidates. The S-SCH provides information on the code group and frame boundary, which consists of 960 candidates. Then, one candidate among the eight scrambling codes in a code group has to be selected by the cell searcher. Among the many synchronization signals, the P-SCH carries a common sequence of 256 chips that is periodically transmitted by each cell. The terminal looks for this common sequence to establish the slot boundary of at least one of the cells. Without this first processing step to establish coarse synchronization, the search for the cell-specific scrambling code over all possible offsets would be very complicated [4].

The first step is the most demanding since it must resolve the largest amount of uncertainty. In the three-step search, the receiver searches for the slot timing by correlating the received signal with the P-SCH code using a matched filter (MF). In addition to code and time uncertainty, the frequency uncertainty during the initial search can be as large as 20 kHz. This frequency offset is due to the inaccuracy of the receiver's crystal oscillator. The frequency error is handled by partial symbol despreading and noncoherent combining [1].

Currently, most hardware solutions for mobile terminal implementation are a combination of application-specific integrated circuits (ASICs) and digital signal processor (DSP) devices. The rapidly expanding 3G and future mobile terminals will necessitate integrating reconfigurable architectures with System-on-a-Chip (SoC) solutions. Reconfigurable architectures based on FPGAs open new dimensions for the signal processing engineer, maintaining the flexibility of software-based solutions but with performance closer to that of an ASIC. FPGA-based DSP systems allow the designer to construct signal processing hardware that closely follows the natural data flow of the desired algorithm and to exploit the rich parallelism of many DSP algorithms [5].

In any hardware implementation, there is a strong economic imperative to minimize the number and complexity of the arithmetic operations employed in the data path. This paper proposes a design for a configurable transposed-form matched filter (MF) for Primary Synchronization Code (PSC) detection based on FPGA, in which the complexity of the multiply-accumulate (MAC) operations in the filter is minimized. The aim is to reduce the FPGA logic resources required to realize the filter. The paper is organized as follows. Section II gives an overview of the P-SCH signal structure together with the slot synchronization procedure. Section III describes the background for MF design, while Section IV discusses the configurable Hierarchical MF (HMF) structure and simulation results. Finally, the conclusion is presented in Section V.

## II. P-SCH Code Structure

During step 1 of the cell search procedure, the mobile station (MS) uses the Primary Synchronization Code (PSC) of the P-SCH to acquire slot synchronization to a cell. This is typically done with a single matched filter matched to the PSC, which is common to all cells. The starting position of the synchronization code may be determined from observations over one slot duration. The PSC consists of 256 chips and is termed a Generalized Hierarchical Golay Sequence (GHGS), denoted by $C_{psc}$. It can be generated from two component sequences $X_1$ and $X_2$ of length 16, as shown in the following equations [6]:

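$$
\begin{aligned}
X_1 &= [1, 1, 1, -1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1, 1, 1] \\
X_2 &= [1, 1, 1, 1, 1, 1, -1, -1, 1, -1, 1, -1, 1, -1, -1, 1] \\
C_{psc}(n) &= \bigl[ X_2(n \bmod 16) + X_1(n \ \mathrm{div} \ 16) \bigr] \bmod 2 \qquad (1)
\end{aligned}
$$

where $n = 0, 1, \ldots, 255$, and $(n \ \mathrm{div} \ 16)$ and $(n \bmod 16)$ are the quotient and the remainder of $n/16$, respectively. The modulo-2 addition applies to the binary form of the sequences; for the $\pm 1$ values listed here it corresponds to a chip-by-chip product.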
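The construction in equation (1) can be illustrated with a short Python sketch. It assumes the 16-element component sequences listed for the PSC in 3GPP TS 25.213 [3], works directly with $\pm 1$ chips so that the modulo-2 addition becomes a product, and omits the complex scaling applied in the specification. The function name `generate_psc` is hypothetical.

```python
# Component sequences in +/-1 form (assumed: values listed for the PSC in 3GPP TS 25.213).
X1 = [1, 1, 1, -1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1, 1, 1]
X2 = [1, 1, 1, 1, 1, 1, -1, -1, 1, -1, 1, -1, 1, -1, -1, 1]


def generate_psc(x1=X1, x2=X2):
    """Return the 256-chip hierarchical sequence.

    In +/-1 form the modulo-2 addition of equation (1) is a product:
    chip n = x2[n mod 16] * x1[n div 16].
    """
    return [x2[n % 16] * x1[n // 16] for n in range(256)]


if __name__ == "__main__":
    psc = generate_psc()
    # Each group of 16 chips is X2, repeated with the sign given by X1.
    print(len(psc), psc[:16] == X2)
```

Each block of 16 chips is a copy of $X_2$ whose polarity is set by one element of $X_1$, which is the property the hierarchical filter exploits.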

A matched filter correlator is used to correlate the received samples with the PSC. Frequency offset greatly affects the performance of the matched filter, so its length has to be chosen carefully to avoid the coherence loss that arises from frequency offset [7]. When the frequency inaccuracy of the crystal oscillator in the receiver is as much as 10 ppm (parts per million), the frequency offset reaches 20 kHz for a carrier at 2 GHz. Under these conditions, a partial correlation over 64 chips is known to be the most appropriate choice [1]. The PSC can be treated as a concatenation of a few short sequences. The Efficient Golay Correlator (EGC) cannot be employed because it cannot perform the partial correlation [8,9]. A hierarchical MF correlator with low complexity can be used to implement this partial correlation [10]. A reduced-complexity FIR filter can be implemented efficiently using the transposed FIR structure [5], as shown in Figure-1.
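The effect of the offset on correlation length can be checked numerically. The sketch below is illustrative only: it assumes a constant frequency offset, no noise, and the W-CDMA chip rate of 3.84 Mchip/s, and it uses the standard expression for the magnitude of a coherent sum of rotating phasors. The function names are hypothetical.

```python
import math

CHIP_RATE_HZ = 3.84e6  # W-CDMA chip rate


def frequency_offset_hz(ppm, carrier_hz):
    """Frequency offset produced by an oscillator error given in ppm."""
    return ppm * 1e-6 * carrier_hz


def coherent_gain(n_chips, offset_hz, chip_rate_hz=CHIP_RATE_HZ):
    """Normalized amplitude |sum_{k<N} exp(j*2*pi*df*k*Tc)| / N.

    Equals 1.0 with no offset and falls as the phase rotates
    across the correlation window.
    """
    x = math.pi * offset_hz / chip_rate_hz
    if x == 0.0:
        return 1.0
    return abs(math.sin(n_chips * x) / (n_chips * math.sin(x)))


def loss_db(gain):
    """Amplitude gain expressed as a loss in decibels."""
    return -20.0 * math.log10(gain)


if __name__ == "__main__":
    df = frequency_offset_hz(10, 2e9)  # about 20 kHz
    for n in (64, 256):
        g = coherent_gain(n, df)
        print(n, round(g, 3), round(loss_db(g), 2))
```

Under these assumptions a 64-chip window keeps roughly 0.83 of the ideal amplitude, while a full 256-chip coherent correlation keeps only about 0.21, which illustrates why partial correlation with noncoherent combining is preferred at this offset.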

## III. Matched Filter Design

The traditional MF design employs a tapped delay line, with a coefficient multiplication at each tap, and uses an adder tree to sum the results of the multiplications and produce the filter output. This adder tree is difficult to scale for configurable-length filters and, when pipelined, can add considerable latency and hardware complexity, which means increased power consumption. The transposed (reverse) form MF structure corrects these shortcomings. In the transposed-form structure, the data samples are supplied simultaneously to all coefficient multipliers [6].



Figure-1 Structure of the hierarchical matched filter

A delay register in the adder chain restores the effect of the delay line. This implementation scales easily and its latency is no greater than the filter length. Since the MF coefficients are limited to $\pm 1$, the multiplications reduce to selective negation operations. Implementing each adder as an adder/subtractor incorporates this negation, eliminating the need for a separate multiplication operation. The input data is represented by digital words, and the tap shift registers have a width of several bits (typically 4 to 16 bits). If the oversampling rate is 4 times the chip rate, the matched filter observes all the input samples but performs the correlation only with samples separated by the chip period. However, higher sampling rates result in a higher processing rate and are expected to require more FPGA resources.
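A behavioral sketch of the transposed form may make the data flow concrete. It is a software model written for illustration, not the FPGA design of this paper: each sample is broadcast to all taps, each tap adds or subtracts it according to its $\pm 1$ coefficient, and a register between taps carries the partial sum. The function names are hypothetical, and one sample per chip is assumed.

```python
def transposed_mf(samples, code):
    """Transposed-form matched filter for a +/-1 code.

    regs[i] is the registered partial sum feeding the adder of tap i;
    the last entry is the (zero) input of the adder chain.
    """
    taps = code[::-1]          # matched filter: time-reversed code
    n = len(taps)
    regs = [0] * n
    out = []
    for x in samples:
        # +/-1 coefficients: the multiplier is just add or subtract.
        terms = [x if h > 0 else -x for h in taps]
        sums = [terms[i] + regs[i] for i in range(n)]
        out.append(sums[0])    # filter output for this clock
        regs = sums[1:] + [0]  # partial sums move one tap along
    return out


def direct_correlation(samples, code):
    """Reference: sliding correlation with zero initial history."""
    n = len(code)
    out = []
    for t in range(len(samples)):
        acc = 0
        for k in range(n):
            idx = t - (n - 1) + k
            if idx >= 0:
                acc += code[k] * samples[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    code = [1, 1, -1, 1]
    data = [3, -2, 5, 0, 7, -4, 1, 1]
    print(transposed_mf(data, code) == direct_correlation(data, code))
```

The comparison with `direct_correlation` shows that moving the registers into the adder chain leaves the filter output unchanged.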

The data flow of the transposed-form MF aids performance and allows a parametrizable number of filter taps. Although the total number of adder bits is large, performance is improved and the design is simplified. The hierarchical matched filter consists of two concatenated matched filter blocks. Figure-2 shows the transposed-form HMF using 16-chip and 16-symbol accumulations for a 256-chip sequence. Built from two transposed MF blocks, this reconfigurable HMF has flexible hardware and is easier to construct for a reconfigurable design.

The first MF receives the input signal serially from the base station (BS). For each clock cycle (system clock = 3.84 MHz), the output of the first MF is stored in the memory cells of the second MF taps. In a straightforward MF design, these memory cells are 256 shift registers. In the proposed design, the 256 registers are divided among 16 registers, one in each filter tap; these registers are optional and of configurable length. A new filter tap design is proposed in this work and shown in Figure-3. This design is configurable and flexible, which serves the aim of reducing hardware complexity, as shown in the following sections.
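The two-stage arrangement can be modeled in a few lines of Python. This is a sketch under stated assumptions rather than the hardware of Figure-2: the component sequences are the 16-element values listed in 3GPP TS 25.213 [3], the input carries one sample per chip, and both stages run at the chip rate (the straightforward case with 16 delay cells per second-stage tap). The function names are hypothetical.

```python
# Assumed component sequences (values listed for the PSC in 3GPP TS 25.213).
X1 = [1, 1, 1, -1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1, 1, 1]
X2 = [1, 1, 1, 1, 1, 1, -1, -1, 1, -1, 1, -1, 1, -1, -1, 1]


def correlate(samples, code, spacing=1):
    """out[t] = sum_k code[k] * samples[t - spacing*(N-1-k)], zero history."""
    n = len(code)
    out = []
    for t in range(len(samples)):
        acc = 0
        for k in range(n):
            idx = t - spacing * (n - 1 - k)
            if idx >= 0:
                acc += code[k] * samples[idx]
        out.append(acc)
    return out


def hmf(samples):
    """Hierarchical MF: 16-chip stage matched to X2, then 16 taps of X1
    that combine first-stage outputs spaced 16 chips apart."""
    stage1 = correlate(samples, X2, spacing=1)
    return correlate(stage1, X1, spacing=16)


def full_mf(samples):
    """Reference: single 256-tap matched filter on the whole sequence."""
    psc = [X2[n % 16] * X1[n // 16] for n in range(256)]
    return correlate(samples, psc)


if __name__ == "__main__":
    psc = [X2[n % 16] * X1[n // 16] for n in range(256)]
    out = hmf(psc + [0] * 20)
    print(out.index(max(out)), max(out))
```

The hierarchical form needs 16 + 16 = 32 add/subtract operations per output sample, against 256 for the single long filter, while producing the same output.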



Figure-2 Transposed form HMF structure

A further hardware reduction is proposed in this work by reducing the number of memory cells in the 2<sup>nd</sup> MF from 256 to 16. This is achieved by using two clocks: the system clock (3.84 MHz in W-CDMA) and a slow clock that runs at 1/16 of the system clock, as shown in Figure-2. Instead of the 256 memory cells (16 cells in each tap) of a straightforward implementation clocked at a single rate, 16 memory cells are used (1 cell in each tap). This gives a reduction of 240 memory cells, which results in a significant saving in power consumption and hardware.
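The memory-cell count can be worked through explicitly. The figures for the second MF come from the text; the count of 16 registers for the first MF, and the reading that the roughly 88% figure quoted in the Conclusion is taken relative to the registers of both MF blocks, are assumptions made here for illustration.

```python
SYSTEM_CLOCK_HZ = 3.84e6   # W-CDMA chip-rate clock
TAPS_PER_STAGE = 16

# Second MF, straightforward design: 16 delay cells in each of 16 taps.
cells_fast = TAPS_PER_STAGE * 16
# Second MF, proposed design: 1 cell per tap, clocked 16 times slower.
cells_slow = TAPS_PER_STAGE * 1
cells_saved = cells_fast - cells_slow

slow_clock_hz = SYSTEM_CLOCK_HZ / 16

# Assumption: the first MF holds one register per tap (16 in total),
# and the saving is expressed relative to both MF blocks together.
first_mf_cells = TAPS_PER_STAGE
saving_fraction = cells_saved / (first_mf_cells + cells_fast)

if __name__ == "__main__":
    print(cells_saved, slow_clock_hz, round(100 * saving_fraction))
```

With these assumptions the slow clock is 240 kHz and the saving of 240 cells out of 272 is about 88%.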

## IV. Configurable HMF Structure

The main component of an FIR filter is the MAC tap. Figure-4 shows the structure of the proposed configurable MF block (mfstage) for the HMF. Its inputs (Ind) and (Inc) are the received data samples and the P-SCH code, respectively. The proposed tap (mftap), shown in Figure-3, can be configured according to the design. A 16-bit delay line is an option that can be used for oversampling from 1 to 16, or bypassed if not necessary, using the configuration input "cnf". Table-1 describes the input/output terminals of the tap. Figure-5 shows the configurable HMF, which consists of two "mfstage" blocks and a code register to store the PSC.

The filter is implemented and tested using Xilinx ISE 4.1i (Project Navigator) and ModelSim 5.4a. The hardware complexity for a Xilinx Virtex-E (xcv300e-bg432) chip is shown in Table-2. As can be seen from this table, the HMF can be implemented successfully on the chosen chip. The other functions performed by the transmitter/receiver components, which are beyond the scope of the present work, will also influence the choice of a suitable device.







Figure-4 Configurable matched filter tap



Figure-5 Configurable FPGA HMF Module

Table-1 Input/Output pins for matched filter taps

| Symbol | Meaning                   |
|--------|---------------------------|
| Ind    | 8-bit input data          |
| code   | 1-bit reference code      |
| prev.  | input from previous stage |
| rst    | reset (1-bit)             |
| clk    | input clock               |
| cnf    | 1-bit configuration flag  |
| rslt   | 16-bit tap output         |
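The terminals in Table-1 can be mimicked by a small behavioral model. The sketch below is hypothetical: the paper does not give the internal logic of the tap, so the polarity of `code`, the 16-bit two's-complement wrap of `rslt`, the delay-line length, and the way `cnf` selects between one register and the full delay line are all assumptions made for illustration.

```python
def wrap16(value):
    """Wrap an integer to a signed 16-bit two's-complement range."""
    return ((value + 0x8000) & 0xFFFF) - 0x8000


class MfTap:
    """Hypothetical behavioral model of one matched filter tap."""

    def __init__(self, delay_len=16):
        self.delay_len = delay_len   # assumed optional delay-line length
        self.reset()

    def reset(self):
        self.delay = [0] * self.delay_len
        self.rslt = 0

    def clock(self, ind, code, prev, cnf, rst=0):
        """Model one rising clk edge and return the registered rslt."""
        if rst:
            self.reset()
            return self.rslt
        # Assumed polarity: code = 1 adds the sample, code = 0 subtracts it.
        term = ind if code else -ind
        total = wrap16(prev + term)
        # Shift the new sum into the delay line.
        self.delay = [total] + self.delay[:-1]
        # cnf = 1: use the whole delay line; cnf = 0: single register.
        self.rslt = self.delay[-1] if cnf else self.delay[0]
        return self.rslt


if __name__ == "__main__":
    tap = MfTap(delay_len=4)
    print(tap.clock(ind=5, code=1, prev=10, cnf=0))
```

Chaining 16 such taps, with each `rslt` wired to the next tap's `prev`, would give one transposed-form MF stage under the same assumptions.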

Behavioral testing is performed using ModelSim 5.4a. The test parameters are as follows: the received data sample is an 8-bit word, the system clock is 20 MHz, and the output is a 16-bit word. Figure-6 shows the resulting timing waveforms. The input data starts at 100 ns. The peak output appears after 1800 ns with a peak value of 32512. The timing simulation shows that the minimum clock period that can be handled is about 6.08 ns, which means that a maximum clock frequency of about 164 MHz can be tolerated by the proposed implementation.
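The reported figures can be cross-checked with simple arithmetic. The sketch below converts the minimum clock period to a maximum frequency and notes that the reported peak equals 256 × 127; reading this as a full 256-chip correlation of full-scale positive 8-bit samples is an assumption, not a statement from the test description.

```python
def max_clock_mhz(min_period_ns):
    """Maximum clock frequency in MHz for a minimum period in ns."""
    return 1e3 / min_period_ns


MIN_PERIOD_NS = 6.08
CHIP_RATE_MHZ = 3.84

f_max_mhz = max_clock_mhz(MIN_PERIOD_NS)
headroom = f_max_mhz / CHIP_RATE_MHZ   # margin over the W-CDMA chip rate

# Assumed reading: 256 chips, each contributing the largest signed
# 8-bit sample value (127), all with matching polarity.
peak = 256 * 127
fits_signed_16_bit = peak <= 2**15 - 1

if __name__ == "__main__":
    print(round(f_max_mhz, 1), round(headroom, 1), peak, fits_signed_16_bit)
```

The maximum clock is therefore more than 40 times the 3.84 MHz chip rate, and the peak value fits within a signed 16-bit output word.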



Figure-6 The behavioral performance timing waveforms

## V. Conclusion

A design of a hierarchical matched filter for the primary synchronization channel is presented in this work. This is an important step in receiving synchronized data in new-generation cellular mobile systems. The main design features of the proposed system are configurability and flexibility. A simple implementation of the tap used in the matched filter is presented. The designed component is implemented and simulated for a Virtex-E device and shown to work successfully. A maximum clock frequency of about 164 MHz can be used. A reduction of 240 memory cells (each a 16-bit register) is achieved by reducing the clock frequency of the second MF block in the proposed HMF. This is a reduction of about 88% compared with using the same clock frequency for both MF blocks.

Table-2 Design summary using xcv300e-bg432 chip

| Element type            | Number of used elements | Percentage of used elements |
|-------------------------|-------------------------|-----------------------------|
| Slices                  | 653 out of 3,072        | 21%                         |
| Flip-Flop Slices        | 554 out of 6,144        | 9%                          |
| 4-input LUTs            | 1,090 out of 6,144      | 17%                         |
| Bonded IOBs             | 27 out of 316           | 8%                          |
| GCLKs                   | 1 out of 4              | 25%                         |
| GCLKIOBs                | 1 out of 4              | 25%                         |
| Total equivalent gates  | 12,934                  |                             |

## References

1- Y. Wang & T. Ottosson, "Cell Search in WCDMA", IEEE Journal on Selected Areas in Communications, Vol. 18, No. 8, August 2000.

2- K. Higuchi, M. Sawahashi, and F. Adachi, "Fast cell search algorithm in inter-cell asynchronous DS-CDMA mobile radio", IEICE Trans. Commun., Vol. E-81, No. 7, July 1998.

3- 3<sup>rd</sup> Generation Partnership Project, "Spreading and modulation (FDD)", 3GPP Tech. Spec., TS 25.213, V4.0.0 (2001-2003).

4- J. Guey, Y. Wang & J. Cheng, "Improving the Robustness of Target Cell Search in WCDMA Using Interference Cancellation", IEEE International Conference on Wireless Network, Communication and Mobile Computing, 2005.

5- C. Dick, F. Harris, "Configurable Logic for Digital Communications: Some Signal Processing Perspectives", IEEE Communications Magazine, August 1999.

6- Siemens and Texas Instruments, "Generalized Hierarchical Golay Sequence for PSC With Low Complexity Correlation Using Pruned Efficient Golay Correlators", 3GPP TSG RAN WG1 TSGR1-554/99, 1999.

7- Y. Wang & T. Ottosson, "Initial Frequency Acquisition in WCDMA", IEEE Conference VTC'99.

8- J. Moon, Y. Lee, "Rapid Slot Synchronization in the Presence of Large Frequency Offset and Doppler Spread in WCDMA Systems", IEEE Trans. on Wireless Communications, Vol. 4, No. 4, July 2005.

9- J. Moon, Y. Lee, "Cell Search Robust to Initial Frequency Offset in WCDMA Systems", IEEE Conference PIMRC 2002.

10- Q. Cai, A. Wilzeck & T. Kaiser, "A Compound Method for Initial Frequency Acquisition in WCDMA Systems", IEE, The 2<sup>nd</sup> IEE/EURASIP Conference on DSP Enabled Radio, 19-20 September 2005.