CuSora: Real-time software radio using multi-core graphics processing unit

https://doi.org/10.1016/j.sysarc.2013.10.009

Abstract

The Sora platform, a fully programmable, high-performance software radio platform based on a commodity general-purpose PC, has recently received significant attention. However, the acceleration techniques used in Sora are too complicated for most developers, which can discourage researchers from modifying physical layer (PHY) processing. This paper presents the CuSora platform, which integrates the Sora platform with a popular multi-core graphics processing unit (GPU) that serves as the modem processor for high-speed PHY signal processing. CuSora also exploits software techniques to meet the requirements of real-time communication. A software controller is presented to achieve multi-mode communication, and the single-instruction multiple-data (SIMD) parallel computation features of the GPU are employed to accelerate PHY processing. Several wireless protocols, such as WiFi (802.11a) and WiMAX (802.16), are demonstrated on the CuSora platform for verification. CuSora meets the requirements of real-time communication and achieves excellent bit error ratio (BER) performance. Compared with Sora, CuSora offers higher performance, a shorter development cycle, and better coding flexibility.

Introduction

Software-defined radio (SDR) technology is designed to support various communication standards through software configuration without altering the hardware platform. Numerous SDR platforms are currently based on digital signal processors (DSPs) or field-programmable gate arrays (FPGAs). However, these platforms have several drawbacks. Although DSPs offer good code flexibility, their arithmetic capability is insufficient to meet the demands of real-time communication. FPGAs, in comparison, provide enough computation power for wireless communication, but to use them developers must learn hardware description languages and become familiar with the associated programming and debugging tools. To overcome this problem, Tan et al. [1] proposed Sora, an SDR platform based on general-purpose processors (GPPs) that runs the WiFi and LTE protocols. Sora improves baseband processing performance with several GPP acceleration techniques, such as dedicated central processing unit (CPU) cores, lookup tables, and Streaming SIMD Extensions (SSE), i.e., single instruction multiple data (SIMD) instructions. However, Sora shares the usability problem of FPGAs: developers must learn how to use dedicated CPU cores in a Windows driver and how to apply SSE for instruction-level optimization, and the code base runs to hundreds of thousands of lines. These acceleration techniques are too complicated for most developers and can discourage researchers from modifying physical layer (PHY) processing. Consequently, the study in [2] that adopted Sora concentrated only on the layers above the PHY, and this complexity hinders the use of Sora for other wireless standards.

An SDR platform based on GPUs avoids having to compromise on either performance or development efficiency. SDR signal processing algorithms require math-intensive vector operations that are well suited to parallel SIMD execution on GPUs. GPU performance has grown in line with Moore's Law, and peak performance can reach trillions of floating-point operations per second (TFLOPS). GPU manufacturers have recently proposed their own programming models. For example, NVIDIA Corporation introduced the Compute Unified Device Architecture (CUDA) [3], a software environment that lets developers use C as a high-level programming language and facilitates the development of high-performance applications through a sophisticated programming and debugging environment with considerably greater flexibility [4], [5], [6]. With CUDA, developers need to focus only on the parallelism of their algorithms, rather than on the system-driver techniques and new programming skills that Sora demands. Moreover, GPUs are inexpensive and can be installed in the same kind of commodity PC that Sora already uses.

Several recent studies have implemented wireless communication systems on GPUs [7], [8], [9]. Kim et al. [7] were the first to propose an SDR system that uses GPUs. They presented two types of platforms: one for performance and another for flexibility. The performance platform included only the essential medium access control (MAC) layer, to maximize system performance and reduce CPU usage. The flexibility platform, in contrast, had a radio framework that can support the design of multiple protocols. GPU-based implementations of other protocols have also been proposed, such as DVB-T2 [8] and WiMAX [9]. However, the WiMAX terminals in [7], [9] were implemented on the performance platform, in which service programs such as the network and modem consumed minimal CPU resources. That platform did not include a radio framework for running user applications smoothly; consequently, the system had low flexibility and did not allow the deployment of various wireless standards. CuSora, by contrast, includes a radio framework with a MAC layer, which gives it the flexibility to deploy various wireless standards. The authors in [8] implemented only the PHY blocks of the DVB-T2 protocol on the GPU platform and did not achieve real-time DVB-T2 communication. In comparison, CuSora implements both the PHY and MAC layers of the WiFi and WiMAX protocols, and both protocols run in real time.

In this paper, we propose a CUDA-based software radio named CuSora, a GPU-integrated SDR system built on the Sora platform. CuSora is a flexible platform with a complete software and hardware framework that supports the development of multiple protocols. We develop a software MAC that supports multiple protocols, such as WiFi (802.11a) [10] and WiMAX (802.16) [11], and we employ an efficient parallel strategy to improve the performance of baseband signal processing. CuSora exploits the high computation power of GPUs to accelerate the PHY layer and achieves higher throughput and a shorter frame duration than Sora.

This paper is organized into six sections. Section 2 provides background on CUDA. Section 3 describes the architecture of CuSora, including its hardware and software components. Section 4 presents the parallel strategy and optimization methods for accelerating the PHY modules. Section 5 evaluates performance. Section 6 concludes the paper.

Section snippets

CUDA

The GPU generally has a SIMD architecture in which multiple threads execute a single instruction on independent sets of data. The CUDA architecture is depicted in Fig. 1. As shown in the figure, CUDA comprises a logical hierarchy and a physical hierarchy. In the logical hierarchy, the CUDA structure consists of a grid, blocks, and threads: the device program is organized as a grid executed on the GPU, the grid is divided into blocks, and each block is divided into threads. A grid is organized as a three-dimensional
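To make the grid-block-thread hierarchy concrete, the sketch below emulates a one-dimensional CUDA launch on the CPU in Python: every (block, thread) pair computes its global index as `blockIdx.x * blockDim.x + threadIdx.x` and handles exactly one element. The workload is an assumption chosen for illustration, not taken from the paper: each thread applies the IEEE 802.11a block interleaver permutation to one coded bit. The names `launch`, `interleave_kernel`, and `interleave` and the default block size of 64 are hypothetical. A GPU would run the loop bodies concurrently; here they run one after another.

```python
def interleave_index(k: int, n_cbps: int, n_bpsc: int) -> int:
    """IEEE 802.11a interleaver: output position of input bit k within one OFDM symbol.

    n_cbps: coded bits per OFDM symbol; n_bpsc: coded bits per subcarrier.
    """
    s = max(n_bpsc // 2, 1)
    i = (n_cbps // 16) * (k % 16) + k // 16                      # first permutation
    return s * (i // s) + (i + n_cbps - (16 * i) // n_cbps) % s  # second permutation


def launch(kernel, grid_dim: int, block_dim: int, *args) -> None:
    """Run every (block, thread) pair serially; a GPU would run them concurrently."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)


def interleave_kernel(block_idx, thread_idx, block_dim, src, dst, n_cbps, n_bpsc):
    gid = block_idx * block_dim + thread_idx  # blockIdx.x * blockDim.x + threadIdx.x
    if gid >= len(src):                       # guard threads of the last, partial block
        return
    symbol, k = divmod(gid, n_cbps)           # which OFDM symbol, which bit inside it
    # Each thread writes one distinct output slot, so no synchronization is needed.
    dst[symbol * n_cbps + interleave_index(k, n_cbps, n_bpsc)] = src[gid]


def interleave(bits, n_cbps: int, n_bpsc: int, block_dim: int = 64):
    if len(bits) % n_cbps:
        raise ValueError("input must contain a whole number of OFDM symbols")
    out = [0] * len(bits)
    grid_dim = (len(bits) + block_dim - 1) // block_dim  # ceiling division, as in a CUDA launch
    launch(interleave_kernel, grid_dim, block_dim, bits, out, n_cbps, n_bpsc)
    return out


# Usage: two BPSK symbols (48 coded bits each); bit 1 of each symbol moves to position 3.
out = interleave(list(range(96)), n_cbps=48, n_bpsc=1)
print(out[3], out[48 + 3])  # 1 49
```

The block size does not have to match the symbol size: the global index is decomposed with `divmod`, and the bounds guard discards the surplus threads of the final block, which is the usual pattern when a data length is not a multiple of `blockDim.x`.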

Hardware

CuSora is a high-performance SDR platform that uses GPUs as modem processors. As shown in Fig. 2, the hardware platform of CuSora has three components: the CPU, the GPU, and the front end. The front end of CuSora consists of the Sora kits used in the Sora platform [1], which include the RF module, the USRP adapter, and the radio control board (RCB). The RF module transmits and receives RF signals through a dual-mode antenna and performs frequency down-/up-conversion. The USRP

Parallel strategy and optimization methods for accelerating PHY modules

In this section, we present a parallel strategy for accelerating two types of PHY modules, i.e., blocked and consecutive modules. We also apply several optimization methods to increase the throughput of the PHY modules.
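Assuming that "blocked" modules process independent blocks (for example, one OFDM symbol at a time) while "consecutive" modules carry state along the whole bit stream, the sketch below shows one common way to parallelize a consecutive module: the K = 7, rate-1/2 convolutional encoder of 802.11a and 802.16 (generator polynomials 133 and 171, octal). Because the encoder state is simply the previous six input bits, the stream can be cut into segments, and each segment (one GPU thread in a real implementation) rebuilds its starting state from the six bits just before it. This overlap technique is an illustrative assumption, not necessarily the strategy CuSora uses; the function names and thread counts are hypothetical.

```python
G0, G1 = 0o133, 0o171  # K = 7 generator polynomials (octal) of the 802.11a/802.16 code


def parity(x: int) -> int:
    return bin(x).count("1") & 1


def encode_range(bits, start: int, end: int):
    """Rate-1/2 encode bits[start:end], starting from the state left by bits[:start]."""
    state = 0
    for b in bits[max(0, start - 6):start]:  # warm-up: the state is just the last 6 inputs
        state = ((b << 6) | state) >> 1
    out = []
    for b in bits[start:end]:
        reg = (b << 6) | state               # bit 6 = current input, bits 5..0 = delays
        out += [parity(reg & G0), parity(reg & G1)]
        state = reg >> 1
    return out


def encode_serial(bits):
    return encode_range(bits, 0, len(bits))


def encode_parallel(bits, n_threads: int):
    """Split the stream into n_threads segments; each segment is encoded independently."""
    if not bits:
        return []
    seg = -(-len(bits) // n_threads)         # ceiling division: bits per thread
    pieces = [encode_range(bits, s, min(s + seg, len(bits)))
              for s in range(0, len(bits), seg)]  # no piece depends on another's output
    return [c for piece in pieces for c in piece]


# Usage: the segmented encoder reproduces the serial output exactly.
bits = [((i * i) >> 2) & 1 for i in range(101)]
assert encode_parallel(bits, 8) == encode_serial(bits)
```

The price of this independence is six redundant state updates per segment, which is negligible when each segment spans hundreds of bits.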

Experiment setup

In the experiment, we adopted four modulation types for both the 802.11a and 802.16 protocols and, for each modulation type, used the highest code rate as an example. The Reed–Solomon (RS) code rate R_RS, the convolutional code rate R_CC, the number of coded bits per sample N_CBPS, and the number of coded bits per OFDM symbol N_CBPM for each modulation are listed in Table 1. The maximum frame length is 12,694 bits, of which 12,672 are data bits, 16 are service parameter bits, and 6 are encoding tail
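As a worked check of the frame layout, the sketch below confirms that 16 service bits, 12,672 data bits, and 6 tail bits add up to the 12,694-bit maximum frame, and estimates how many OFDM symbols that frame occupies in 802.11a at the highest code rate of each modulation. The per-mode values (48 data subcarriers, 4 µs symbols, rate 3/4 for BPSK, QPSK, 16-QAM, and 64-QAM) are standard 802.11a figures assumed here, not values copied from Table 1, and the variable names are descriptive rather than the paper's symbols. The duration excludes the preamble and the SIGNAL field.

```python
SERVICE_BITS, DATA_BITS, TAIL_BITS = 16, 12672, 6
FRAME_BITS = SERVICE_BITS + DATA_BITS + TAIL_BITS  # 12,694 bits
DATA_SUBCARRIERS = 48      # 802.11a data subcarriers per OFDM symbol
SYMBOL_US = 4              # 802.11a OFDM symbol duration in microseconds, guard interval included

# Assumed standard 802.11a modes: (name, coded bits per subcarrier, code rate numerator, denominator).
MODES = [("BPSK", 1, 3, 4), ("QPSK", 2, 3, 4), ("16-QAM", 4, 3, 4), ("64-QAM", 6, 3, 4)]


def frame_plan(bits_per_subcarrier: int, rate_num: int, rate_den: int):
    coded_per_symbol = DATA_SUBCARRIERS * bits_per_subcarrier
    data_per_symbol = coded_per_symbol * rate_num // rate_den
    n_symbols = -(-FRAME_BITS // data_per_symbol)      # ceiling division
    pad_bits = n_symbols * data_per_symbol - FRAME_BITS
    return coded_per_symbol, data_per_symbol, n_symbols, pad_bits, n_symbols * SYMBOL_US


for name, n_bpsc, num, den in MODES:
    coded, data, n_sym, pad, dur = frame_plan(n_bpsc, num, den)
    print(f"{name:7s} coded/sym={coded:3d} data/sym={data:3d} symbols={n_sym:3d} pad={pad:3d} {dur} us")
```

Under these assumptions, 12,672 data bits fill exactly 352, 176, and 88 symbols for BPSK, QPSK, and 16-QAM, so in those modes the service and tail bits alone force one extra symbol.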

Conclusion

In this paper, we present a high-performance software radio platform that uses a GPU as its modem processor for high-speed signal processing. The proposed platform is named CuSora, short for CUDA-based software radio. CuSora has a complete software and hardware framework that supports the development of multiple protocols, and a multi-mode MAC layer is presented to support them. We integrated the GPU into the CuSora platform to accelerate PHY signal processing. We

Future work

CuSora is a software radio platform built on a heterogeneous CPU–GPU architecture that aims to enable rapid prototyping of wireless protocols by using GPUs. More protocols and SDR applications, such as MIMO-based protocols, will be developed on CuSora in future studies.


References (15)

  • K. Tan et al., Sora: high-performance software radio using general-purpose multi-core processors, Commun. ACM (2011).
  • J. Bajaj et al., Cognitive radio implementation in ISM bands with Microsoft SORA, IEEE Int. Symp. Personal Indoor Mobile Radio Commun. (2011).
  • NVIDIA Corporation, NVIDIA CUDA Compute Unified Device Architecture Programming Guide version 4.0, available at...
  • C. Yang et al., Programming for scientific computing on peta-scale heterogeneous parallel systems, J. Cent. South Univ. (2013).
  • X. Yang et al., MPtostream: an OpenMP compiler for CPU–GPU heterogeneous parallel systems, Sci. China Inf. Sci. (2012).
  • Q. Wu et al., A fast parallel implementation of molecular dynamics with the Morse potential on a heterogeneous petascale supercomputer, 26th IEEE Int. Parallel Distributed Process. Symp. (2012).
  • J. Kim et al., Implementation of an SDR system using graphics processing unit, IEEE Commun. Mag. (2010).
There are more references available in the full text version of this article.

Cited by (9)

  • HOSVD prototype based on modular SW libraries running on a high-performance CPU+GPU platform

    2021, Journal of Systems Architecture
    Citation Excerpt :

    Specifically, HCPs composed of CPU + GPU are commonly used for prototyping because they can handle several requirements simultaneously, such as high performance and variable levels of parallelism, which, in turn, define a variable throughput, re-usability, flexibility, etc. [10–12]. Some examples of applications are signal processing [13], wireless communications [14], data processing for the internet [15], video processing [16], and deep learning [17]. Due to the large number of applications, important matrix decompositions are Singular Value Decomposition (SVD) [18], QR decomposition [19], Karuhnen Loeve Transformation (KLT) [20], and Principal Component Analysis (PCA) [21]; all these are commonly calculated based on iterative Jacobi rotations.

  • A DSEL for high throughput and low latency software-defined radio on multicore CPUs

    2023, Concurrency and Computation: Practice and Experience
  • Testing different channel estimation techniques in real-time software defined radio environment

    2020, International Journal of Advanced Computer Science and Applications
  • Real-time data transfer based on software defined radio technique using gnu radio/usrp

    2019, International Journal of Engineering and Advanced Technology
  • Software radio architecture: A mathematical perspective

    2019, EAI/Springer Innovations in Communication and Computing

Rongchun Li received the B.S. degree in Computer Science and Technology from Wuhan University, Wuhan, China, in 2007, and the M.S. degree in Computer Science and Technology from National University of Defense Technology, Changsha, China, in 2009. He is currently a Ph.D. candidate in the National Laboratory for Parallel and Distributed Processing, National University of Defense Technology. His research interests include wireless algorithms on GPUs and reconfigurable architectures, and high-performance wireless transceiver design.

Yong Dou received his B.S., M.S., and Ph.D. degrees in Computer Science and Technology from National University of Defense Technology in 1995. He is now a professor and Ph.D. supervisor in the National Laboratory for Parallel and Distributed Processing, National University of Defense Technology. He is a senior member of the China Computer Federation and a member of the IEEE and ACM. His research interests include high-performance computer architecture, high-performance embedded microprocessors, reconfigurable computing, and software-defined radio.

Jie Zhou received his D.S. degree in Computer Science and Technology from National University of Defense Technology in 2011 and is now an assistant professor at National University of Defense Technology. His research interests include software-defined radio, MIMO, and high-performance computer architecture.

Lin Deng received his D.S. degree in Computer Science and Technology from National University of Defense Technology in 2011 and is now an assistant professor at National University of Defense Technology. His research interests include CPU architecture design and its applications.

Shi Wang received his M.S. degree from National University of Defense Technology in 2009. His research interests include software-defined radio and equipment management.

This work was supported by the National Natural Science Foundation of China (61125201).
