ABSTRACT
Wireless communication standards such as Long Term Evolution (LTE) are rapidly changing to support the high data rates of wireless devices. The physical layer baseband processing has strict real-time deadlines, especially in the next-generation applications enabled by the 5G standard. Existing base station transceivers utilize customized Digital Signal Processing (DSP) cores or fixed-function hardware accelerators for physical layer baseband processing. However, these approaches incur significant non-recurring engineering costs and are inflexible to newer standards or updates. Software-programmable processors offer more adaptability, but it is challenging to sustain guaranteed worst-case latency and throughput at reasonably low power on shared-memory many-core architectures featuring inherently unpredictable design choices, such as caches and a network-on-chip. We propose SPECTRUM, a predictable software-defined many-core architecture that exploits the massive parallelism of LTE baseband processing. The focus is on designing scalable, lightweight hardware that can be programmed and defined by sophisticated software mechanisms. SPECTRUM employs hundreds of lightweight in-order cores augmented with custom instructions that provide predictable timing, a purely software-scheduled on-chip network that orchestrates communication to avoid any contention, and per-core software-controlled scratchpad memory with deterministic access latency. Compared to a many-core architecture like Skylake-SP (average power 215 W) that drops 14% of packets at high traffic load, a 256-core SPECTRUM by definition has a zero packet drop rate at a significantly lower average power of 24 W. SPECTRUM consumes 2.11x lower power than a C66x DSP cores + accelerator platform in baseband processing. SPECTRUM is also well-positioned to support future 5G workloads.
Index Terms
- SPECTRUM: a software defined predictable many-core architecture for LTE baseband processing