Exploring the Design Space of an Energy-Efficient Accelerator for the SKA1-Low Central Signal Processor

Published in: International Journal of Parallel Programming

Abstract

The Square Kilometre Array (SKA) will be the largest radio telescope ever built, with unprecedented sensitivity, angular resolution, and survey speed. Collectively, the SKA’s antennas are expected to gather exabytes of data per second and store one petabyte of data every day, requiring on the order of exa-operations per second of processing. This paper focuses on SKA1-Low, the SKA’s aperture-array instrument, which consists of 131,072 antennas and will be built in the first deployment phase of the project. In particular, our work explores the design of a custom architecture for the central signal processor (CSP) of SKA1-Low. The CSP processes digitized samples sent by antennas that receive extra-terrestrial radio-frequency signals between 50 and 350 MHz. We describe the challenges in building the CSP and present a quantitative study of a custom hardware architecture for executing the main CSP algorithms. By taking advantage of emerging 3D-stacked-memory devices and by exploring the design space for a 14-nm implementation, we estimate for our architecture a power consumption of 9.62 W for processing all channels of a sub-band and an application-level energy efficiency of up to 312 GFLOPS/W.
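As a rough aid for reading the two headline figures above, the following Python sketch converts the quoted efficiency into energy per floating-point operation, which comes to about 3.2 pJ. The function names are illustrative and not taken from the paper. The second function additionally assumes that the 9.62 W and 312 GFLOPS/W figures describe the same design point, which the abstract does not state, so its result should be read as an indication only.

```python
# Sketch only: unit conversions on the two figures quoted in the abstract.
POWER_W = 9.62             # power for all channels of a sub-band (from the abstract)
PEAK_GFLOPS_PER_W = 312.0  # "up to" application-level efficiency (from the abstract)


def energy_per_flop_pj(gflops_per_watt: float) -> float:
    """Convert an efficiency in GFLOPS/W into picojoules per operation."""
    # 1 GFLOPS/W equals 1e9 operations per joule, and 1 J equals 1e12 pJ.
    return 1e12 / (gflops_per_watt * 1e9)


def implied_throughput_gflops(power_w: float, gflops_per_watt: float) -> float:
    """Throughput implied by a power and an efficiency figure.

    ASSUMPTION: both figures refer to the same design point. The abstract
    quotes the efficiency as an upper bound, so this may not hold.
    """
    return power_w * gflops_per_watt


if __name__ == "__main__":
    print(f"{energy_per_flop_pj(PEAK_GFLOPS_PER_W):.2f} pJ per operation")
    print(f"{implied_throughput_gflops(POWER_W, PEAK_GFLOPS_PER_W):.0f} GFLOPS (under the stated assumption)")
```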

Notes

  1. In the CSP specifications, input samples are 16 bits long (i.e., 8 bits each for the real and the imaginary part), while output samples are 64 bits long. For the sake of simplicity, we consider samples written to the accelerator’s memory as 64-bit values.

  2. IBM and Blue Gene are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product or service names may be trademarks or service marks of IBM or other companies.
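The sample widths in Note 1 can be illustrated with a minimal Python sketch. Only the widths (8 + 8 bits in, 64 bits in memory) come from the note. The byte order, the signed two's-complement encoding, and the choice of two 32-bit floats for the 64-bit value are assumptions made for illustration, and the function names are hypothetical.

```python
import struct


def unpack_sample16(word: int) -> complex:
    """Split a 16-bit input sample into its 8-bit real and imaginary parts.

    ASSUMPTION: high byte is the real part, low byte the imaginary part,
    both signed two's-complement integers.
    """
    re, im = struct.unpack(">bb", word.to_bytes(2, "big"))
    return complex(re, im)


def pack_sample64(value: complex) -> bytes:
    """Store a complex sample as a 64-bit value.

    ASSUMPTION: the 64 bits hold two little-endian 32-bit floats (real, imag).
    """
    return struct.pack("<ff", value.real, value.imag)


def widening_factor(in_bits: int = 16, out_bits: int = 64) -> int:
    """Ratio between the stored sample size and the input sample size."""
    return out_bits // in_bits


if __name__ == "__main__":
    sample = unpack_sample16(0x7F80)   # real = 127, imag = -128
    stored = pack_sample64(sample)
    print(sample, len(stored) * 8, widening_factor())
```

Treating every sample in memory as a 64-bit value, as the note does, means each stored sample occupies four times the space of a 16-bit input sample.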


Author information

Corresponding author

Correspondence to Leandro Fiorin.

Additional information

This work was conducted in the context of the joint ASTRON and IBM DOME project and is funded by the Netherlands Organisation for Scientific Research (NWO), the Dutch Ministry of EL&I, and the Province of Drenthe.

Cite this article

Fiorin, L., Vermij, E., van Lunteren, J. et al. Exploring the Design Space of an Energy-Efficient Accelerator for the SKA1-Low Central Signal Processor. Int J Parallel Prog 44, 1003–1027 (2016). https://doi.org/10.1007/s10766-016-0420-y

  • DOI: https://doi.org/10.1007/s10766-016-0420-y
