Abstract
The Square Kilometre Array (SKA) will be the most sensitive radio telescope in the world. This unprecedented sensitivity will be achieved by combining and analyzing signals from 262,144 antennas and 350 dishes at a raw data rate of petabits per second. The processing pipeline that turns these signals into useful astronomical data will require exa-operations per second, within a very limited power budget. We analyze the compute, memory and bandwidth requirements of the key algorithms used in the SKA. By studying their implementation on existing platforms, we show that most algorithms have properties that map inefficiently onto current hardware, such as a low compute-to-bandwidth ratio and complex arithmetic. In addition, we estimate the power breakdown on CPUs and GPUs, analyze the cache behavior on CPUs, and discuss possible improvements. This work is complemented by an analysis of supercomputer trends, which demonstrates that current efforts to use commercial off-the-shelf accelerators result in a two to three times smaller improvement in compute capabilities and power efficiency than custom-built machines. We conclude that waiting for new technology to arrive will not give us the instruments currently planned for 2018: compute capabilities and power efficiency one or two orders of magnitude better are required. Novel hardware and system architectures, matched to the needs and features of this unique project, must be developed.
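The claim that a low compute-to-bandwidth ratio maps poorly onto current hardware can be made concrete with a roofline estimate. The following is a minimal sketch, not a model from the paper. It assumes a hypothetical streaming kernel, `acc += x[i] * w[i]`, over single-precision complex values, with each operand read from memory once. It also assumes placeholder hardware figures of 4000 GFLOP/s peak and 200 GB/s memory bandwidth. This kernel is not one of the SKA algorithms analyzed. It only shares two of the traits named above: complex arithmetic and little data reuse.

```python
"""Roofline sketch: why kernels with a low compute-to-bandwidth ratio
underuse hardware. All hardware numbers are hypothetical placeholders,
not measurements from the paper."""

# (a+bi)(c+di) = (ac - bd) + (ad + bc)i costs 4 real multiplies and
# 2 real additions (6 flops); adding the product to a running complex
# sum costs 2 more real additions.
FLOPS_PER_COMPLEX_MAC = 8


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Flops performed per byte transferred from main memory."""
    return flops / bytes_moved


def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      intensity: float) -> float:
    """Roofline model: performance is capped either by peak compute or by
    memory bandwidth times arithmetic intensity, whichever is lower."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def streaming_complex_mac_intensity(bytes_per_complex: int = 8) -> float:
    """Hypothetical kernel acc += x[i] * w[i]: x[i] and w[i] are each read
    once (8 bytes per single-precision complex value) and the accumulator
    stays in a register, so each complex MAC moves 2 * 8 bytes."""
    bytes_moved = 2 * bytes_per_complex
    return arithmetic_intensity(FLOPS_PER_COMPLEX_MAC, bytes_moved)


if __name__ == "__main__":
    peak, bw = 4000.0, 200.0  # hypothetical GFLOP/s and GB/s
    ai = streaming_complex_mac_intensity()
    perf = attainable_gflops(peak, bw, ai)
    print(f"ridge point = {peak / bw:.1f} flop/byte needed to reach peak")
    print(f"kernel intensity = {ai:.2f} flop/byte")
    print(f"attainable = {perf:.0f} GFLOP/s ({100 * perf / peak:.1f}% of peak)")
```

With these placeholder numbers, the device needs 20 flop/byte to reach its peak, but the kernel supplies only 0.5 flop/byte. The kernel is therefore bandwidth-bound at about 2.5% of peak. Adding compute units alone would not help such a kernel. Only more bandwidth or more data reuse would.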
This work is conducted in the context of the joint ASTRON and IBM DOME project and is funded by the Netherlands Organization for Scientific Research (NWO), the Dutch Ministry of EL&I, and the Province of Drenthe, The Netherlands.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Vermij, E., Fiorin, L., Hagleitner, C., Bertels, K. (2014). Exascale Radio Astronomy: Can We Ride the Technology Wave?. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds) Supercomputing. ISC 2014. Lecture Notes in Computer Science, vol 8488. Springer, Cham. https://doi.org/10.1007/978-3-319-07518-1_3
DOI: https://doi.org/10.1007/978-3-319-07518-1_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07517-4
Online ISBN: 978-3-319-07518-1
eBook Packages: Computer Science, Computer Science (R0)