ABSTRACT
The transition to Exascale computing will be characterised by a wider range of application classes. In addition to traditional massively parallel "number crunching" applications, new classes are emerging, such as real-time HPC and data-intensive scalable computing. Furthermore, Exascale computing is characterised by a "democratisation" of HPC: to fully exploit the capabilities of Exascale-level facilities, HPC is opening access to its resources to a wider range of new players, including SMEs, through cloud-based approaches [1]. Finally, the need for much higher energy efficiency is pushing towards deep heterogeneity, widening the range of options for acceleration: from the traditional CPU-only organisation, to the CPU plus GPU combination that currently dominates the Green500, to more complex options including programmable accelerators and even (reconfigurable) hardware accelerators [2].
- B. Koller, N. Struckmann, J. Buchholz, and M. Gienger, "Towards an environment to deliver high performance computing to small and medium enterprises," in Sustained Simulation Performance 2015. Cham: Springer International Publishing, 2015, pp. 41--50.
- J. Flich, G. Agosta, P. Ampletzer, D. A. Alonso, A. Cilardo, W. Fornaciari, M. Kovac, F. Roudet, and D. Zoni, "The MANGO FET-HPC Project: An overview," in IEEE 18th Int'l Conf on Computational Science and Engineering (CSE). IEEE, 2015, pp. 351--354.
- J. Flich, G. Agosta, P. Ampletzer, D. A. Alonso, C. Brandolese, A. Cilardo, W. Fornaciari, Y. Hoornenborg, M. Kovac, B. Maitre, G. Massari, H. Mlinaric, E. Papastefanakis, F. Roudet, R. Tornero, and D. Zoni, "Enabling HPC for QoS-sensitive applications: The MANGO approach," in 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), March 2016, pp. 702--707.
- G. Agosta, W. Fornaciari, G. Massari, A. Pupykina, F. Reghenzani, and M. Zanella, "Managing Heterogeneous Resources in HPC Systems," in Proc. of PARMA-DITAM '18. ACM, 2018, pp. 7--12.
- A. Pupykina and G. Agosta, "Optimizing Memory Management in Deeply Heterogeneous HPC Accelerators," in 2017 46th Int'l Conf on Parallel Processing Workshops (ICPPW), Aug 2017, pp. 291--300.
- J. Flich, G. Agosta, P. Ampletzer, D. A. Alonso, C. Brandolese, E. Cappe, A. Cilardo, L. Dragic, A. Dray, A. Duspara, W. Fornaciari, E. Fusella, M. Gagliardi, G. Guillaume, D. Hofman, Y. Hoornenborg, A. Iranfar, M. Kovac, S. Libutti, B. Maitre, J. M. Martínez, G. Massari, K. Meinds, H. Mlinaric, E. Papastefanakis, T. Picornell, I. Piljic, A. Pupykina, F. Reghenzani, I. Staub, R. Tornero, M. Zanella, M. Zapater, and D. Zoni, "Exploring manycore architectures for next-generation HPC systems through the MANGO approach," Microprocessors and Microsystems, vol. 61, pp. 154--170, 2018. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0141933118300243
- L. Huang and Q. Xu, "Characterizing the lifetime reliability of manycore processors with core-level redundancy," in 2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov 2010, pp. 680--685.
- C. L. Chou and R. Marculescu, "Farm: Fault-aware resource management in noc-based multiprocessor platforms," in 2011 Design, Automation & Test in Europe, March 2011, pp. 1--6.
- P. Mercati, F. Paterna, A. Bartolini, L. Benini, and T. Rosing, "Warm: Workload-aware reliability management in linux/android," IEEE Trans on CAD of Integrated Circuits and Systems, 2016.
- M. H. Haghbayan, A. Miele, A. M. Rahmani, P. Liljeberg, and H. Tenhunen, "A lifetime-aware runtime mapping approach for many-core systems in the dark silicon era," in 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), March 2016, pp. 854--857.
- P. Bellasi, G. Massari, and W. Fornaciari, "Effective runtime resource management using linux control groups with the barbequertrm framework," ACM Trans. Embed. Comput. Syst., vol. 14, no. 2, pp. 39:1--39:17, Mar. 2015.
- A. Iranfar, F. Terraneo, W. A. Simon, L. Dragic, I. Piljic, M. Zapater, W. Fornaciari, M. Kovac, and D. Atienza Alonso, "Thermal characterization of next-generation workloads on heterogeneous mpsocs," in International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), 2017, pp. 1--6.
- F. Cappello, A. Geist, B. Gropp, L. Kale, B. Kramer, and M. Snir, "Toward exascale resilience," Int. J. High Perform. Comput. Appl., vol. 23, no. 4, pp. 374--388, Nov. 2009.
- C. Curtsinger and E. D. Berger, "Stabilizer: Statistically sound performance evaluation," SIGARCH Comput. Archit. News, vol. 41, no. 1, pp. 219--228, Mar. 2013.
- F. J. Cazorla, J. Abella, J. Andersson, T. Vardanega, F. Vatrinet, I. Bate, I. Broster, M. Azkarate-askasua, F. Wartel, L. Cucu, F. Cros, G. Farrall, A. Gogonel, A. Gianarro, B. Triquet, C. Hernández, C. Lo, C. Maxim, D. Morales, E. Quiñones, E. Mezzetti, L. Kosmidis, I. Agirre, M. Fernández, M. Slijepcevic, P. Conmy, and W. Talaboulma, "PROXIMA: improving measurement-based timing analysis through randomisation and probabilistic analysis," in 2016 Euromicro DSD, 2016, pp. 276--285.
- F. J. Cazorla, T. Vardanega, E. Quiñones, and J. Abella, "Upper-bounding Program Execution Time with Extreme Value Theory," in 13th Int'l Workshop on Worst-Case Execution Time Analysis, ser. OASIcs, vol. 30, Germany, 2013, pp. 64--76. [Online]. Available: http://drops.dagstuhl.de/opus/volltexte/2013/4123
- A. K. Coskun, T. S. Rosing, K. Mihic, G. De Micheli, and Y. Leblebici, "Analysis and optimization of mpsoc reliability," Journal of Low Power Electronics, vol. 2, no. 1, pp. 56--69, 2006. [Online]. Available: https://www.ingentaconnect.com/content/asp/jolpe/2006/00000002/00000001/art0008
- A. K. Coskun, T. S. Rosing, and K. C. Gross, "Temperature management in multiprocessor socs using online learning," in 2008 45th ACM/IEEE Design Automation Conference, June 2008, pp. 890--893.
- W. Huang, M. R. Stan, K. Skadron, K. Sankaranarayanan, S. Ghosh, and S. Velusamy, "Compact thermal modeling for temperature-aware design," in Proc. 41st Design Automation Conference, July 2004, pp. 878--883.
- J. K. M. Stansberry, "Uptime institute 2013 data center industry survey," 2013.
- A. Seuret, A. Iranfar, M. Zapater, J. R. Thome, and D. Atienza, "Design of a two-phase gravity-driven micro-scale thermosyphon cooling system for high-performance computing data centers," in Intersociety Conf on Thermal and Thermomechanical Phenomena in Electronic Systems (ITHERM), 2018.
- A. Sridhar, M. M. S. Aly, and D. Atienza Alonso, "A semi-analytical thermal modeling framework for liquid-cooled ics," IEEE T Comput Aid D, vol. 33, no. 8, pp. 1145--1158, 2014.
- W. Piatek, A. Oleksiak, M. vor dem Berge, J. Hagemeyer, and E. Senechal, "Intelligent thermal management in M2DC system," in Proc. 8th Int'l Conf on Future Energy Systems, 2017, pp. 309--315.
- W. Piatek, A. Oleksiak, and G. Da Costa, "Energy and thermal models for simulation of workload and resource management in computing systems," Simul Model Pract Th, vol. 58, pp. 40--54, 2015. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1569190X15000684
- A. Sridhar, A. Vincenzi, M. Ruggiero, and D. Atienza, "Neural network-based thermal simulation of integrated circuits on gpus," IEEE T Comput Aid D, vol. 31, no. 1, pp. 23--36, Jan 2012.
- S. Raghav, M. Ruggiero, A. Marongiu, C. Pinto, D. Atienza, and L. Benini, "Gpu acceleration for simulating massively parallel many-core platforms," IEEE T Parall Distr, vol. 26, no. 5, pp. 1336--1349, May 2015.
- M. M. Sabry, D. Atienza Alonso, and F. Catthoor, "Ocean: An optimized hw/sw reliability mitigation approach for scratchpad memories in real-time socs," ACM T Embed Comput S, vol. 13, pp. 138.1--138.26, 2014.
- D. Zoni, L. Cremona, and W. Fornaciari, "Powerprobe: Run-time power modeling through automatic RTL instrumentation," in 2018 Design, Automation & Test in Europe Conference & Exhibition, DATE 2018, Dresden, Germany, March 19-23, 2018, pp. 743--748.
- D. Zoni, L. Colombo, and W. Fornaciari, "Darkcache: Energy-performance optimization of tiled multi-cores by adaptively power-gating llc banks," ACM Trans. Archit. Code Optim., vol. 15, no. 2, pp. 21:1--21:26, May 2018.
- S. Libutti, G. Massari, and W. Fornaciari, "Co-scheduling tasks on multi-core heterogeneous systems: An energy-aware perspective," IET Computers Digital Techniques, vol. 10, no. 2, pp. 77--84, 2016.
- D. Zoni, A. Barenghi, G. Pelosi, and W. Fornaciari, "A comprehensive side channel information leakage analysis of an in-order risc cpu microarchitecture," ACM TODAES, vol. 23, no. 5, Sep. 2018.
- R. Rabenseifner, G. Hager, and G. Jost, "Hybrid mpi/openmp parallel programming on clusters of multi-core smp nodes," in 17th Euromicro PDP, Feb 2009, pp. 427--436.
- J. Diaz, C. Muñoz-Caro, and A. Niño, "A survey of parallel programming models and tools in the multi and many-core era," IEEE T Parall Distr, vol. 23, no. 8, pp. 1369--1386, Aug 2012.
- J. L. Reyes-Ortiz, L. Oneto, and D. Anguita, "Big data analytics in the cloud: Spark on hadoop vs mpi/openmp on beowulf," Procedia Computer Science, vol. 53, pp. 121--130, 2015, INNS Conference on Big Data 2015, San Francisco, CA, USA, 8--10 August 2015.
- M. Jarus and A. Oleksiak, "Top-down characterization approximation based on performance counters architecture for amd processors," Simul Model Pract Th, vol. 68, pp. 146--162, 2016.
- Reliable power and time-constraints-aware predictive management of heterogeneous exascale systems