Application-specific feature selection and clustering approach with HPC system profiling data

Shin, Mincheol; Park, Geunchul; Park, Chan Yeol; Lee, Jongmin; Kim, Mucheol

doi:10.1007/s11227-020-03533-2

Published: 04 January 2021

Volume 77, pages 6817–6831, (2021)

The Journal of Supercomputing

Mincheol Shin¹,
Geunchul Park²,
Chan Yeol Park²,
Jongmin Lee³ &
…
Mucheol Kim⁴

Abstract

Exascale computing, the next-generation computing environment, is expected to be applied to scientific and engineering applications. Accordingly, high-performance computing (HPC) technology is being developed to improve the performance and parallelism of many-core processors. Previous research on improving HPC performance has mostly sought to improve overall system performance by analyzing system states within the limits of expert knowledge. However, a processor in a many-core environment exposes a large number of performance-event indicators, and the correlations among them are difficult to analyze. In this paper, we propose an application-specific feature selection and clustering approach with HPC system profiling data. The proposed approach performs PCA-based feature selection to make performance analysis more efficient. In addition, the application-specific characteristics of the profiling data can be analyzed by unsupervised learning. In our experiments, we evaluated highly parallel supercomputers with the NAS Parallel Benchmark and were able to cluster applications efficiently.
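
The abstract does not spell out the selection rule or the clustering algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: features are ranked by the magnitude of their loading on the first principal component, and the retained features are clustered with k-means. The counter names, the sample values, and both of those choices are hypothetical illustrations, not the authors' method or data.

```python
import math

# Hypothetical profiling samples: 4 runs each of two applications.
# Columns (assumed names): ipc, cache_miss_rate, clock_ghz, ctx_switches
SAMPLES = [
    [2.1, 0.15, 5.2, 1.0], [2.1, 0.15, 4.8, 3.0],
    [1.9, 0.15, 4.8, 1.0], [1.9, 0.15, 5.2, 3.0],   # application A
    [0.6, 0.95, 5.2, 1.0], [0.6, 0.95, 4.8, 3.0],
    [0.4, 0.95, 4.8, 1.0], [0.4, 0.95, 5.2, 3.0],   # application B
]

def standardize(rows):
    """Z-score every column; a constant column is left at zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def first_component(z, iters=200):
    """Leading eigenvector of the covariance of z, by power iteration."""
    n, d = len(z), len(z[0])
    cov = [[sum(r[a] * r[b] for r in z) / n for b in range(d)]
           for a in range(d)]
    v = [float(j + 1) for j in range(d)]  # asymmetric start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def select_features(z, k):
    """Indices of the k features with the largest |loading| on PC1."""
    loadings = first_component(z)
    ranked = sorted(range(len(loadings)), key=lambda j: -abs(loadings[j]))
    return sorted(ranked[:k])

def kmeans(points, k, iters=20):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(d2(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda i: d2(p, centers[i]))
                  for p in points]
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centers[i] = [sum(col) / len(members)
                              for col in zip(*members)]
    return labels

z = standardize(SAMPLES)
selected = select_features(z, k=2)
reduced = [[row[j] for j in selected] for row in z]
labels = kmeans(reduced, k=2)
print("selected feature indices:", selected)
print("cluster labels:", labels)
```

In this toy data the first two columns move together with the application while the last two vary independently of it, so ranking by first-component loading keeps the application-dependent counters and the clustering then separates the two groups of runs. A real pipeline would more likely rely on a numerical library and retain several components rather than one.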

Acknowledgements

This work was supported by Korea Institute of Science and Technology Information (KISTI) Grant (No. K-20-L02-C08-S01).

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Chung-Ang University, Seoul, Republic of Korea
Mincheol Shin
Division of Supercomputing, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
Geunchul Park & Chan Yeol Park
Department of Computer and Software Engineering, Wonkwang University, Iksan, Republic of Korea
Jongmin Lee
School of Computer Science and Engineering, Chung-Ang University, Seoul, Republic of Korea
Mucheol Kim

Corresponding authors

Correspondence to Jongmin Lee or Mucheol Kim.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shin, M., Park, G., Park, C.Y. et al. Application-specific feature selection and clustering approach with HPC system profiling data. J Supercomput 77, 6817–6831 (2021). https://doi.org/10.1007/s11227-020-03533-2

Accepted: 21 November 2020
Published: 04 January 2021
Issue Date: July 2021
DOI: https://doi.org/10.1007/s11227-020-03533-2
