Abstract
Multicore hardware and software are becoming increasingly complex. The programmability problem of multicore software has led to the use of parallel patterns. Parallel patterns reduce the effort and time required to develop multicore software by effectively capturing its thread communication and data sharing characteristics. Hence, detecting the parallel pattern used in a multi-threaded application is crucial for performance improvements and enables many architectural optimizations; however, this topic has not been widely studied. In a novel approach, we apply machine learning techniques to automatically detect parallel patterns, and we compare these techniques in terms of accuracy and speed. We experimentally validate the detection ability of our techniques on benchmarks including PARSEC and Rodinia. Our experiments show that the k-nearest neighbor, decision tree, and naive Bayes classifiers are the most accurate techniques. Overall, decision trees are the fastest technique and have the lowest characterization overhead, producing the best combination of detection results. We also show the usefulness of the proposed techniques for synthetic benchmark generation.

Acknowledgments
We would like to thank Prof. Ethem Alpaydin for his very helpful comments on early versions of the paper. This work was supported in part by Semiconductor Research Corporation under task 2082.001, Bogazici University Research Fund 7223, and the Turkish Academy of Sciences.
Cite this article
Deniz, E., Sen, A. Using Machine Learning Techniques to Detect Parallel Patterns of Multi-threaded Applications. Int J Parallel Prog 44, 867–900 (2016). https://doi.org/10.1007/s10766-015-0396-z