Abstract
We propose and study a mapping algorithm optimized for shared-cache multicore processors. The performance requirements of modern applications are constantly growing, and processing huge amounts of data in real time is becoming common even on mobile devices. Octa-core processors are already common in mobile phones and tablets, and if the trend continues, embedded devices with tens of cores will appear within the next few years. Conventional mapping algorithms are not well suited to shared-cache multicore processors. We discuss the importance of application mapping in terms of inter-application communication and shared-cache access delay, and propose an algorithm that optimizes both aspects with low computational complexity. First, the mapping region is calculated from the congregate degree of the nodes; the region is then expanded by selecting the nearest nodes with the lowest average cache latency. Compared with other mapping algorithms, the proposed algorithm shows up to 13.9% improvement in average inter-application communication distance, with near-optimal average cache latency. Results from real applications show that, compared with an incremental mapping algorithm, the proposed algorithm reduces execution time and power consumption by 8% and 16.7%, respectively.
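The two-step idea in the abstract (pick a starting node by congregate degree, then grow the region toward nearby nodes with low average cache latency) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not define its terms, so the sketch assumes a 2D mesh with one shared-cache bank per tile, takes "congregate degree" to mean the number of free nodes within a given hop radius, and uses the mean Manhattan distance to all tiles as a proxy for average cache latency. All function names and parameters are hypothetical.

```python
from itertools import product


def manhattan(a, b):
    """Hop count between two tiles on a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def avg_cache_latency(node, width, height):
    """Assumed proxy: mean hop count from `node` to every tile's cache bank."""
    tiles = list(product(range(width), range(height)))
    return sum(manhattan(node, t) for t in tiles) / len(tiles)


def congregate_degree(node, free, radius=1):
    """Assumed definition: number of other free nodes within `radius` hops."""
    return sum(1 for f in free if f != node and manhattan(node, f) <= radius)


def map_application(width, height, occupied, size):
    """Return `size` free tiles chosen by the two-step heuristic."""
    free = {t for t in product(range(width), range(height)) if t not in occupied}
    if size > len(free):
        raise ValueError("not enough free cores")
    if size == 0:
        return []

    # Step 1: start from the free node with the highest congregate degree;
    # ties go to the lower average cache latency, then to the smaller coordinate.
    first = min(
        free,
        key=lambda n: (-congregate_degree(n, free),
                       avg_cache_latency(n, width, height), n),
    )
    region = [first]
    free.remove(first)

    # Step 2: expand with the free node nearest to the region;
    # ties go to the lower average cache latency, then to the smaller coordinate.
    while len(region) < size:
        nxt = min(
            free,
            key=lambda n: (min(manhattan(n, r) for r in region),
                           avg_cache_latency(n, width, height), n),
        )
        region.append(nxt)
        free.remove(nxt)
    return region


if __name__ == "__main__":
    # Map a 4-task application onto an empty 4x4 mesh.
    print(map_application(4, 4, occupied=set(), size=4))
```

On an empty 4x4 mesh the sketch selects the four central tiles, which are both mutually adjacent and closest on average to all cache banks. A real mapper would also weight candidates by the application's communication graph, which this sketch ignores.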
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Xu, T.C., Leppänen, V. (2015). Cache- and Communication-aware Application Mapping for Shared-cache Multicore Processors. In: Pinho, L., Karl, W., Cohen, A., Brinkschulte, U. (eds) Architecture of Computing Systems – ARCS 2015. ARCS 2015. Lecture Notes in Computer Science, vol 9017. Springer, Cham. https://doi.org/10.1007/978-3-319-16086-3_5
DOI: https://doi.org/10.1007/978-3-319-16086-3_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16085-6
Online ISBN: 978-3-319-16086-3
eBook Packages: Computer Science, Computer Science (R0)