Cache- and Communication-aware Application Mapping for Shared-cache Multicore Processors

Xu, Thomas Canhao; Leppänen, Ville

doi:10.1007/978-3-319-16086-3_5

Thomas Canhao Xu &
Ville Leppänen

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9017)

Included in the following conference series:

International Conference on Architecture of Computing Systems

Abstract

We propose and study a mapping algorithm optimized for shared-cache multicore processors. The performance requirements of modern applications are constantly growing, and processing huge amounts of data in real time is becoming common even on mobile devices. Octa-core processors are already found in mobile phones and tablets, and if the trend continues, embedded devices with tens of cores will appear within the next few years. Conventional mapping algorithms are not designed with shared-cache multicore processors in mind. We discuss the importance of application mapping in terms of inter-application communication and shared-cache access delay, and propose an algorithm with low computational complexity that optimizes both aspects. First, the mapping region is calculated from the congregate degree of the nodes; then the region is expanded by selecting the nearest nodes with the lowest average cache latency. Compared with other mapping algorithms, the proposed algorithm improves the average inter-application communication distance by up to 13.9%, while achieving near-optimal average cache latency. Results from real applications show that, compared with an incremental mapping algorithm, the proposed algorithm reduces execution time by 8% and power consumption by 16.7%.
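The abstract only outlines the two-phase procedure, so the following is an illustrative reading of it, not the authors' implementation. It assumes a 2D mesh network-on-chip with one shared-cache bank per tile and uniformly interleaved addresses, so that a core's average cache latency is approximated by its mean hop distance to all tiles. The definition of congregate degree used here (the number of free nodes within `radius` hops), the rule of growing the region by the free node with the smallest total distance to the nodes already chosen, and all function names are hypothetical.

```python
from itertools import product

Node = tuple[int, int]  # (x, y) coordinate of a tile in a 2D mesh


def manhattan(a: Node, b: Node) -> int:
    """Hop count between two tiles under dimension-order routing on a mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def avg_cache_distance(node: Node, width: int, height: int) -> float:
    """Assumed proxy for average shared-cache latency: mean hop count from
    `node` to every tile, since each tile holds one interleaved cache bank."""
    tiles = list(product(range(width), range(height)))
    return sum(manhattan(node, t) for t in tiles) / len(tiles)


def congregate_degree(node: Node, free: set, radius: int = 1) -> int:
    """Hypothetical definition: number of other free nodes within `radius` hops."""
    return sum(1 for f in free if f != node and manhattan(node, f) <= radius)


def map_application(num_tasks: int, free_nodes, width: int, height: int,
                    radius: int = 1) -> list:
    """Pick `num_tasks` free tiles for one application (sketch only)."""
    free = set(free_nodes)
    if num_tasks > len(free):
        raise ValueError("not enough free nodes for this application")

    # Phase 1: seed the region at the free node with the highest congregate
    # degree; ties go to the lower average cache distance, then coordinates.
    seed = min(free, key=lambda n: (-congregate_degree(n, free, radius),
                                    avg_cache_distance(n, width, height), n))
    region = [seed]
    free.discard(seed)

    # Phase 2: repeatedly add the free node nearest to the region (smallest
    # total hop distance to the chosen nodes), preferring low cache latency.
    while len(region) < num_tasks:
        nxt = min(free, key=lambda n: (sum(manhattan(n, r) for r in region),
                                       avg_cache_distance(n, width, height), n))
        region.append(nxt)
        free.discard(nxt)
    return region


def avg_comm_distance(region: list) -> float:
    """Average pairwise hop distance between the tiles of one application."""
    pairs = [(a, b) for i, a in enumerate(region) for b in region[i + 1:]]
    if not pairs:
        return 0.0
    return sum(manhattan(a, b) for a, b in pairs) / len(pairs)
```

On an empty 4×4 mesh, `map_application(4, product(range(4), range(4)), 4, 4)` selects the central 2×2 block, whose tiles have the lowest average cache distance and an average pairwise distance of 4/3 hops. The coordinate tie-breaker exists only to make the result deterministic.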

Author information

Authors and Affiliations

Department of Information Technology, University of Turku, 20014, Turku, Finland
Thomas Canhao Xu & Ville Leppänen

Corresponding author

Correspondence to Thomas Canhao Xu.

Editor information

Editors and Affiliations

CISTER/INESC TEC, ISEP Research Center, Porto, Portugal
Luís Miguel Pinho
Karlsruher Institut für Technologie, Karlsruhe, Germany
Wolfgang Karl
Inria and École Normale Supérieure, Paris, France
Albert Cohen
Goethe University Fachbereich Informatik und Mathematik, Frankfurt am Main, Germany
Uwe Brinkschulte

About this paper

Cite this paper

Xu, T.C., Leppänen, V. (2015). Cache- and Communication-aware Application Mapping for Shared-cache Multicore Processors. In: Pinho, L., Karl, W., Cohen, A., Brinkschulte, U. (eds) Architecture of Computing Systems – ARCS 2015. ARCS 2015. Lecture Notes in Computer Science, vol 9017. Springer, Cham. https://doi.org/10.1007/978-3-319-16086-3_5

DOI: https://doi.org/10.1007/978-3-319-16086-3_5
Published: 11 March 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16085-6
Online ISBN: 978-3-319-16086-3
eBook Packages: Computer Science, Computer Science (R0)