Abstract
An important issue in current multi-core processors is off-chip bandwidth sharing. Sharing helps improve resource utilization but, more importantly, it may cause performance degradation due to contention. However, there has been little research on characterizing workloads from a bandwidth perspective. Moreover, the understanding of how bandwidth constraints affect performance is still limited. In this paper, we propose the phase execution model and evaluate the arithmetic-to-memory ratio (AMR) of each phase to characterize the bandwidth requirements of arbitrary programs. We apply the model to a set of SPEC benchmark programs and obtain two results. First, we propose a new taxonomy of workloads based on their bandwidth requirements. Second, we find that prefetching techniques improve the system throughput of multi-core processors only when there is enough spare memory bandwidth.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Long, G., Fan, D., Zhang, J. (2009). Characterizing and Understanding the Bandwidth Behavior of Workloads on Multi-core Processors. In: Sips, H., Epema, D., Lin, HX. (eds) Euro-Par 2009 Parallel Processing. Euro-Par 2009. Lecture Notes in Computer Science, vol 5704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03869-3_14
DOI: https://doi.org/10.1007/978-3-642-03869-3_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03868-6
Online ISBN: 978-3-642-03869-3
eBook Packages: Computer Science, Computer Science (R0)