Abstract
Memory has become a primary performance and energy bottleneck for emerging embedded platforms that integrate heterogeneous compute units. Applications must balance performance against energy efficiency, and finding the optimal operating point on embedded platforms is challenging. Under dynamic workloads, there are many opportunities to manage the memory subsystem at runtime to save energy without compromising quality. Previous work has used memory bandwidth utilization to determine memory requirements and to develop runtime policies that configure system knobs (e.g., memory controller frequency) accordingly. However, bandwidth utilization alone is not always sufficient: policies that cover a range of workload scenarios require insight into an application’s memory access pattern and working set size. Memory profilers, by contrast, provide fine-grained information such as the memory access pattern across the entire virtual address space and the load/store density of different memory regions, but parsing this detailed information frequently at runtime induces excessive overhead. In this work, we propose a profiling mechanism that considers both (1) the working set size of running workloads and (2) memory bandwidth utilization to compute the WBP (Working Set Size-Bandwidth Product). WBP can be estimated with low overhead, and the combined metric provides insights that runtime policies can use to determine desirable configurations for specific workload scenarios. Our early results show that a static configuration devised with this metric yields an optimal memory controller frequency for 8 out of 10 PARSEC workloads, demonstrating the promise of this approach.
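As a rough illustration of how a combined metric of this kind might feed a runtime policy, the sketch below multiplies a working set size estimate by a bandwidth utilization fraction and maps the result to a memory controller frequency. The abstract does not give the exact formula, units, or thresholds, so everything here is an assumption made for illustration: the names `Sample`, `bandwidth_utilization`, `wbp`, and `pick_frequency`, the normalization of bandwidth to a peak value, and the threshold table with its frequencies are all hypothetical.

```python
from dataclasses import dataclass
from math import inf


@dataclass
class Sample:
    """One profiling window (hypothetical structure, not from the paper)."""
    wss_bytes: int          # estimated working set size during the window
    bytes_transferred: int  # memory traffic observed during the window
    window_s: float         # window length in seconds


def bandwidth_utilization(sample: Sample, peak_bw_bytes_per_s: float) -> float:
    """Achieved bandwidth as a fraction of peak, clamped to [0, 1]."""
    achieved = sample.bytes_transferred / sample.window_s
    return min(achieved / peak_bw_bytes_per_s, 1.0)


def wbp(sample: Sample, peak_bw_bytes_per_s: float) -> float:
    """Assumed form of the metric: working set size times utilization."""
    return sample.wss_bytes * bandwidth_utilization(sample, peak_bw_bytes_per_s)


def pick_frequency(wbp_value: float, thresholds: list[tuple[float, int]]) -> int:
    """Return the frequency of the first band whose upper bound covers wbp_value.

    `thresholds` is a list of (upper_bound, freq_mhz) pairs sorted by bound.
    """
    for bound, freq_mhz in thresholds:
        if wbp_value <= bound:
            return freq_mhz
    return thresholds[-1][1]


# Illustrative values only: a 64 MiB working set at 50% of a 4 GB/s peak.
MIB = 2**20
sample = Sample(wss_bytes=64 * MIB, bytes_transferred=2_000_000_000, window_s=1.0)
bands = [(16 * MIB, 800), (64 * MIB, 1333), (inf, 1866)]  # hypothetical bands

metric = wbp(sample, peak_bw_bytes_per_s=4_000_000_000)
freq = pick_frequency(metric, bands)
```

A static table like `bands` corresponds loosely to the static configuration the abstract mentions; a real policy would derive its bands from measurements on the target platform.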
Acknowledgement
This work was partially supported by NSF grant CCF-1704859.
Copyright information
© 2023 IFIP International Federation for Information Processing
About this paper
Cite this paper
Maity, B., Donyanavard, B., Venkatasubramanian, N., Dutt, N. (2023). Workload Characterization for Memory Management in Emerging Embedded Platforms. In: Wehrmeister, M.A., Kreutz, M., Götz, M., Henkler, S., Pimentel, A.D., Rettberg, A. (eds) Analysis, Estimations, and Applications of Embedded Systems. IESS 2019. IFIP Advances in Information and Communication Technology, vol 576. Springer, Cham. https://doi.org/10.1007/978-3-031-26500-6_6
DOI: https://doi.org/10.1007/978-3-031-26500-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26499-3
Online ISBN: 978-3-031-26500-6
eBook Packages: Computer Science, Computer Science (R0)