research-article

Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks

Authors:
Amirali Boroumand (Carnegie Mellon University, Moffett Field, CA, USA)
Saugata Ghose (Carnegie Mellon University, Pittsburgh, PA, USA)
Youngsok Kim (Seoul National University, Seoul, South Korea)
Rachata Ausavarungnirun (Carnegie Mellon University, Pittsburgh, PA, USA)
Eric Shiu (Google, Mountain View, USA)
Rahul Thakur (Google, Mountain View, USA)
Daehyun Kim (Samsung Research, Google, Seoul, South Korea)
Aki Kuusela (Google, Mountain View, USA)
Allan Knies (Google, Mountain View, USA)
Parthasarathy Ranganathan (Google, Mountain View, USA)
Onur Mutlu (ETH Zürich & Carnegie Mellon University, Zurich, Switzerland)

ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, March 2018, Pages 316–331
https://doi.org/10.1145/3173162.3173177

Published: 19 March 2018

ABSTRACT

We are experiencing an explosive growth in the number of consumer devices, including smartphones, tablets, web-based computers such as Chromebooks, and wearable devices. For this class of devices, energy efficiency is a first-class concern due to the limited battery capacity and thermal power budget. We find that data movement is a major contributor to the total system energy and execution time in consumer devices. The energy and performance costs of moving data between the memory system and the compute units are significantly higher than the costs of computation. As a result, addressing data movement is crucial for consumer devices. In this work, we comprehensively analyze the energy and performance impact of data movement for several widely used Google consumer workloads: (1) the Chrome web browser; (2) TensorFlow Mobile, Google's machine learning framework; (3) video playback; and (4) video capture, both of which are used in many video services such as YouTube and Google Hangouts. We find that processing-in-memory (PIM) can significantly reduce data movement for all of these workloads, by performing part of the computation close to memory. Each workload contains simple primitives and functions that contribute to a significant amount of the overall data movement. We investigate whether these primitives and functions are feasible to implement using PIM, given the limited area and power constraints of consumer devices. Our analysis shows that offloading these primitives to PIM logic, consisting of either simple cores or specialized accelerators, eliminates a large amount of data movement, and significantly reduces total system energy (by an average of 55.4% across the workloads) and execution time (by an average of 54.2%).
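
To put the two reported averages in more familiar terms, the short Python sketch below converts a percentage reduction into the fraction of the baseline that remains and into the equivalent improvement factor. It is only an illustration and not part of the paper: the helper names are hypothetical, and the conversion assumes a single workload whose reduction equals the reported average, since an average of per-workload reductions does not in general equal the reduction of the average.

```python
# Illustrative arithmetic only; function names are hypothetical, not from the paper.

def remaining_fraction(reduction_pct: float) -> float:
    """Fraction of the baseline that remains after a percentage reduction."""
    return 1.0 - reduction_pct / 100.0


def improvement_factor(reduction_pct: float) -> float:
    """Baseline-to-new ratio implied by a percentage reduction."""
    return 1.0 / remaining_fraction(reduction_pct)


# Average reductions quoted in the abstract.
ENERGY_REDUCTION_PCT = 55.4
TIME_REDUCTION_PCT = 54.2

energy_factor = improvement_factor(ENERGY_REDUCTION_PCT)
speedup = improvement_factor(TIME_REDUCTION_PCT)

print(f"Energy remaining: {remaining_fraction(ENERGY_REDUCTION_PCT):.1%}, "
      f"i.e. about {energy_factor:.2f}x less energy")
print(f"Time remaining:   {remaining_fraction(TIME_REDUCTION_PCT):.1%}, "
      f"i.e. about {speedup:.2f}x faster")
```

Under that assumption, a 55.4% energy reduction leaves 44.6% of the baseline energy, and a 54.2% execution-time reduction corresponds to roughly a 2.18x speedup.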

Index Terms

Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks
1. Hardware
  1. Power and energy
2. Human-centered computing
  1. Ubiquitous and mobile computing
    1. Ubiquitous and mobile devices

Published in
ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems
March 2018
827 pages
ISBN:9781450349116
DOI:10.1145/3173162
General Chairs: Xipeng Shen (North Carolina State University, USA), James Tuck (North Carolina State University, USA)
Program Chairs: Ricardo Bianchini (Microsoft Research, USA), Vivek Sarkar (Georgia Institute of Technology, USA)
ACM SIGPLAN Notices Volume 53, Issue 2
ASPLOS '18
February 2018
809 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3296957
Editor: Matthew Fluet (Rochester Institute of Technology)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
consumer workloads
data movement
energy efficiency
memory systems
processing-in-memory
Qualifiers
- research-article
Acceptance Rates
ASPLOS '18 Paper Acceptance Rate: 56 of 319 submissions, 18%
Overall Acceptance Rate: 535 of 2,713 submissions, 20%
Article Metrics
- Total Citations: 185
- Total Downloads: 2,784
- Downloads (Last 12 months): 474
- Downloads (Last 6 weeks): 60