DOI: 10.1145/3173162.3173177
research-article

Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks

Published: 19 March 2018

ABSTRACT

We are experiencing an explosive growth in the number of consumer devices, including smartphones, tablets, web-based computers such as Chromebooks, and wearable devices. For this class of devices, energy efficiency is a first-class concern due to the limited battery capacity and thermal power budget. We find that data movement is a major contributor to the total system energy and execution time in consumer devices. The energy and performance costs of moving data between the memory system and the compute units are significantly higher than the costs of computation. As a result, addressing data movement is crucial for consumer devices. In this work, we comprehensively analyze the energy and performance impact of data movement for several widely used Google consumer workloads: (1) the Chrome web browser; (2) TensorFlow Mobile, Google's machine learning framework; (3) video playback; and (4) video capture, both of which are used in many video services such as YouTube and Google Hangouts. We find that processing-in-memory (PIM) can significantly reduce data movement for all of these workloads by performing part of the computation close to memory. Each workload contains simple primitives and functions that contribute to a significant amount of the overall data movement. We investigate whether these primitives and functions are feasible to implement using PIM, given the tight area and power constraints of consumer devices. Our analysis shows that offloading these primitives to PIM logic, consisting of either simple cores or specialized accelerators, eliminates a large amount of data movement, and significantly reduces total system energy (by an average of 55.4% across the workloads) and execution time (by an average of 54.2%).

      • Published in

        ACM Conferences
        ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems
        March 2018
        827 pages
        ISBN:9781450349116
        DOI:10.1145/3173162
        • ACM SIGPLAN Notices, Volume 53, Issue 2
          ASPLOS '18
          February 2018
          809 pages
          ISSN:0362-1340
          EISSN:1558-1160
          DOI:10.1145/3296957

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        ASPLOS '18 Paper Acceptance Rate: 56 of 319 submissions, 18%. Overall Acceptance Rate: 535 of 2,713 submissions, 20%.
