ABSTRACT
We are experiencing an explosive growth in the number of consumer devices, including smartphones, tablets, web-based computers such as Chromebooks, and wearable devices. For this class of devices, energy efficiency is a first-class concern due to the limited battery capacity and thermal power budget. We find that data movement is a major contributor to the total system energy and execution time in consumer devices. The energy and performance costs of moving data between the memory system and the compute units are significantly higher than the costs of computation. As a result, addressing data movement is crucial for consumer devices. In this work, we comprehensively analyze the energy and performance impact of data movement for several widely used Google consumer workloads: (1) the Chrome web browser; (2) TensorFlow Mobile, Google's machine learning framework; (3) video playback; and (4) video capture, the last two of which are used in many video services such as YouTube and Google Hangouts. We find that processing-in-memory (PIM) can significantly reduce data movement for all of these workloads by performing part of the computation close to memory. Each workload contains simple primitives and functions that account for a significant portion of the overall data movement. We investigate whether these primitives and functions are feasible to implement using PIM, given the limited area and power budgets of consumer devices. Our analysis shows that offloading these primitives to PIM logic, consisting of either simple cores or specialized accelerators, eliminates a large amount of data movement and significantly reduces total system energy (by an average of 55.4% across the workloads) and execution time (by an average of 54.2%).
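To put the reported averages in concrete terms, the short sketch below converts a percentage reduction into the fraction of the baseline that remains, and into an equivalent speedup factor. Only the two percentages come from the abstract. The function names are illustrative, and the speedup is what a single workload with exactly the average time reduction would see, not an average of per-workload speedups.

```python
# Illustrative arithmetic only: the two percentages are the averages quoted
# in the abstract; the helper names are hypothetical.

def remaining_fraction(reduction_pct: float) -> float:
    """Fraction of the baseline that remains after a percentage reduction."""
    return 1.0 - reduction_pct / 100.0


def equivalent_speedup(time_reduction_pct: float) -> float:
    """Speedup factor implied by a given reduction in execution time."""
    return 1.0 / remaining_fraction(time_reduction_pct)


ENERGY_REDUCTION_PCT = 55.4  # average total system energy reduction
TIME_REDUCTION_PCT = 54.2    # average execution time reduction

energy_left = remaining_fraction(ENERGY_REDUCTION_PCT)
speedup = equivalent_speedup(TIME_REDUCTION_PCT)

print(f"Energy remaining relative to baseline: {energy_left:.3f}")
print(f"Equivalent speedup at the average time reduction: {speedup:.2f}x")
```

Under these assumptions, the PIM-enabled system uses a little under half of the baseline energy and runs roughly twice as fast for a workload at the average.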
Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks
ASPLOS '18