Research article
DOI: 10.1145/3352460.3358295

ShapeShifter: Enabling Fine-Grain Data Width Adaptation in Deep Learning

Published: 12 October 2019

ABSTRACT

We show that selecting a single data width for all values in a Deep Neural Network, quantized or not, and even when that width differs per layer, amounts to worst-case design. Much shorter data widths suffice if we instead target the common case by adapting the data width at a much finer granularity. We propose ShapeShifter, which groups weights and activations and encodes each group with its own width; typical group sizes range from 16 to 256 values. Per-group widths are selected statically for the weights and dynamically, by hardware, for the activations. We present two applications of ShapeShifter. The first, applicable to any system, reduces off- and on-chip storage and communication: this ShapeShifter-based memory compression is simple and low cost, yet reduces off-chip traffic to 33% and 36% of the original for 8-bit and 16-bit models respectively. This makes it possible to sustain higher performance for a given off-chip memory interface while also boosting energy efficiency. In the second application, we show how ShapeShifter can be implemented as a surgical extension over designs that exploit variable precision in time.
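The grouping scheme the abstract describes can be sketched as follows: split a tensor into fixed-size groups and store each group at the smallest width that still holds its largest value, plus a short per-group width header. Below is a minimal Python sketch under the assumption of unsigned 8-bit quantized activations; the function names and the header encoding are illustrative, not the paper's exact format (a sign bit for weights would be handled analogously).

```python
import numpy as np

def per_group_widths(values, group_size=16):
    """Minimal bit width needed by each fixed-size group of unsigned values."""
    v = np.asarray(values, dtype=np.int64)
    pad = (-v.size) % group_size          # zero-pad the tail to a full group
    groups = np.pad(v, (0, pad)).reshape(-1, group_size)
    maxima = groups.max(axis=1)
    # Bit length of the largest value in each group, with a floor of 1 bit.
    return np.maximum(1, np.ceil(np.log2(maxima + 1)).astype(np.int64))

def compressed_bits(values, group_size=16, base_bits=8):
    """Total storage: per-group payloads plus a small width header per group."""
    widths = per_group_widths(values, group_size)
    # log2(base_bits) header bits suffice to encode widths 1..base_bits.
    header = int(np.ceil(np.log2(base_bits)))
    return int((widths * group_size + header).sum())

acts = np.array([0, 1, 2, 3] * 4)          # 16 small activations, one group
print(compressed_bits(acts))               # 35 bits vs. 16 * 8 = 128 uncompressed
```

Because the width is chosen per group of 16 to 256 values rather than per layer, a single outlier only forces a wide encoding on its own group, which is what lets the common case stay short.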


Published in:
MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture
October 2019, 1104 pages
ISBN: 978-1-4503-6938-1
DOI: 10.1145/3352460

Copyright © 2019 ACM

Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance Rates

Overall acceptance rate: 484 of 2,242 submissions (22%)
