DOI: 10.1145/3352460.3358295

ShapeShifter: Enabling Fine-Grain Data Width Adaptation in Deep Learning

Published: 12 October 2019

Abstract

We show that selecting a single data width for all values in a Deep Neural Network, quantized or not, and even if that width differs per layer, amounts to worst-case design. Much shorter data widths can be used if we target the common case by adjusting the data type width at a much finer granularity. We propose ShapeShifter, which groups weights and activations and encodes each group with a width specific to that group; typical group sizes range from 16 to 256 values. The per-group widths are selected statically for the weights and dynamically by hardware for the activations. We present two applications of ShapeShifter. In the first, which is applicable to any system, ShapeShifter reduces off- and on-chip storage and communication. This ShapeShifter-based memory compression is simple and low cost yet reduces off-chip traffic to 33% and 36% for 8-bit and 16-bit models, respectively. This makes it possible to sustain higher performance for a given off-chip memory interface while also boosting energy efficiency. In the second application, we show how ShapeShifter can be implemented as a surgical extension over designs that exploit variable precision in time.
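To make the per-group idea concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of how a fixed-size group of quantized integer values could be assigned the smallest width that covers its actual range, and how the resulting packed footprint compares against the per-layer worst-case width. The group size of 16, the 4-bit width field, and all function names are illustrative assumptions.

```python
import numpy as np

def bits_needed(group: np.ndarray) -> int:
    """Smallest two's-complement width that represents every value in the group.
    (Illustrative metric; the paper's exact encoding may differ.)"""
    lo, hi = int(group.min()), int(group.max())
    width = 1
    while not (-(1 << (width - 1)) <= lo and hi <= (1 << (width - 1)) - 1):
        width += 1
    return width

def per_group_widths(values: np.ndarray, group_size: int = 16) -> np.ndarray:
    """Split a flattened tensor of quantized integers into fixed-size groups and
    return the data width each group actually needs."""
    flat = values.ravel()
    pad = (-len(flat)) % group_size          # zero-pad the tail group if needed
    flat = np.concatenate([flat, np.zeros(pad, dtype=flat.dtype)])
    groups = flat.reshape(-1, group_size)
    return np.array([bits_needed(g) for g in groups])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 8-bit quantized activations: mostly small magnitudes, rare outliers.
    acts = np.clip(rng.normal(0, 8, size=4096), -128, 127).astype(np.int8)
    widths = per_group_widths(acts, group_size=16)
    layer_width = bits_needed(acts)          # worst-case width for the whole layer
    # Footprint if each group stores a small width field plus packed values.
    width_field_bits = 4
    packed_bits = int((widths * 16 + width_field_bits).sum())
    print(f"per-layer width: {layer_width} bits, mean per-group width: {widths.mean():.2f} bits")
    print(f"packed size: {packed_bits} bits vs {acts.size * 8} bits at 8 bits/value")
```

In ShapeShifter the weight-side widths would be chosen offline in this spirit, while the activation-side widths are chosen by hardware at run time; the sketch only illustrates why common-case group widths can be much shorter than the layer-wide maximum.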


    Published In

    MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture
    October 2019
    1104 pages
    ISBN: 9781450369381
    DOI: 10.1145/3352460

    Publisher

    Association for Computing Machinery

    New York, NY, United States

