DOI: 10.1145/3352460.3358295

ShapeShifter: Enabling Fine-Grain Data Width Adaptation in Deep Learning

Published: 12 October 2019

Abstract

We show that selecting a single data width for all values in a Deep Neural Network, quantized or not, and even if that width differs per layer, amounts to worst-case design. Much shorter data widths can be used if we target the common case by adjusting the data type width at a much finer granularity. We propose ShapeShifter, which groups weights and activations and encodes each group with a width specific to that group; typical group sizes range from 16 to 256 values. The per-group widths are selected statically for the weights and dynamically by hardware for the activations. We present two applications of ShapeShifter. In the first, which is applicable to any system, ShapeShifter reduces off- and on-chip storage and communication. This ShapeShifter-based memory compression is simple and low cost yet reduces off-chip traffic to 33% and 36% for 8-bit and 16-bit models, respectively. This makes it possible to sustain higher performance for a given off-chip memory interface while also boosting energy efficiency. In the second application, we show how ShapeShifter can be implemented as a surgical extension over designs that exploit variable precision in time.
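To make the per-group idea concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of how a fixed-size group of quantized integer values could be assigned the smallest width that covers its actual range, and how the resulting packed footprint compares against the per-layer worst-case width. The group size of 16, the 4-bit width field, and all function names are illustrative assumptions.

```python
import numpy as np

def bits_needed(group: np.ndarray) -> int:
    """Smallest two's-complement width that represents every value in the group.
    (Illustrative metric; the paper's exact encoding may differ.)"""
    lo, hi = int(group.min()), int(group.max())
    width = 1
    while not (-(1 << (width - 1)) <= lo and hi <= (1 << (width - 1)) - 1):
        width += 1
    return width

def per_group_widths(values: np.ndarray, group_size: int = 16) -> np.ndarray:
    """Split a flattened tensor of quantized integers into fixed-size groups and
    return the data width each group actually needs."""
    flat = values.ravel()
    pad = (-len(flat)) % group_size          # zero-pad the tail group if needed
    flat = np.concatenate([flat, np.zeros(pad, dtype=flat.dtype)])
    groups = flat.reshape(-1, group_size)
    return np.array([bits_needed(g) for g in groups])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 8-bit quantized activations: mostly small magnitudes, rare outliers.
    acts = np.clip(rng.normal(0, 8, size=4096), -128, 127).astype(np.int8)
    widths = per_group_widths(acts, group_size=16)
    layer_width = bits_needed(acts)          # worst-case width for the whole layer
    # Footprint if each group stores a small width field plus packed values.
    width_field_bits = 4
    packed_bits = int((widths * 16 + width_field_bits).sum())
    print(f"per-layer width: {layer_width} bits, mean per-group width: {widths.mean():.2f} bits")
    print(f"packed size: {packed_bits} bits vs {acts.size * 8} bits at 8 bits/value")
```

In ShapeShifter the weight-side widths would be chosen offline in this spirit, while the activation-side widths are chosen by hardware at run time; the sketch only illustrates why common-case group widths can be much shorter than the layer-wide maximum.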


    Published In

    MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture
    October 2019
    1104 pages
    ISBN: 9781450369381
    DOI: 10.1145/3352460

    Publisher

    Association for Computing Machinery

    New York, NY, United States

