research-article

CoCoPIE: enabling real-time AI on off-the-shelf mobile devices via compression-compilation co-design

Authors:

Hui Guan,

Shaoshan Liu,

Xiaolong Ma,

Wei Niu,

Bin Ren,

Xipeng Shen,

Yanzhi Wang,

Pu Zhao

Communications of the ACM, Volume 64, Issue 6

Pages 62 - 68

https://doi.org/10.1145/3418297

Published: 24 May 2021

Abstract

A new framework allows intelligence on mainstream end devices without special hardware.

Published In

Communications of the ACM Volume 64, Issue 6

June 2021

106 pages

ISSN: 0001-0782

EISSN: 1557-7317

DOI: 10.1145/3467845

Editor:
Andrew A. Chien
Association for Computing Machinery, New York, NY

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
