research-article

Hierarchical memory-constrained operator scheduling of neural architecture search networks

Authors:

Chengcheng Wan,

Lei Qiao

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference

Pages 493 - 498

https://doi.org/10.1145/3489517.3530472

Published: 23 August 2022

Abstract

Neural Architecture Search (NAS) is widely used in industry to search for neural networks that meet task requirements. However, scheduling the resulting networks under memory constraints remains a challenge. This paper proposes HMCOS, which performs hierarchical memory-constrained operator scheduling of NAS networks: given a network, HMCOS constructs a hierarchical computation graph and employs an iterative scheduling algorithm to progressively reduce the peak memory footprint. We evaluate HMCOS against RPO and Serenity, two popular scheduling techniques. The results show that HMCOS outperforms existing techniques: it supports more NAS networks, reduces peak memory footprints by 8.7% to 42.4%, and achieves scheduling speedups of 137x to 283x.
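To make the scheduling objective concrete, the sketch below shows how the order in which operators run changes the peak memory footprint of a small branching graph. It is a toy illustration only, not the HMCOS algorithm: the graph, the tensor sizes, and the names `GRAPH`, `peak_memory`, and `best_schedule` are all hypothetical, and the exhaustive search stands in for any real scheduler.

```python
from itertools import permutations

# Hypothetical toy graph: op -> (output size in arbitrary units, predecessor ops).
GRAPH = {
    "in": (1, []),
    "a": (4, ["in"]),
    "b": (1, ["a"]),
    "c": (4, ["in"]),
    "d": (1, ["c"]),
    "out": (1, ["b", "d"]),
}


def peak_memory(schedule, graph):
    """Return the peak total size of live tensors for a given execution order."""
    # Count how many consumers of each output have not run yet.
    remaining = {op: 0 for op in graph}
    for _, preds in graph.values():
        for p in preds:
            remaining[p] += 1

    live = 0
    peak = 0
    for op in schedule:
        size, preds = graph[op]
        live += size  # the output is allocated while the inputs are still live
        peak = max(peak, live)
        for p in preds:
            remaining[p] -= 1
            if remaining[p] == 0:
                live -= graph[p][0]  # free a tensor after its last consumer
    return peak


def is_topological(schedule, graph):
    """Check that every op runs after all of its predecessors."""
    pos = {op: i for i, op in enumerate(schedule)}
    return all(pos[p] < pos[op] for op, (_, preds) in graph.items() for p in preds)


def best_schedule(graph):
    """Brute-force the valid order with the lowest peak (toy graphs only)."""
    valid = (s for s in permutations(graph) if is_topological(s, graph))
    return min(valid, key=lambda s: peak_memory(s, graph))


if __name__ == "__main__":
    interleaved = ("in", "a", "c", "b", "d", "out")
    print("interleaved peak:", peak_memory(interleaved, GRAPH))
    best = best_schedule(GRAPH)
    print("best order:", best, "peak:", peak_memory(best, GRAPH))
```

Finishing one branch before starting the other keeps only one large intermediate tensor live at a time, which is why the two orders differ. Exhaustive search grows factorially with the number of operators, so it is only usable on graphs of this size.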

Published In

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference

July 2022

1462 pages

ISBN:9781450391429

DOI:10.1145/3489517

General Chair:
Rob Oshana
NXP

Copyright © 2022 ACM.

Sponsors

SIGDA: ACM Special Interest Group on Design Automation
IEEE CEDA

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

DAC '22

Sponsor:

SIGDA

DAC '22: 59th ACM/IEEE Design Automation Conference

July 10 - 14, 2022

San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%
