DOI: 10.1145/3672202.3673744
short-paper

Poster: Flexible Scheduling of Network and Computing Resources for Distributed AI Tasks

Published: 05 August 2024

Abstract

Many emerging Artificial Intelligence (AI) applications require on-demand provisioning of large-scale computing, which can only be achieved by leveraging distributed computing services interconnected through networks. To address this growing demand for networking in support of AI tasks, we investigate new scheduling strategies that improve communication efficiency and test them on a programmable testbed. We also outline relevant challenges and research directions.

Cited By

  • (2024) Minimizing Power Consumption in IPoWDM Networks with ZR/ZR+ Using Network Reconfiguration. 2024 Asia Communications and Photonics Conference (ACP) and International Conference on Information Photonics and Optical Communications (IPOC), 1-6. DOI: 10.1109/ACP/IPOC63121.2024.10809949. Online publication date: 2-Nov-2024

    Published In

    ACM SIGCOMM Posters and Demos '24: Proceedings of the ACM SIGCOMM 2024 Conference: Posters and Demos
    August 2024
    140 pages
    ISBN:9798400707179
    DOI:10.1145/3672202
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. artificial intelligence
    2. networking

    Qualifiers

    • Short-paper

    Funding Sources

    • National Key R&D Program of China
    • the Italian Ministry of University and Research (MUR) and the European Union (EU) under the PON/REACT project

    Conference

    ACM SIGCOMM Posters and Demos '24

    Acceptance Rates

    Overall Acceptance Rate 92 of 158 submissions, 58%

    Article Metrics

    • Downloads (Last 12 months): 169
    • Downloads (Last 6 weeks): 12
    Reflects downloads up to 15 Feb 2025
