Scheduling In-Band Network Telemetry With Convergence-Preserving Federated Learning


Abstract:

Conducting federated learning across distributed sites with In-Band Network Telemetry (INT) based data collection faces critical challenges, including control decisions made at different frequencies, convergence of the models being trained, and resource provisioning coupled over time. To study this problem, we formulate a non-linear mixed-integer program to optimize the long-term INT overhead, resource cost, and federated learning cost. We then design polynomial-time online algorithms that solve this problem on the fly using only observable inputs, featuring laziness-aware resource adaptation, online-learning-based INT flow selection and model aggregation control, and expectation-preserving randomized dependent rounding. We rigorously prove that our approach achieves a parameterized-constant competitive ratio against the offline optimum and a time-averaged constraint violation that vanishes in the long run. With extensive trace-driven evaluations, we confirm that our approach outperforms alternative approaches in reducing total cost, lowering the real-time cost by 34% on average, and that our trained models are effective for solving real machine learning problems.
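
The abstract names expectation-preserving randomized dependent rounding without describing it. As an illustration only, the following is a minimal sketch of one standard pairwise dependent-rounding scheme; it is an assumption about the general technique, not the paper's algorithm. The function name `dependent_round` and its parameters are hypothetical. Given fractional decisions in [0, 1], it repeatedly shifts mass between two fractional entries so that each entry's expected value is unchanged and the running sum is preserved.

```python
import random


def dependent_round(x, rng=None, eps=1e-9):
    """Round a vector in [0, 1]^n to {0, 1}^n (illustrative sketch).

    Assumed properties of this pairwise scheme (not taken from the paper):
    E[output[i]] == x[i], and the sum is preserved when sum(x) is an integer.
    """
    rng = rng or random.Random()
    x = list(x)

    def fractional():
        # Indices whose value is still strictly between 0 and 1.
        return [i for i, v in enumerate(x) if eps < v < 1 - eps]

    frac = fractional()
    while len(frac) >= 2:
        i, j = frac[0], frac[1]
        a = min(1 - x[i], x[j])  # largest shift from j to i
        b = min(x[i], 1 - x[j])  # largest shift from i to j
        # Probabilities b/(a+b) and a/(a+b) keep both expectations unchanged:
        # E[change in x[i]] = a * b/(a+b) - b * a/(a+b) = 0.
        if rng.random() < b / (a + b):
            x[i] += a
            x[j] -= a
        else:
            x[i] -= b
            x[j] += b
        frac = fractional()

    if frac:
        # At most one fractional entry remains; round it on its own.
        k = frac[0]
        x[k] = 1.0 if rng.random() < x[k] else 0.0
    return [int(round(v)) for v in x]


if __name__ == "__main__":
    print(dependent_round([0.5, 0.5, 0.3, 0.7], random.Random(0)))
```

Each pairwise step makes at least one of the two entries integral, so the loop finishes after at most n steps. Coupling the entries this way, rather than rounding each one independently, is what keeps the number of selected items fixed when the fractional values sum to an integer.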
Published in: IEEE/ACM Transactions on Networking (Volume: 31, Issue: 5, October 2023)
Page(s): 2313 - 2328
Date of Publication: 14 March 2023
