Scheduling In-Band Network Telemetry With Convergence-Preserving Federated Learning


Abstract:

Conducting federated learning across distributed sites with In-Band Network Telemetry (INT) based data collection faces critical challenges, including control decisions made at different frequencies, convergence of the models being trained, and resource provisioning coupled over time. To study this problem, we formulate a non-linear mixed-integer program to optimize the long-term INT overhead, resource cost, and federated learning cost. We then design polynomial-time online algorithms that solve this problem on the fly using only observable inputs, featuring laziness-aware resource adaptation, online-learning-based INT flow selection and model aggregation control, and expectation-preserving randomized dependent rounding. We rigorously prove that our approach achieves a parameterized-constant competitive ratio against the offline optimum and a time-averaged constraint violation that vanishes in the long run. Extensive trace-driven evaluations confirm that our approach outperforms alternative approaches in reducing total cost, lowering the real-time cost by 34% on average, and that our trained models are effective for solving real machine learning problems.
Published in: IEEE/ACM Transactions on Networking (Volume: 31, Issue: 5, October 2023)
Page(s): 2313 - 2328
Date of Publication: 14 March 2023


I. Introduction

In-Band Network Telemetry (INT) [1], [2] enables a network switch to insert its state information (e.g., queue length, hop latency, link utilization) into the packet header. This state data is then carried by the packet and separated from the packet payload at the destination for further processing. Compared to conventional network monitoring approaches, INT can collect data at line speed in the data-forwarding plane, measure the desired switches en route, adapt to almost any encapsulation format, and scale to large networks of diverse types. The P4 switch is an industrial example of a platform that realizes INT [2].
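The insert-then-separate flow described above can be illustrated with a minimal Python sketch. It is a toy simulation and not the INT wire format or the authors' system: the names `HopMetadata`, `Packet`, `forward_through_switch`, `sink_extract`, and `path_latency_us` are hypothetical, and the metadata fields are assumed only from the examples given in the text (queue length, hop latency, link utilization).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class HopMetadata:
    """State reported by one switch (illustrative fields, not the INT spec)."""
    switch_id: int
    queue_length: int         # packets queued at the egress port
    hop_latency_us: float     # time spent inside the switch, in microseconds
    link_utilization: float   # fraction of link capacity in use, 0.0 to 1.0


@dataclass
class Packet:
    """A packet whose header carries a growing stack of per-hop metadata."""
    payload: bytes
    int_stack: List[HopMetadata] = field(default_factory=list)


def forward_through_switch(packet: Packet, hop: HopMetadata) -> Packet:
    """Simulate a switch appending its own state to the packet header."""
    packet.int_stack.append(hop)
    return packet


def sink_extract(packet: Packet) -> Tuple[bytes, List[HopMetadata]]:
    """At the destination, separate the telemetry from the payload."""
    telemetry = list(packet.int_stack)
    packet.int_stack.clear()  # the payload continues without telemetry
    return packet.payload, telemetry


def path_latency_us(telemetry: List[HopMetadata]) -> float:
    """Example of further processing: total latency over the reported hops."""
    return sum(hop.hop_latency_us for hop in telemetry)
```

As a usage example under the same assumptions, passing a packet through two simulated switches and then calling `sink_extract` returns the untouched payload together with two `HopMetadata` records in path order, from which per-path statistics such as `path_latency_us` can be computed.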
