Abstract:
Machine learning (ML) models have been deployed in mobile networks to process massive data from different layers and enable automated network management. To overcome the high communication cost and severe privacy concerns of centralized ML, federated learning (FL) has been proposed to achieve distributed ML among numerous networked devices. While computation and communication limitations have been widely studied, the impact of the limited storage of mobile devices on FL performance remains unexplored. Without an effective data selection policy to filter the massive streaming networked data on devices, classical FL can suffer from much longer model training time (4×) and a dramatic reduction in inference accuracy (7%), as observed in our experiments. In this work, we take the first step toward online data selection for FL with limited on-device storage. We first define a new data valuation metric for data selection in FL, with theoretical guarantees for simultaneously accelerating model convergence and enhancing final accuracy. We further design ODE, an Online Data sElection framework for FL, to coordinate networked devices to store valuable data samples collaboratively. Experimental results on one industrial dataset and three public datasets show the remarkable advantages of ODE over state-of-the-art approaches. In particular, on the industrial dataset, ODE achieves as much as a 2.5× speedup in training time and a 6% increase in final accuracy, and is robust to various factors in practical environments.
Published in: IEEE/ACM Transactions on Networking (Volume: 32, Issue: 4, August 2024)
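The abstract does not specify ODE's actual data valuation metric, which is defined in the paper body. Purely as an illustration of the general idea of value-based online data selection under limited storage, the sketch below shows how a device with a fixed-capacity buffer might retain the highest-valued samples from a data stream; the names (OnDeviceBuffer, value_fn) and the choice of value function are ours, not the paper's.

```python
import heapq
import itertools
from typing import Any, Callable, List, Tuple

class OnDeviceBuffer:
    """Fixed-capacity buffer that keeps the highest-valued samples seen so far.

    A streaming sample replaces the current minimum-value entry whenever its
    value is higher, so the buffer always holds the top-k valued samples.
    This is a generic stand-in for a storage-aware selection policy, not
    ODE's metric.
    """

    def __init__(self, capacity: int, value_fn: Callable[[Any], float]):
        self.capacity = capacity
        self.value_fn = value_fn
        # Tie-breaking counter so heapq never tries to compare raw samples.
        self._counter = itertools.count()
        # Min-heap of (value, tiebreak, sample); the root is the cheapest sample to evict.
        self._heap: List[Tuple[float, int, Any]] = []

    def offer(self, sample: Any) -> None:
        """Consider one streaming sample; keep it only if valuable enough."""
        value = self.value_fn(sample)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, (value, next(self._counter), sample))
        elif value > self._heap[0][0]:
            # Evict the lowest-valued stored sample in favor of the new one.
            heapq.heapreplace(self._heap, (value, next(self._counter), sample))

    def samples(self) -> List[Any]:
        """Return the currently stored samples for local training."""
        return [s for _, _, s in self._heap]
```

In practice, value_fn would score each sample under the current local model (for example, its per-sample loss or gradient norm, both common proxies in the data selection literature); each device would then train on buffer.samples() in every FL round.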