Federated Learning in Heterogeneous Data Settings for Virtual Assistants – A Case Study

  • Conference paper
Text, Speech, and Dialogue (TSD 2022)

Abstract

Growing interest in data privacy makes it important to consider how personal virtual assistants (VAs) handle data. The established design of VAs makes data sharing mandatory. Federated learning (FL) appears to be the most suitable way of increasing the privacy of data processed by VAs because, in FL, models are trained directly on users’ devices and the data is not sent to a centralized server. However, VAs operate in a heterogeneous environment: they are installed on various devices and acquire various quantities of data. In our work, we check how FL performs in such heterogeneous settings. We compare the performance of several optimizers for data of various levels of heterogeneity and various percentages of stragglers. As the FL algorithm, we use FedProx, proposed by Sahu et al. in 2018. As the test database, we use the publicly available Leyzer corpus, dedicated to VA-related experiments. We show that skewed quantity and label distributions affect the quality of VA models trained to solve intent classification problems. We conclude by showing that a carefully selected local optimizer can successfully mitigate this effect, yielding 99% accuracy for the ADAM and RMSProp optimizers even for highly skewed distributions and a high share of stragglers.
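To make the role of FedProx and stragglers concrete, the following is a minimal sketch, not the setup used in the paper. It assumes the standard FedProx local objective, F_k(w) + (mu/2)·||w − w_global||², and uses plain gradient descent on a toy quadratic loss. The names `fedprox_local_update` and `aggregate`, the values of `mu`, `lr`, and `steps`, and the sample-count weighting in the aggregation step are all illustrative assumptions.

```python
from typing import Callable, List

Vector = List[float]


def fedprox_local_update(
    w_global: Vector,
    grad_fn: Callable[[Vector], Vector],
    mu: float,
    lr: float,
    steps: int,
) -> Vector:
    """Run `steps` gradient steps on F_k(w) + (mu / 2) * ||w - w_global||^2.

    `grad_fn` returns the gradient of the client's own loss F_k.
    A straggler is modelled simply by passing a smaller `steps`.
    """
    w = list(w_global)
    for _ in range(steps):
        g = grad_fn(w)
        # The proximal term adds mu * (w - w_global) to the gradient,
        # pulling the local model back toward the global one.
        w = [wi - lr * (gi + mu * (wi - w0)) for wi, gi, w0 in zip(w, g, w_global)]
    return w


def aggregate(updates: List[Vector], weights: List[float]) -> Vector:
    """Weighted average of client models (weighting by sample count is an assumption)."""
    total = sum(weights)
    dim = len(updates[0])
    return [sum(wt * u[i] for u, wt in zip(updates, weights)) / total for i in range(dim)]


def make_quadratic_grad(center: float) -> Callable[[Vector], Vector]:
    """Gradient of the toy loss F_k(w) = 0.5 * (w - center)^2."""
    return lambda w: [wi - center for wi in w]


if __name__ == "__main__":
    w_global = [0.0]
    # Two hypothetical clients with different optima (label skew)
    # and different data sizes (quantity skew).
    clients = [
        {"grad": make_quadratic_grad(2.0), "n": 10, "steps": 1},   # straggler
        {"grad": make_quadratic_grad(-1.0), "n": 90, "steps": 5},
    ]
    updates = [
        fedprox_local_update(w_global, c["grad"], mu=1.0, lr=0.5, steps=c["steps"])
        for c in clients
    ]
    w_global = aggregate(updates, [c["n"] for c in clients])
    print(w_global)
```

With `mu = 0` the local step reduces to ordinary gradient descent on the client's own loss. A larger `mu` limits how far a client with skewed data can move away from the global model, which is the mechanism FedProx relies on when clients do unequal amounts of local work.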

References

  1. Amiri, M.M., Gündüz, D.: Federated learning over wireless fading channels. IEEE Trans. Wireless Commun. 19(5), 3546–3557 (2020). https://doi.org/10.1109/TWC.2020.2974748

  2. Ammari, T., Kaye, J., Tsai, J., Bentley, F.: Music, search, and IoT: how people (really) use voice assistants. ACM Trans. Comput.-Hum. Interact. 26, 1–28 (2019). https://doi.org/10.1145/3311956

  3. Bonawitz, K., et al.: Towards federated learning at scale: system design (2019)

  4. Chung, H., Iorga, M., Voas, J., Lee, S.: Alexa, can I trust you? Computer 50, 100–104 (2017). https://doi.org/10.1109/MC.2017.3571053

  5. European_Commission: 2018 reform of EU data protection rules (2018). https://ec.europa.eu/commission/sites/beta-political/files/data-protection-factsheet-changes_en.pdf

  6. Google: BERT model: uncased_l-12_h-768_a-12 (2018). https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip

  7. Hard, A., et al.: Federated learning for mobile keyboard prediction (2019)

  8. Hsieh, K., Phanishayee, A., Mutlu, O., Gibbons, P.: The non-IID data quagmire of decentralized machine learning. In: Daumé, H., Singh, A. (eds.) Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 119, pp. 4387–4398. PMLR, 13–18 July 2020. https://proceedings.mlr.press/v119/hsieh20a.html

  9. Hsu, T.M.H., Qi, H., Brown, M.: Measuring the effects of non-identical data distribution for federated visual classification (2019)

  10. Jiang, P., Agrawal, G.: A linear speedup analysis of distributed deep learning with sparse and quantized communication. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31. Curran Associates, Inc. (2018). https://proceedings.neurips.cc/paper/2018/file/17326d10d511828f6b34fa6d751739e2-Paper.pdf

  11. Kinsella, B.: Samsung Bixby has 10 million active users globally, October 2017. https://voicebot.ai/2017/10/19/samsung-bixby-10-million-active-users-globally

  12. Konečný, J., McMahan, H.B., Yu, F.X., Richtarik, P., Suresh, A.T., Bacon, D.: Federated learning: strategies for improving communication efficiency. In: NIPS Workshop on Private Multi-Party Machine Learning (2016). https://arxiv.org/abs/1610.05492

  13. Li, Q., Diao, Y., Chen, Q., He, B.: Federated learning on Non-IID data silos: an experimental study. ArXiv:abs/2102.02079 (2021)

  14. Li, T., Sahu, A.K., Talwalkar, A., Smith, V.: Federated learning: challenges, methods, and future directions. IEEE Signal Process. Mag. 37(3), 50–60 (2020). https://doi.org/10.1109/msp.2020.2975749

  15. Li, X., Huang, K., Yang, W., Wang, S., Zhang, Z.: On the convergence of FedAvg on non-IID data (2020)

  16. Malkin, N., Deatrick, J., Tong, A., Wijesekera, P., Egelman, S., Wagner, D.: Privacy attitudes of smart speaker users. In: Proceedings on Privacy Enhancing Technologies 2019, pp. 250–271, October 2019. https://doi.org/10.2478/popets-2019-0068

  17. McMahan, H.B., Moore, E., Ramage, D., Hampson, S., y Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: AISTATS (2017)

  18. Reddi, S.J., et al.: Adaptive federated optimization. CoRR abs/2003.00295 (2020). https://arxiv.org/abs/2003.00295

  19. Sahu, A.K., Li, T., Sanjabi, M., Zaheer, M., Talwalkar, A., Smith, V.: On the convergence of federated optimization in heterogeneous networks. CoRR abs/1812.06127 (2018). http://arxiv.org/abs/1812.06127

  20. Schwartz, E.H.: Samsung Bixby Lives! New features quash premature demise rumors, October 2021. https://voicebot.ai/2021/10/26/samsung-bixby-lives-new-features-quash-premature-demise-rumors/

  21. Shamir, O., Srebro, N., Zhang, T.: Communication-efficient distributed optimization using an approximate newton-type method. In: Xing, E.P., Jebara, T. (eds.) Proceedings of the 31st International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 32(2), pp. 1000–1008. PMLR, Beijing, China, 22–24 June 2014. https://proceedings.mlr.press/v32/shamir14.html

  22. Sowański, M., Janicki, A.: Leyzer: a dataset for multilingual virtual assistants. In: Sojka, P., Kopeček, I., Pala, K., Horák, A. (eds.) TSD 2020. LNCS (LNAI), vol. 12284, pp. 477–486. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58323-1_51

  23. Zemčík, T.: A brief history of chatbots. DEStech Trans. Comput. Sci. Eng. (2019). https://doi.org/10.12783/dtcse/aicae2019/31439, https://www.dpi-proceedings.com/index.php/dtcse/article/view/31439

  24. Zhao, Y., Li, M., Lai, L., Suda, N., Civin, D., Chandra, V.: Federated learning with non-IID data (2018)

Author information

Corresponding author

Correspondence to Mateusz Góra.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Pardela, P., Fajfer, A., Góra, M., Janicki, A. (2022). Federated Learning in Heterogeneous Data Settings for Virtual Assistants – A Case Study. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2022. Lecture Notes in Computer Science, vol 13502. Springer, Cham. https://doi.org/10.1007/978-3-031-16270-1_37

  • DOI: https://doi.org/10.1007/978-3-031-16270-1_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16269-5

  • Online ISBN: 978-3-031-16270-1

  • eBook Packages: Computer Science, Computer Science (R0)
