Federated Learning in Heterogeneous Data Settings for Virtual Assistants – A Case Study

Pardela, Paweł; Fajfer, Anna; Góra, Mateusz; Janicki, Artur

doi:10.1007/978-3-031-16270-1_37

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13502)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

Growing interest in data privacy makes it important to consider how personal virtual assistants (VAs) handle data. The established design of VAs makes data sharing mandatory. Federated learning (FL) appears to be the best way of increasing the privacy of data processed by VAs, because in FL models are trained directly on users’ devices, without sending the data to a centralized server. However, VAs operate in a heterogeneous environment: they are installed on various devices and acquire various quantities of data. In our work, we check how FL performs in such heterogeneous settings. We compare the performance of several optimizers for data of various levels of heterogeneity and various percentages of stragglers. As the FL algorithm, we use FedProx, proposed by Sahu et al. in 2018. As the test database, we use the publicly available Leyzer corpus, dedicated to VA-related experiments. We show that skewed quantity and label distributions affect the quality of VA models trained to solve intent classification problems. We conclude by showing that a carefully selected local optimizer can successfully mitigate this effect, yielding 99% accuracy for the Adam and RMSProp optimizers even for highly skewed distributions and a high share of stragglers.
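
As an illustration of the setting described in the abstract, the following is a minimal sketch of a FedProx-style round, not the authors' implementation. It assumes a toy least-squares model in place of the intent classifier, plain gradient descent as the local optimizer, and size-weighted averaging on the server. The function names, the client sizes (200, 20, 5), and all hyperparameter values are hypothetical. The only part taken from FedProx itself is the proximal term (mu / 2) * ||w - w_global||^2 added to each local objective, together with the idea that stragglers perform less local work but still contribute to aggregation.

```python
import numpy as np


def fedprox_local_update(w_global, X, y, mu=0.1, lr=0.1, epochs=5):
    """Gradient descent on a local least-squares loss plus the FedProx proximal term."""
    w = w_global.copy()
    for _ in range(epochs):
        grad_loss = X.T @ (X @ w - y) / len(y)   # gradient of the local loss F_k(w)
        grad_prox = mu * (w - w_global)          # gradient of (mu / 2) * ||w - w_global||^2
        w = w - lr * (grad_loss + grad_prox)
    return w


def fedprox_round(w_global, clients, stragglers=frozenset(), mu=0.1, lr=0.1,
                  full_epochs=5, straggler_epochs=1):
    """One communication round; stragglers do less local work but are still aggregated."""
    updates, sizes = [], []
    for cid, (X, y) in enumerate(clients):
        epochs = straggler_epochs if cid in stragglers else full_epochs
        updates.append(fedprox_local_update(w_global, X, y, mu, lr, epochs))
        sizes.append(len(y))
    weights = np.asarray(sizes, dtype=float) / sum(sizes)
    # Size-weighted average of the client models (an assumption of this sketch).
    return np.average(np.stack(updates), axis=0, weights=weights)


def make_clients(sizes, w_true, seed=0):
    """Synthetic clients with skewed data quantities (hypothetical toy data)."""
    rng = np.random.default_rng(seed)
    clients = []
    for n in sizes:
        X = rng.normal(size=(n, len(w_true)))
        clients.append((X, X @ w_true))
    return clients


# Usage: three clients with strongly skewed quantities; client 2 is a straggler.
w_true = np.array([2.0, -1.0])
clients = make_clients([200, 20, 5], w_true)
w = np.zeros(2)
for _ in range(30):
    w = fedprox_round(w, clients, stragglers={2})
print("distance to target:", np.linalg.norm(w - w_true))
```

Swapping the plain gradient step in fedprox_local_update for an adaptive rule such as Adam or RMSProp would be the place to mirror the optimizer comparison reported in the abstract; this sketch does not attempt to reproduce those results.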

References

Amiri, M.M., Gündüz, D.: Federated learning over wireless fading channels. IEEE Trans. Wireless Commun. 19(5), 3546–3557 (2020). https://doi.org/10.1109/TWC.2020.2974748
Ammari, T., Kaye, J., Tsai, J., Bentley, F.: Music, search, and IoT: how people (really) use voice assistants. ACM Trans. Comput.-Hum. Interact. 26, 1–28 (2019). https://doi.org/10.1145/3311956
Bonawitz, K., et al.: Towards federated learning at scale: system design (2019)
Chung, H., Iorga, M., Voas, J., Lee, S.: Alexa, can I trust you? Computer 50, 100–104 (2017). https://doi.org/10.1109/MC.2017.3571053
European Commission: 2018 reform of EU data protection rules (2018). https://ec.europa.eu/commission/sites/beta-political/files/data-protection-factsheet-changes_en.pdf
Google: BERT model: uncased_l-12_h-768_a-12 (2018). https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
Hard, A., et al.: Federated learning for mobile keyboard prediction (2019)
Hsieh, K., Phanishayee, A., Mutlu, O., Gibbons, P.: The non-IID data quagmire of decentralized machine learning. In: Daumé, H., Singh, A. (eds.) Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 119, pp. 4387–4398. PMLR, 13–18 July 2020. https://proceedings.mlr.press/v119/hsieh20a.html
Hsu, T.M.H., Qi, H., Brown, M.: Measuring the effects of non-identical data distribution for federated visual classification (2019)
Jiang, P., Agrawal, G.: A linear speedup analysis of distributed deep learning with sparse and quantized communication. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31. Curran Associates, Inc. (2018). https://proceedings.neurips.cc/paper/2018/file/17326d10d511828f6b34fa6d751739e2-Paper.pdf
Kinsella, B.: Samsung Bixby has 10 million active users globally, October 2017. https://voicebot.ai/2017/10/19/samsung-bixby-10-million-active-users-globally
Konečný, J., McMahan, H.B., Yu, F.X., Richtarik, P., Suresh, A.T., Bacon, D.: Federated learning: strategies for improving communication efficiency. In: NIPS Workshop on Private Multi-Party Machine Learning (2016). https://arxiv.org/abs/1610.05492
Li, Q., Diao, Y., Chen, Q., He, B.: Federated learning on non-IID data silos: an experimental study. arXiv:2102.02079 (2021)
Li, T., Sahu, A.K., Talwalkar, A., Smith, V.: Federated learning: challenges, methods, and future directions. IEEE Signal Process. Mag. 37(3), 50–60 (2020). https://doi.org/10.1109/msp.2020.2975749
Li, X., Huang, K., Yang, W., Wang, S., Zhang, Z.: On the convergence of FedAvg on non-IID data (2020)
Malkin, N., Deatrick, J., Tong, A., Wijesekera, P., Egelman, S., Wagner, D.: Privacy attitudes of smart speaker users. In: Proceedings on Privacy Enhancing Technologies 2019, pp. 250–271, October 2019. https://doi.org/10.2478/popets-2019-0068
McMahan, H.B., Moore, E., Ramage, D., Hampson, S., Agüera y Arcas, B.: Communication-efficient learning of deep networks from decentralized data. In: AISTATS (2017)
Reddi, S.J., et al.: Adaptive federated optimization. CoRR abs/2003.00295 (2020). https://arxiv.org/abs/2003.00295
Sahu, A.K., Li, T., Sanjabi, M., Zaheer, M., Talwalkar, A., Smith, V.: On the convergence of federated optimization in heterogeneous networks. CoRR abs/1812.06127 (2018). http://arxiv.org/abs/1812.06127
Schwartz, E.H.: Samsung Bixby Lives! New features quash premature demise rumors, October 2021. https://voicebot.ai/2021/10/26/samsung-bixby-lives-new-features-quash-premature-demise-rumors/
Shamir, O., Srebro, N., Zhang, T.: Communication-efficient distributed optimization using an approximate Newton-type method. In: Xing, E.P., Jebara, T. (eds.) Proceedings of the 31st International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 32(2), pp. 1000–1008. PMLR, Beijing, China, 22–24 June 2014. https://proceedings.mlr.press/v32/shamir14.html
Sowański, M., Janicki, A.: Leyzer: a dataset for multilingual virtual assistants. In: Sojka, P., Kopeček, I., Pala, K., Horák, A. (eds.) TSD 2020. LNCS (LNAI), vol. 12284, pp. 477–486. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58323-1_51
Zemčík, T.: A brief history of chatbots. DEStech Trans. Comput. Sci. Eng. (2019). https://doi.org/10.12783/dtcse/aicae2019/31439, https://www.dpi-proceedings.com/index.php/dtcse/article/view/31439
Zhao, Y., Li, M., Lai, L., Suda, N., Civin, D., Chandra, V.: Federated learning with non-IID data (2018)

Author information

Authors and Affiliations

Samsung R&D Institute Poland, Warsaw, Poland
Paweł Pardela, Anna Fajfer & Mateusz Góra
Warsaw University of Technology, Warsaw, Poland
Paweł Pardela & Artur Janicki

Corresponding author

Correspondence to Mateusz Góra.

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Aleš Horák
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Ivan Kopeček
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Karel Pala

About this paper

Cite this paper

Pardela, P., Fajfer, A., Góra, M., Janicki, A. (2022). Federated Learning in Heterogeneous Data Settings for Virtual Assistants – A Case Study. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2022. Lecture Notes in Computer Science, vol 13502. Springer, Cham. https://doi.org/10.1007/978-3-031-16270-1_37

DOI: https://doi.org/10.1007/978-3-031-16270-1_37
Published: 16 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16269-5
Online ISBN: 978-3-031-16270-1
eBook Packages: Computer Science, Computer Science (R0)
