Abstract
Voice conversion is the task of mimicking a target speaker's voice and style. In this paper, we present a cross-lingual speaker style adaptation method based on a multi-scale loss function, using a deep learning framework for the syntactically similar languages Kannada and Soliga under a low-resource setup. Existing speaker adaptation methods usually depend on monolingual data and cannot be directly applied to cross-lingual data. The proposed method computes a multi-scale reconstruction loss between the generated mel-spectrogram and the original mel-spectrogram, and adapts its loss weights across the various scales. Extensive experimental results show that the multi-scale reconstruction significantly reduces generator noise compared to the baseline model and faithfully transfers Soliga speaker styles to Kannada speakers while retaining the linguistic content of Soliga.
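To make the idea concrete, here is a minimal sketch of one way a multi-scale mel-spectrogram reconstruction loss could be computed: an L1 distance between generated and reference mel-spectrograms, averaged at several time-resolutions and combined with per-scale weights. The scale factors, weights, and function name are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def multiscale_reconstruction_loss(generated, target,
                                   scales=(1, 2, 4),
                                   weights=(1.0, 0.5, 0.25)):
    """Weighted L1 loss between two mel-spectrograms at several time scales.

    generated, target: arrays of shape (n_mels, n_frames).
    scales:  downsampling factors along the time axis (assumed values).
    weights: per-scale loss weights (assumed values).
    """
    total = 0.0
    for s, w in zip(scales, weights):
        # Average-pool along the time axis by factor s,
        # truncating any trailing frames that do not fill a window.
        n = (generated.shape[1] // s) * s
        g = generated[:, :n].reshape(generated.shape[0], -1, s).mean(axis=2)
        t = target[:, :n].reshape(target.shape[0], -1, s).mean(axis=2)
        total += w * np.abs(g - t).mean()
    return total
```

Coarser scales penalise errors in the overall spectral envelope, while the finest scale penalises frame-level detail; the weighting between them is exactly what the paper's adaptive scheme would tune.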
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Dasare, A. et al. (2023). Cross Lingual Style Transfer Using Multiscale Loss Function for Soliga: A Low Resource Tribal Language. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science(), vol 14339. Springer, Cham. https://doi.org/10.1007/978-3-031-48312-7_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48311-0
Online ISBN: 978-3-031-48312-7
eBook Packages: Computer Science (R0)