
Data Augmentation for Internet of Things Dialog System

Abstract

With the rapid development of voice control technology, making speech recognition more precise across the many IoT domains has become a challenging problem. Because conversations occur in a wide variety of scenes, understanding the context of a dialog scene is a key issue for voice control systems; in practice, however, the training data available for dialog systems are rarely sufficient. In this paper, we address this data scarcity with a data augmentation technique. We propose a Generative Adversarial Network (GAN)-based model that augments the data effectively: it generates text from text, enriches the original data through text retelling, and improves the robustness of parameter estimation on unseen data by exploiting the samples generated by the GAN. A new N-gram language model is then used to evaluate the multiple candidates produced by the speech recognizer, and the candidate sentence with the highest score is selected as the final recognition result. Experiments verify the proposed generative-model-based data augmentation algorithm: in the model comparison test, the error rates on the THCHS30 and AISHELL data sets are 3.3% and 5.1%, respectively, both lower than those of the baseline system.
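
To make the candidate-rescoring idea concrete, the following is a minimal sketch in Python, not the authors' implementation: a bigram language model with add-one smoothing (standing in for the paper's N-gram model) scores several recognition candidates and keeps the highest-scoring one. The toy corpus, hypotheses, and function names are illustrative assumptions.

    # Minimal sketch (not the paper's system): rescore ASR candidates with a
    # bigram language model trained on in-domain text; the toy corpus stands
    # in for GAN-augmented dialog data.
    import math
    from collections import Counter

    def train_bigram_lm(sentences):
        """Count unigrams and bigrams over tokenized, padded sentences."""
        unigrams, bigrams = Counter(), Counter()
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            unigrams.update(padded)
            bigrams.update(zip(padded, padded[1:]))
        return unigrams, bigrams

    def log_prob(tokens, unigrams, bigrams):
        """Add-one smoothed log-probability of one candidate sentence."""
        vocab = len(unigrams)
        padded = ["<s>"] + tokens + ["</s>"]
        score = 0.0
        for prev, cur in zip(padded, padded[1:]):
            score += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
        return score

    def pick_best(candidates, unigrams, bigrams):
        """Return the recognition candidate the language model prefers."""
        return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))

    if __name__ == "__main__":
        corpus = [["turn", "on", "the", "light"],   # toy in-domain sentences
                  ["turn", "off", "the", "fan"],
                  ["set", "the", "light", "to", "warm"]]
        uni, bi = train_bigram_lm(corpus)
        hyps = [["turn", "on", "the", "light"],     # two hypothetical
                ["turn", "on", "the", "lie"]]       # ASR hypotheses
        print(pick_best(hyps, uni, bi))             # -> the in-domain hypothesis

In the paper's pipeline the language model would presumably be estimated on the GAN-augmented corpus rather than a toy list, but the selection step is the same: an argmax over the language-model scores of the recognizer's candidate sentences.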

Acknowledgements

This work was partially supported by national funding from the FCT - Fundação para a Ciência e a Tecnologia through the UID/EEA/50008/2019 project, and by the Brazilian National Council for Scientific and Technological Development (CNPq) via Grant No. 309335/2017-5.

Author information

Corresponding author

Correspondence to Saru Kumari.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, E.K., Yu, J., Chen, C.M. et al. Data Augmentation for Internet of Things Dialog System. Mobile Netw Appl 27, 158–171 (2022). https://doi.org/10.1007/s11036-020-01638-9
