Speech to speech interaction system using Multimedia Tools and Partially Observable Markov Decision Process for visually impaired students

Multimedia Tools and Applications

Abstract

Visually impaired students generally need another person to teach them with the help of computers and books. However, many students are not familiar with using computers or cannot grasp the concepts on their own. To address this issue, a speech-to-speech interaction system is developed on the basis of a novel dialogue management system. The interaction combines multimedia tools with a Partially Observable Markov Decision Process (POMDP) and an agenda-based model, which the proposed dialogue management system uses to learn from the user's speech signals so that the system can reply accordingly. The proposed system helps visually impaired students learn more easily. Word Error Rate, Recognition cum Retrieval Rate, and Misrecognition Retrieval Rate are calculated for the proposed POMDP with Agenda Based dialogue management system. The experimental results are compared with those of a Finite-State Based dialogue management system, a Frame Based dialogue management system, and a Probabilistic dialogue management system. The results indicate that the proposed POMDP with Agenda Based dialogue management system performs well. The proposed model was trained with 125 speakers, of whom 46 were visually impaired, and tested with 95 untrained speakers, of whom 32 were visually impaired.
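The abstract names Word Error Rate as one of the evaluation measures but does not define it. As an illustration only, the sketch below uses the conventional definition, WER = (substitutions + deletions + insertions) / number of reference words, computed with a word-level edit distance. The function name `word_error_rate` and the example utterances are assumptions made for this sketch and are not taken from the article; the authors' exact scoring procedure may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level edit distance divided by reference length.

    Illustrative sketch; not the article's own scoring code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical utterance: the recogniser drops one of four reference words.
print(word_error_rate("open the physics lesson", "open physics lesson"))  # 0.25
```

A lower value means the recognised word sequence is closer to what the speaker actually said; a value of 0 means the two sequences match exactly.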

Author information

Corresponding author

Correspondence to S. Lokesh.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lokesh, S., Kanisha, B., Nalini, S. et al. Speech to speech interaction system using Multimedia Tools and Partially Observable Markov Decision Process for visually impaired students. Multimed Tools Appl 79, 5023–5042 (2020). https://doi.org/10.1007/s11042-018-6264-2

  • DOI: https://doi.org/10.1007/s11042-018-6264-2
