
Virtual home assistant for voice based controlling and scheduling with short speech speaker identification

Multimedia Tools and Applications

Abstract

With the advancement of interface technologies in smart devices, voice-controlled assistants have quickly gained popularity. These assistants use voice commands to achieve a more human-friendly interaction. Along these lines, we propose a cloud-connected, voice-based home assistant in this paper. It accepts voice commands to control or monitor devices in a home, and it can understand and schedule device operations based on time or sensor data through a simple voice-based approach. To enhance its capability, it is designed to identify the speaker. Mel-Frequency Cepstral Coefficients (MFCC), in combination with other speech features, are used as the feature vector. We use Vector Quantization (VQ) and Principal Component Analysis (PCA) for dimensionality reduction of the feature vector, followed by a Gaussian Mixture Model (GMM) for classification. The short-speech speaker identification is validated on a set of Indian speakers in an uncontrolled indoor environment. An accuracy greater than 92% is achieved for speech samples as short as 1 second. A database of more than 50 different commands per speaker is also created to validate the proposed virtual assistant. IBM’s Bluemix and Google’s cloud service are used for speech-to-text conversion.
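
The PCA-then-GMM stage described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the frames are synthetic stand-ins for MFCC-style features, the VQ step is omitted, and the speaker names, feature dimension, number of PCA components, number of mixture components, and the 10 ms frame hop are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)


def fake_frames(center, n_frames, dim=20):
    """Synthetic stand-in for per-frame MFCC-style features (not real speech)."""
    return center + rng.normal(size=(n_frames, dim))


# Hypothetical enrollment data: 500 feature frames per speaker.
train = {
    "speaker_a": fake_frames(center=0.0, n_frames=500),
    "speaker_b": fake_frames(center=3.0, n_frames=500),
}

# PCA is fitted on the pooled enrollment frames to reduce the feature dimension.
pca = PCA(n_components=8).fit(np.vstack(list(train.values())))

# One GMM per enrolled speaker, trained on that speaker's reduced frames.
models = {
    name: GaussianMixture(
        n_components=4, covariance_type="diag", random_state=0
    ).fit(pca.transform(frames))
    for name, frames in train.items()
}


def identify(frames):
    """Return the speaker whose GMM gives the highest mean log-likelihood."""
    reduced = pca.transform(frames)
    return max(models, key=lambda name: models[name].score(reduced))


# About 1 s of speech at an assumed 10 ms frame hop is roughly 100 frames.
test_utterance = fake_frames(center=3.0, n_frames=100)
predicted = identify(test_utterance)
print(predicted)
```

With real audio, `fake_frames` would be replaced by an actual feature extractor, and the decision would still be taken by comparing the per-speaker likelihoods over the frames of the short utterance.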

Acknowledgements

The authors would like to acknowledge the help of everyone involved in the preparation of the database. The authors would also like to thank Vipin Kamble and Sudhir Mishra for their suggestions for improving the paper.

Author information

Corresponding author

Correspondence to Varun Tiwari.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was financially supported by the Department of Electronics and Information Technology under the Ministry of Communications and IT, Government of India.

Appendix: Sample commands

The combination of different devices and locations leads to a large number of possible commands. These commands are categorized into five types on the basis of their operation and requirements. The categories, with some examples, are as follows:

  1. Switching devices without speaker identification
    (a) Turn on the light of bedroom1,
    (b) Switch off the fan of hall,
    (c) Turn on the AC of guestroom,
    (d) Turn on the computer of bedroom2,
    (e) Turn on the night lamp of bedroom1.

  2. Switching devices with speaker identification
    (a) Turn on the lights of my room,
    (b) Switch on my TV,
    (c) Switch off the AC,
    (d) Turn on my computer,
    (e) Switch on the light.

  3. Reading sensor values without speaker identification
    (a) What is the temperature in the kitchen,
    (b) What is the humidity level in hall,
    (c) What is the LPG sensor reading,
    (d) How much is the light intensity outside,
    (e) Is there somebody at the entrance.

  4. Reading sensor values with speaker identification
    (a) What is the temperature of my room,
    (b) What is the humidity in my room,
    (c) What is the temperature,
    (d) How much is the humidity,
    (e) How much is the temperature.

  5. Special Commands (requires speaker identification along with user authentication code)
    (a) Update the database,
    (b) Add new Keyword,
    (c) Rename a location,
    (d) Schedule an event,
    (e) Add new device.
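
One way to read the five categories above is as a small rule set applied to the transcribed command text. The sketch below is a hypothetical illustration, not the parser used in the paper: the keyword lists are taken only from the sample commands, and the rule that a command needs speaker identification when it says "my" or names no location is an assumption inferred from those samples. Treating the LPG sensor as house-wide (so that it needs no location) is likewise an assumption.

```python
import re

# Keyword lists drawn from the sample commands only (hypothetical, not exhaustive).
SPECIAL = ("update the database", "add new keyword", "rename a location",
           "schedule an event", "add new device")
LOCATIONS = {"bedroom1", "bedroom2", "hall", "guestroom", "kitchen",
             "outside", "entrance"}
HOUSE_WIDE = {"lpg"}  # assumed: a single sensor that needs no location
SWITCH_VERBS = ("turn on", "turn off", "switch on", "switch off")


def categorize(command):
    """Map a transcribed command to one of the five appendix categories."""
    text = command.lower().strip(" ,.?")
    if text in SPECIAL:
        return 5
    words = set(re.findall(r"[a-z0-9]+", text))
    # A named location or a house-wide sensor resolves the target directly.
    resolved = bool(words & (LOCATIONS | HOUSE_WIDE))
    # Assumed rule: "my", or no resolvable target, means the speaker's
    # identity is needed to decide which room or device is meant.
    needs_speaker = "my" in words or not resolved
    if text.startswith(SWITCH_VERBS):
        return 2 if needs_speaker else 1
    return 4 if needs_speaker else 3


for sample in ("Turn on the light of bedroom1,", "Switch off the AC,",
               "What is the LPG sensor reading,", "What is the temperature,",
               "Schedule an event,"):
    print(categorize(sample), sample)
```

Under these assumptions, a command such as "Switch off the AC" falls into category 2 because no location is given, so the target room has to be inferred from who is speaking.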

About this article

Cite this article

Tiwari, V., Hashmi, M.F., Keskar, A. et al. Virtual home assistant for voice based controlling and scheduling with short speech speaker identification. Multimed Tools Appl 79, 5243–5268 (2020). https://doi.org/10.1007/s11042-018-6358-x

  • DOI: https://doi.org/10.1007/s11042-018-6358-x
