Virtual home assistant for voice based controlling and scheduling with short speech speaker identification

Tiwari, Varun; Hashmi, Mohammad Farukh; Keskar, Avinash; Shivaprakash, N. C.

doi:10.1007/s11042-018-6358-x

Published: 23 July 2018

Volume 79, pages 5243–5268, (2020)

Multimedia Tools and Applications

Abstract

With the advancement of interface technologies in smart devices, voice-controlled assistants have quickly gained popularity. These assistants are designed to use voice commands to achieve a more human-friendly interaction. Along these lines, we propose a cloud-connected, voice-based home assistant in this paper. It accepts voice commands to control or monitor devices in a home. It can understand and schedule device operations based on time or sensor data through a simple voice-based approach. To enhance its capability, it is designed to identify the speakers. Mel-Frequency Cepstral Coefficients (MFCC) in combination with other speech features are used as the feature vector. We use Vector Quantization (VQ) and Principal Component Analysis (PCA) for dimensionality reduction of the feature vector, followed by a Gaussian Mixture Model (GMM) for classification. The short speech speaker identification is validated on a set of Indian speakers in an uncontrolled indoor environment. An accuracy greater than 92% is achieved for speech samples as short as 1 second. A database of more than 50 different commands per speaker is also created for validation of the proposed virtual assistant. IBM’s Bluemix and Google’s cloud services are used for speech-to-text conversion.
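
The GMM classification step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: feature extraction (MFCC), VQ, PCA and model training are all omitted, the models use diagonal covariances as a simplifying assumption, and the speaker names, model parameters and feature frames are made-up toy values. It only shows how a speaker might be chosen by comparing the log-likelihood of the feature frames under each speaker's GMM.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of all frames under one speaker's GMM.

    gmm is a list of (weight, mean, variance) components.
    """
    total = 0.0
    for x in frames:
        # Log-sum-exp over the components for numerical stability.
        terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total

def identify_speaker(frames, speaker_models):
    """Return the speaker whose GMM gives the highest log-likelihood."""
    return max(
        speaker_models,
        key=lambda name: gmm_log_likelihood(frames, speaker_models[name]),
    )

# Hypothetical, hand-written 2-D models (real models would be trained
# on reduced feature vectors from enrollment speech).
SPEAKER_MODELS = {
    "speaker_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "speaker_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}

# Toy feature frames standing in for a short utterance.
frames = [[0.2, -0.1], [0.9, 1.2], [0.4, 0.5]]
print(identify_speaker(frames, SPEAKER_MODELS))
```

Summing per-frame log-likelihoods treats the frames as independent, which is a common simplification in GMM-based speaker identification; shorter utterances simply contribute fewer terms to the sum.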

Acknowledgements

The authors would like to acknowledge the help of all the people involved in the preparation of the database. The authors would also like to thank Vipin Kamble and Sudhir Mishra for their suggestions for improving the paper.

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, Visvesvaraya National Institute of Technology, South Ambazari Road, Nagpur, 40010, India
Varun Tiwari & Avinash Keskar
Department of Electronics and Communication Engineering, National Institute of Technology Campus, Warangal, Telangana, 506004, India
Mohammad Farukh Hashmi
Department of Instrumentation and Applied Physics, Indian Institute of Science, C V Raman Ave, Bengaluru, 560012, India
N. C. Shivaprakash

Corresponding author

Correspondence to Varun Tiwari.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The work was financially supported by the Department of Electronics and Information Technology under the Ministry of Communications and IT, Government of India.

Appendix: Sample commands

The combination of different devices and locations leads to a large number of possible commands. These commands are grouped into five categories on the basis of their operation and requirements. The categories, with some examples, are as follows:

1. Switching devices without speaker identification
   (a) Turn on the light of bedroom1,
   (b) Switch off the fan of hall,
   (c) Turn on the AC of guestroom,
   (d) Turn on the computer of bedroom2,
   (e) Turn on the night lamp of bedroom1.
2. Switching devices with speaker identification
   (a) Turn on the lights of my room,
   (b) Switch on my TV,
   (c) Switch off the AC,
   (d) Turn on my computer,
   (e) Switch on the light.
3. Reading sensor values without speaker identification
   (a) What is the temperature in the kitchen,
   (b) What is the humidity level in hall,
   (c) What is the LPG sensor reading,
   (d) How much is the light intensity outside,
   (e) Is there somebody at the entrance.
4. Reading sensor values with speaker identification
   (a) What is the temperature of my room,
   (b) What is the humidity in my room,
   (c) What is the temperature,
   (d) How much is the humidity,
   (e) How much is the temperature.
5. Special commands (require speaker identification along with a user authentication code)
   (a) Update the database,
   (b) Add new Keyword,
   (c) Rename a location,
   (d) Schedule an event,
   (e) Add new device.
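
The five categories above could be told apart with a simple rule-based routine, sketched below. This is an illustrative assumption, not the parsing method used in the paper: the vocabularies are hypothetical and built only from the example commands, and the rule that a command needs speaker identification when it says "my" or names no explicit location is inferred from those examples.

```python
# Hypothetical vocabularies; the words are taken from the appendix examples.
LOCATIONS = {"bedroom1", "bedroom2", "hall", "guestroom",
             "kitchen", "outside", "entrance"}
UNIQUE_SENSORS = {"lpg"}  # assumption: only one such sensor in the home
SWITCH_PREFIXES = ("turn on", "turn off", "switch on", "switch off")
SPECIAL_PREFIXES = ("update the database", "add new", "rename", "schedule")

def categorize(command):
    """Return the appendix category number (1-5) for a transcribed command."""
    text = command.lower().strip(" ,.?")
    words = set(text.split())
    if text.startswith(SPECIAL_PREFIXES):
        return 5
    # Assumed rule: the target is explicit when the command names a known
    # location or unique sensor and does not refer to the speaker ("my").
    explicit = bool(words & (LOCATIONS | UNIQUE_SENSORS)) and "my" not in words
    if text.startswith(SWITCH_PREFIXES):
        return 1 if explicit else 2
    return 3 if explicit else 4

for cmd in ("Turn on the light of bedroom1,",
            "Switch on my TV,",
            "What is the LPG sensor reading,",
            "What is the temperature,",
            "Schedule an event,"):
    print(categorize(cmd), cmd)
```

Under this sketch, categories 2 and 4 are the ones where the assistant would have to resolve the missing location from the identified speaker, while category 5 would additionally trigger the authentication code check.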

About this article

Cite this article

Tiwari, V., Hashmi, M.F., Keskar, A. et al. Virtual home assistant for voice based controlling and scheduling with short speech speaker identification. Multimed Tools Appl 79, 5243–5268 (2020). https://doi.org/10.1007/s11042-018-6358-x

Received: 20 May 2018
Revised: 26 June 2018
Accepted: 29 June 2018
Published: 23 July 2018
Issue Date: February 2020
DOI: https://doi.org/10.1007/s11042-018-6358-x
