Abstract
With the advancement of interface technologies in smart devices, voice-controlled assistants have quickly gained popularity. These assistants use voice commands to achieve a more human-friendly interaction. Along these lines, this paper proposes a cloud-connected, voice-based home assistant. It accepts voice commands to control or monitor devices in a home, and it can understand and schedule device operations based on time or sensor data through a simple voice-based approach. To enhance its capability, it is also designed to identify the speaker. Mel-Frequency Cepstral Coefficients (MFCC), in combination with other speech features, are used as the feature vector. We use Vector Quantization (VQ) and Principal Component Analysis (PCA) for dimensionality reduction of the feature vector, followed by a Gaussian Mixture Model (GMM) for classification. Short-speech speaker identification is validated on a set of Indian speakers in an uncontrolled indoor environment, and an accuracy greater than 92% is achieved for speech samples as short as 1 second. A database of more than 50 different commands per speaker is also created to validate the proposed virtual assistant. IBM’s Bluemix and Google’s cloud service are used for speech-to-text conversion.
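The abstract names GMM classification but not how a speaker decision is made. As a minimal sketch, assuming each enrolled speaker is already represented by a trained diagonal-covariance GMM over the reduced feature vectors (the MFCC, VQ and PCA front end and the EM training step are omitted), identification can be framed as picking the speaker model with the highest average per-frame log-likelihood. The names `DiagGMM` and `identify_speaker` and all numbers are hypothetical illustrations, not the authors' code or data.

```python
import math
from typing import Dict, List, Sequence


class DiagGMM:
    """Gaussian mixture with diagonal covariances (hypothetical helper, not the authors' code)."""

    def __init__(self, weights: Sequence[float], means: Sequence[Sequence[float]],
                 variances: Sequence[Sequence[float]]) -> None:
        self.weights = list(weights)
        self.means = [list(m) for m in means]
        self.variances = [list(v) for v in variances]

    def log_likelihood(self, x: Sequence[float]) -> float:
        """log p(x) = log sum_k w_k N(x; mu_k, diag(var_k)), via log-sum-exp for stability."""
        component_logs = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            log_p = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                log_p -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            component_logs.append(log_p)
        peak = max(component_logs)
        return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def identify_speaker(frames: List[Sequence[float]], models: Dict[str, DiagGMM]) -> str:
    """Pick the enrolled speaker whose GMM gives the highest mean per-frame log-likelihood."""
    scores = {
        name: sum(gmm.log_likelihood(f) for f in frames) / len(frames)
        for name, gmm in models.items()
    }
    return max(scores, key=scores.get)


# Toy 2-D "features" standing in for reduced MFCC-based vectors (made-up numbers).
models = {
    "alice": DiagGMM([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[0.2, 0.2], [0.2, 0.2]]),
    "bob": DiagGMM([1.0], [[5.0, 5.0]], [[0.5, 0.5]]),
}
frames = [[0.1, -0.1], [0.9, 1.1], [0.5, 0.4]]
print(identify_speaker(frames, models))  # alice
```

Dividing by the number of frames does not change which speaker wins, since every model scores the same frames, but it keeps scores on a per-frame scale so they could be compared against a threshold, for example to reject speakers who were never enrolled.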
Acknowledgements
The authors would like to acknowledge the help of all the people involved in the preparation of the database. The authors would also like to thank Vipin Kamble and Sudhir Mishra for their suggestions for improving the paper.
Additional information
The work was financially supported by the Department of Electronics and Information Technology under the Ministry of Communications and IT, Government of India.
Appendix: Sample commands
Combining different devices and locations yields a large number of possible commands. These commands are grouped into five categories according to their operation and requirements. The categories, with some examples, are as follows:
- 1. Switching devices without speaker identification
  - (a) Turn on the light of bedroom1,
  - (b) Switch off the fan of hall,
  - (c) Turn on the AC of guestroom,
  - (d) Turn on the computer of bedroom2,
  - (e) Turn on the night lamp of bedroom1.
- 2. Switching devices with speaker identification
  - (a) Turn on the lights of my room,
  - (b) Switch on my TV,
  - (c) Switch off the AC,
  - (d) Turn on my computer,
  - (e) Switch on the light.
- 3. Reading sensor values without speaker identification
  - (a) What is the temperature in the kitchen,
  - (b) What is the humidity level in hall,
  - (c) What is the LPG sensor reading,
  - (d) How much is the light intensity outside,
  - (e) Is there somebody at the entrance.
- 4. Reading sensor values with speaker identification
  - (a) What is the temperature of my room,
  - (b) What is the humidity in my room,
  - (c) What is the temperature,
  - (d) How much is the humidity,
  - (e) How much is the temperature.
- 5. Special commands (require speaker identification along with a user authentication code)
  - (a) Update the database,
  - (b) Add new Keyword,
  - (c) Rename a location,
  - (d) Schedule an event,
  - (e) Add new device.
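Judging from the examples, speaker identification appears to be needed when a command says "my" or does not name where the device or sensor is, so the assistant must infer the speaker's room. A minimal keyword-based sketch of this routing step, assuming the command has already been converted to text by the cloud speech-to-text service, might look like the following. The vocabulary sets, the prefix rules and the function name `categorize` are hypothetical, chosen only to cover the sample commands above; the paper's actual parsing may differ, and category 5 would additionally require identifying the speaker and checking the authentication code, which is not shown.

```python
import re

# Hypothetical vocabulary, chosen only to cover the sample commands above.
LOCATIONS = {"bedroom1", "bedroom2", "hall", "guestroom", "kitchen", "outside", "entrance"}
UNIQUE_TARGETS = {"lpg"}  # assumed to exist in one place only, so no room needs resolving
SPECIAL_PREFIXES = ("update", "add new", "rename", "schedule")
SWITCH_PREFIXES = ("turn on", "turn off", "switch on", "switch off")
QUERY_PREFIXES = ("what is", "how much", "is there")


def categorize(command: str) -> int:
    """Map transcribed command text to category 1-5 of the appendix (hypothetical rules)."""
    text = command.lower().strip().rstrip(".,?")
    if text.startswith(SPECIAL_PREFIXES):
        return 5  # would also need speaker identification plus an authentication code
    words = set(re.findall(r"[a-z0-9]+", text))
    # Speaker identity is assumed necessary when the command says "my" or names
    # neither a location nor a single-location target.
    needs_speaker = "my" in words or not (words & (LOCATIONS | UNIQUE_TARGETS))
    if text.startswith(SWITCH_PREFIXES):
        return 2 if needs_speaker else 1
    if text.startswith(QUERY_PREFIXES):
        return 4 if needs_speaker else 3
    raise ValueError(f"unrecognized command: {command!r}")


print(categorize("Switch off the fan of hall"))  # 1
print(categorize("What is the temperature of my room"))  # 4
```

Checking the special prefixes first keeps commands such as "Rename a location" from being misread as switching or sensor queries.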
Cite this article
Tiwari, V., Hashmi, M.F., Keskar, A. et al. Virtual home assistant for voice based controlling and scheduling with short speech speaker identification. Multimed Tools Appl 79, 5243–5268 (2020). https://doi.org/10.1007/s11042-018-6358-x