Abstract
Speech is the most prominent and natural form of communication between humans, and it has the potential to be an important mode of interaction with computers. The man–machine interface has long been a challenging area in natural language processing and speech recognition research, and there is growing interest in developing machines that accept speech as input. Most people communicate with a computer through a mouse or keyboard, which requires training, effort, and knowledge of computers; this can be a limitation. Marathi is the official language of the government of Maharashtra, and there is a need for systems that enable human–machine interaction in Indian regional languages. The objective of this research is to design and develop the Marathi Speech Activated Talking Calculator (MSAC) as an interface system. The MSAC is a speaker-dependent speech recognition system that performs basic mathematical operations. It recognizes isolated spoken numbers from 0 to 50 and command words such as addition, subtraction, multiplication, start, stop, equal, and exit. A database is an essential requirement for designing a speech recognition system; to meet the objectives, a database with a vocabulary size of 22,320 was developed. The MSAC system was trained and tested with each of the following feature extraction techniques individually: Mel Frequency Cepstral Coefficients (MFCC), Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Linear Predictive Coding (LPC), and RASTA-PLP. It was also trained and tested with each of the following fusion feature extraction techniques: Mel Frequency Linear Discriminant Analysis (MFLDA), Mel Frequency Principal Component Analysis (MFPCA), Mel Frequency Discrete Wavelet Transformation (MFDWT), and Mel Frequency Linear Discrete Wavelet Transformation (MFLDWT).
This work also proposes and tests the Wavelet Decomposed Cepstral Coefficient (WDCC) approach with 18, 36, and 54 coefficients. The performance of the MSAC system is measured by accuracy and real-time factor (RTF). The experimental results show that MFCC with 39 coefficients achieved higher accuracy than the 13- and 26-coefficient variations, and that MFLDWT achieved higher accuracy than MFLDA, MFPCA, MFDWT, and Mel Frequency Principal Discrete Wavelet Transformation (MFPDWT). From this research, we conclude that WDCC is a more robust and dynamic technique than MFCC, LDA, PCA, and LPC. The MSAC interface application can directly benefit people in their day-to-day activities.
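The abstract names accuracy and real-time factor (RTF) as the evaluation measures but does not define them. The sketch below assumes the common definitions: accuracy as the percentage of test utterances recognized correctly, and RTF as processing time divided by the duration of the input audio. The function names and sample values are hypothetical and are not taken from the chapter.

```python
def accuracy(predicted, reference):
    """Percentage of utterances whose predicted label matches the reference.

    Assumed definition: 100 * correct / total. The chapter's exact
    formula is not given in the abstract.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one utterance is required")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)


def real_time_factor(processing_seconds, audio_seconds):
    """Assumed definition: time spent recognizing / duration of the audio.

    A value below 1.0 means recognition runs faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


# Hypothetical example: four isolated-word test utterances, one misrecognized.
predicted_labels = ["1", "2", "addition", "4"]
reference_labels = ["1", "2", "addition", "5"]
print(accuracy(predicted_labels, reference_labels))
print(real_time_factor(processing_seconds=0.5, audio_seconds=2.0))
```

Under these assumed definitions, a lower RTF at equal accuracy indicates a feature extraction technique better suited to an interactive calculator.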
Copyright information
© 2016 Springer India
About this chapter
Cite this chapter
Gaikwad, S., Gawali, B., Mehrotra, S. (2016). Design and Development of Marathi Speech Interface System. In: Chaki, R., Cortesi, A., Saeed, K., Chaki, N. (eds) Advanced Computing and Systems for Security. Advances in Intelligent Systems and Computing, vol 396. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2653-6_1
DOI: https://doi.org/10.1007/978-81-322-2653-6_1
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2651-2
Online ISBN: 978-81-322-2653-6
eBook Packages: Engineering, Engineering (R0)