Fusion of Speech and Face by Enhanced Modular Neural Network

Kala, Rahul; Vazirani, Harsh; Shukla, Anupam; Tiwari, Ritu

doi:10.1007/978-3-642-12035-0_37

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 54)

Abstract

Biometric identification is a long-established field in which people are identified by their biometric traits. The field has shifted toward bi-modal systems, which use more than one modality for identification. Bi-modal systems suffer from high dimensionality: each individual modality already has a large number of attributes, and fusing the modalities adds these together, producing a still larger input space. In this paper we address this problem by introducing modularity at the attribute level. We divide the attributes among the modules of a modular neural network, which limits the dimensionality each module handles without much loss of information. Each module outputs the probabilities of occurrence of the various classes, and an integrator collects these outputs. The integrator averages the probabilities from the modules to obtain the final probability of occurrence of each class. The averaging is based on the efficiency each module achieved during training, since a well-trained module is expected to perform better than a poorly trained one. The integrator then selects the class with the highest final probability and returns it as the output class. We tested the algorithm on the fusion of face and speech, where it achieved a recognition rate of 97.5%, which indicates the effectiveness of the approach.
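The integration step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how attributes are assigned to modules or exactly how training efficiencies enter the average, so the round-robin split in `split_attributes`, the efficiency-weighted mean in `integrate`, and all numeric values below are assumptions made for illustration only.

```python
from typing import List, Sequence


def split_attributes(features: Sequence[float], n_modules: int) -> List[List[float]]:
    """Deal the fused feature vector out to the modules round-robin.

    Assumption: the abstract only says attributes are divided among
    modules; round-robin is one simple way to do that.
    """
    return [list(features[i::n_modules]) for i in range(n_modules)]


def integrate(module_probs: Sequence[Sequence[float]],
              efficiencies: Sequence[float]) -> List[float]:
    """Average the per-module class probabilities, weighted by efficiency.

    Assumption: "averaging on the basis of efficiencies" is read here as
    a weighted mean with weights proportional to training efficiency.
    """
    total = sum(efficiencies)
    n_classes = len(module_probs[0])
    return [
        sum(w * p[c] for w, p in zip(efficiencies, module_probs)) / total
        for c in range(n_classes)
    ]


def predict(module_probs: Sequence[Sequence[float]],
            efficiencies: Sequence[float]) -> int:
    """Return the index of the class with the highest final probability."""
    final = integrate(module_probs, efficiencies)
    return max(range(len(final)), key=final.__getitem__)


# Hypothetical outputs of three trained modules over three classes.
module_probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.6, 0.3, 0.1],
]
# Hypothetical training efficiencies of the same three modules.
efficiencies = [0.9, 0.5, 0.6]

final_probs = integrate(module_probs, efficiencies)
predicted_class = predict(module_probs, efficiencies)
print(final_probs, predicted_class)
```

In this toy case the second module favours class 1, but its lower efficiency gives it less weight, so the integrator still selects class 0. In a real system each module would be a neural network trained on its own slice of the fused face and speech attributes.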

Author information

Authors and Affiliations

Department of Information Technology, Indian Institute of Information Technology and Management Gwalior, Gwalior, MP, India
Rahul Kala, Harsh Vazirani, Anupam Shukla & Ritu Tiwari

Editor information

Editors and Affiliations

Computer Science Department, Georgia State University, 34 Peachtree Street, 30303, Atlanta, GA, USA
Sushil K. Prasad
Tata Research, Development and Design Center (TRDDC), Pune, India
Harrick M. Vin
CISE Department, CSE 301, University of Florida, FL 32611, Gainesville, USA
Sartaj Sahni
Management Development Institute (MDI), Mehrauli Road, Sukhrali, Gurgaon, India
Mahadeo P. Jaiswal
Dept. of Computer Engineering, King Mongkut’s University of Technology, Thonburi, 10140, Bangkok, Thailand
Bundit Thipakorn

About this paper

Cite this paper

Kala, R., Vazirani, H., Shukla, A., Tiwari, R. (2010). Fusion of Speech and Face by Enhanced Modular Neural Network. In: Prasad, S.K., Vin, H.M., Sahni, S., Jaiswal, M.P., Thipakorn, B. (eds) Information Systems, Technology and Management. ICISTM 2010. Communications in Computer and Information Science, vol 54. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12035-0_37

DOI: https://doi.org/10.1007/978-3-642-12035-0_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12034-3
Online ISBN: 978-3-642-12035-0
eBook Packages: Computer Science (R0)
