Fusion of Speech and Face by Enhanced Modular Neural Network

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 54)

Abstract

Biometric identification is a long-established field in which people are identified by their biometric traits. The field has moved toward bi-modal systems, which use more than one modality for identification. Bi-modal systems suffer from high dimensionality: each individual modality already has a large number of attributes, and fusing the modalities adds these together, giving a still larger dimensionality. In this paper we address this problem by introducing modularity at the level of the attributes. The attributes are divided among the modules of a modular neural network, which limits the dimensionality handled by each module without much loss of information. Each module outputs the probabilities of occurrence of the various classes, and an integrator collects these outputs. The integrator averages the probabilities from the modules to obtain the final probability of occurrence of each class. The averaging takes into account the efficiency of each module at the time of training, since a well-trained module is expected to perform better than a poorly trained one. The integrator then selects the class with the highest final probability and returns it as the output class. We tested the algorithm on the fusion of face and speech, where it achieved a recognition rate of 97.5%, which indicates the effectiveness of the approach.
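The integrator step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact averaging formula, so this sketch assumes a weighted mean in which each module's weight is a training-time efficiency score (for example, its training accuracy). The function name `integrate`, its parameters, and the sample numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def integrate(module_probs: Sequence[Sequence[float]],
              module_weights: Sequence[float]) -> Tuple[int, List[float]]:
    """Fuse per-module class probabilities and pick the most likely class.

    module_probs[m][c] is the probability module m assigns to class c.
    module_weights[m] is an assumed training-time efficiency score for
    module m; the weighted-mean rule itself is an assumption of this sketch.
    """
    if not module_probs:
        raise ValueError("at least one module is required")
    if len(module_probs) != len(module_weights):
        raise ValueError("need exactly one weight per module")
    total = sum(module_weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_classes = len(module_probs[0])
    fused = [0.0] * n_classes
    for probs, weight in zip(module_probs, module_weights):
        if len(probs) != n_classes:
            raise ValueError("all modules must score the same classes")
        # Normalise the weight so the fused vector remains a distribution
        # whenever each module's output is one.
        for c, p in enumerate(probs):
            fused[c] += (weight / total) * p

    # The output class is the one with the highest fused probability.
    best = max(range(n_classes), key=fused.__getitem__)
    return best, fused


# Hypothetical example: three modules, three enrolled identities.
example_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.2],
]
example_weights = [0.9, 0.8, 0.7]
best_class, fused_probs = integrate(example_probs, example_weights)
print(best_class, fused_probs)
```

With equal weights this reduces to a plain average of the module outputs. Unequal weights let a better-trained module pull the decision toward its own prediction, which matches the motivation given in the abstract.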

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kala, R., Vazirani, H., Shukla, A., Tiwari, R. (2010). Fusion of Speech and Face by Enhanced Modular Neural Network. In: Prasad, S.K., Vin, H.M., Sahni, S., Jaiswal, M.P., Thipakorn, B. (eds) Information Systems, Technology and Management. ICISTM 2010. Communications in Computer and Information Science, vol 54. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12035-0_37

  • DOI: https://doi.org/10.1007/978-3-642-12035-0_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12034-3

  • Online ISBN: 978-3-642-12035-0

  • eBook Packages: Computer Science, Computer Science (R0)
