RELIEF-MM: effective modality weighting for multimedia information retrieval

  • Regular Paper
Multimedia Systems

Abstract

Fusing multimodal information in multimedia data usually improves retrieval performance. One of the major issues in multimodal fusion is determining which modalities are most effective. To combine the modalities more effectively, we propose a RELIEF-based modality weighting approach named RELIEF-MM. The original RELIEF algorithm is extended to address its weaknesses in several respects: class-specific feature selection, handling multi-labeled and noisy data, handling unbalanced datasets, and applying the algorithm to classifier predictions. RELIEF-MM employs an improved weight estimation function that exploits the representation and reliability capabilities of modalities, in addition to their discrimination capability, without increasing the computational complexity. Comprehensive experiments conducted on the TRECVID 2007, TRECVID 2008 and CCV datasets validate RELIEF-MM as an efficient, accurate and robust method of modality weighting for multimedia data.
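
For readers unfamiliar with the baseline, the following is a minimal sketch of the classic two-class RELIEF weight estimate that the abstract refers to as "the original RELIEF algorithm". It is not RELIEF-MM and does not include any of the extensions listed above. The function name `relief_weights`, the use of Manhattan distance, the absolute per-feature difference, and the assumption that features are pre-scaled to [0, 1] are illustrative choices, not details taken from the paper.

```python
import random


def relief_weights(X, y, n_iter=None, seed=0):
    """Sketch of two-class RELIEF feature weighting (illustrative only).

    X: list of feature vectors, assumed pre-scaled to [0, 1].
    y: list of binary class labels, one per vector.
    Returns one weight per feature; larger means more discriminative.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = n if n_iter is None else n_iter  # number of sampled instances
    w = [0.0] * d

    def dist(a, b):
        # Manhattan distance (an assumption; other metrics are possible).
        return sum(abs(p - q) for p, q in zip(a, b))

    for _ in range(m):
        i = rng.randrange(n)
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue  # cannot form a hit/miss pair for this sample
        near_hit = min(hits, key=lambda j: dist(X[i], X[j]))
        near_miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            # Reward features that separate the nearest miss,
            # penalize features that differ from the nearest hit.
            w[f] += (abs(X[i][f] - X[near_miss][f])
                     - abs(X[i][f] - X[near_hit][f])) / m
    return w
```

A feature whose values differ strongly between classes but little within a class receives a large positive weight, while a noisy feature drifts toward zero or below. In a modality weighting setting, one would presumably apply the same idea per modality rather than per scalar feature, but that mapping is the paper's contribution and is not reproduced here.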

Notes

  1. This paper is a revised and extended version of [54].

  2. The final goal of this study is to select the effective modalities by weighting the available ones, where each modality is a multi-dimensional feature. Thus, from now on, the phrases ‘modality selection’, ‘modality weighting’ and ‘multimodal feature selection’ are used interchangeably.

  3. This two-step process is applied for the TRECVID 2007 and 2008 datasets, where the number of modalities makes an exhaustive search inefficient. For the CCV dataset, an exhaustive weight search process is performed with 0.01 precision.

  4. The measurements are taken on a machine with “Intel(R) Xeon(R) CPU E5530 @2.40GHz”. The values on the graph and table are obtained without a parallel programming approach.
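
The exhaustive weight search mentioned in note 3 can be pictured with a small sketch. The version below assumes linear weighted fusion of per-modality scores, weights that are non-negative and sum to 1, and average precision as the objective; none of these is stated in this section, and the names `average_precision` and `grid_search_weights` are hypothetical. Only the 0.01 step (`steps=100`) comes from the note.

```python
from itertools import product


def average_precision(scores, labels):
    """Average precision of a ranking induced by descending scores."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, relevant) in enumerate(ranked, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def grid_search_weights(modality_scores, labels, steps=100):
    """Try every weight vector on a 1/steps grid whose entries sum to 1.

    modality_scores: one list of item scores per modality (assumed
    comparable across modalities, e.g. already normalized).
    labels: 1 for relevant items, 0 otherwise.
    """
    k = len(modality_scores)
    best_w, best_ap = None, -1.0
    # Enumerate integer weights for the first k-1 modalities; the last
    # weight is whatever remains, which keeps the sum exactly `steps`.
    for combo in product(range(steps + 1), repeat=k - 1):
        last = steps - sum(combo)
        if last < 0:
            continue
        w = [c / steps for c in combo + (last,)]
        fused = [sum(wi * s[j] for wi, s in zip(w, modality_scores))
                 for j in range(len(labels))]
        ap = average_precision(fused, labels)
        if ap > best_ap:
            best_w, best_ap = w, ap
    return best_w, best_ap
```

The number of grid points grows combinatorially with the number of modalities, which is consistent with the note's remark that a different process is used when many modalities are available.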

References

  1. Atrey, P.K., Kankanhalli, M.S., Oommen, J.B.: Goal-oriented optimal subset selection of correlated multimedia streams. ACM Trans. Multimedia Comput. Commun. Appl. 3(1) (2007). doi:10.1145/1198302.1198304

  2. Mathieu, B., Essid, S., Fillon, T., Prado, J., Richard, G.: Yaafe, an easy to use and efficient audio feature extraction software (2010). In: Proceedings of the 11th ISMIR Conference, Utrecht, Netherlands

  3. Chang, C.C., Lin, C.J.: LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm

  4. Chawla, N.V., Japkowicz, N., Kotcz, A.: Editorial: special issue on learning from imbalanced data sets. SIGKDD Explor. Newsl. 6(1), 1–6 (2004). doi:10.1145/1007730.1007733

  5. Chen, Y.Y., Hsu, W., Liao, H.Y.: Automatic training image acquisition and effective feature selection from community-contributed photos for facial attribute detection. Multimedia, IEEE Transactions on 15(6), 1388–1399 (2013). doi:10.1109/TMM.2013.2250492

  6. Dietterich, T.G.: Machine-learning research: Four current directions. The AI Magazine 18(4), 97–136 (1998)

  7. Doquire, G., Verleysen, M.: Feature selection for multi-label classification problems. In: Proceedings of the 11th International Conference on Artificial Neural Networks Conference on Advances in Computational Intelligence-vol. Part I, IWANN’11, pp. 9–16. Springer, Berlin, Heidelberg (2011). http://dl.acm.org/citation.cfm?id=2023252.2023255

  8. Ferri, F.J., Pudil, P., Hatef, M., Kittler, J.: Comparative study of techniques for large-scale feature selection. In: Gelsema, E.S., Kamal, L.N. (eds.) Pattern Recognition in Practice IV, Multiple Paradigms, Comparative Studies and Hybrid Systems, pp. 403–413. Elsevier, Amsterdam (1994)

  9. Fumera, G., Roli, F.: A theoretical and experimental analysis of linear combiners for multiple classifier systems. IEEE TPAMI 27(6), 942–956 (2005). doi:10.1109/TPAMI.2005.109

  10. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003). http://dl.acm.org/citation.cfm?id=944919.944968

  11. Hall, M.A.: Correlation-based Feature Subset Selection for Machine Learning. Ph.D. thesis, Department of Computer Science, University of Waikato, New Zealand (1999)

  12. Huang, K.C., Lin, H.Y.S., Chan, J.C., Kuo, Y.H.: Learning collaborative decision-making parameters for multimodal emotion recognition. In: Multimedia and Expo (ICME), 2013 IEEE International Conference, pp. 1–6 (2013). doi:10.1109/ICME.2013.6607472

  13. Hunt, E.B., Stone, P.J., Marin, J.: Experiments in Induction. Academic Press, New York (1966)

  14. Inoue, N., Kamishima, Y., Wada, T., Shinoda, K., Sato, S.: Tokyotech+canon at trecvid 2011. In: NIST TRECVID Workshop. Gaithersburg, MD (2011)

  15. Jain, A., Nandakumar, K., Ross, A.: Score normalization in multimodal biometric systems. Pattern Recognition 38(12), 2270–2285 (2005)

  16. Jain, A.K., Duin, R.P., Mao, J.: Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 4–37 (2000)

  17. Jiang, Y.G., Bhattacharya, S., Chang, S.F., Shah, M.: High-level event recognition in unconstrained videos. Int. J. Multimedia Info. Retr. 1–29 (2012). doi:10.1007/s13735-012-0024-2

  18. Jiang, Y.G., Yanagawa, A., Chang, S.F., Ngo, C.W.: CU-VIREO374: Fusing Columbia374 and VIREO374 for Large Scale Semantic Concept Detection. Tech. rep., Columbia University ADVENT #223-2008-1 (2008)

  19. Jiang, Y.G., Ye, G., Chang, S.F., Ellis, D., Loui, A.C.: Consumer video understanding: a benchmark database and an evaluation of human and machine performance. In: Proceedings of the 1st ACM International Conference on Multimedia Retrieval, ICMR ’11, pp. 29:1–29:8. ACM, New York, NY, USA (2011). doi:10.1145/1991996.1992025

  20. Jiang, Y.G., Zeng, X., Ye, G., Ellis, D., Chang, S.F., Bhattacharya, S., Shah, M.: Columbia-ucf trecvid2010 multimedia event detection: Combining multiple modalities, contextual concepts, and temporal matching. In: P. Over, G. Awad, J.G. Fiscus, B. Antonishek, M. Michel, W. Kraaij, A.F. Smeaton, G. Quénot (eds.) TRECVID. National Institute of Standards and Technology (NIST), Gaithersburg, MD (2010)

  21. Kalamaras, I., Mademlis, A., Malassiotis, S., Tzovaras, D.: A novel framework for retrieval and interactive visualization of multimodal data. Electron. Lett. Comput. Vis. Image Anal. 12(2) (2013). http://elcvia.cvc.uab.es/article/view/518

  22. Kankanhalli, M., Wang, J., Jain, R.: Experiential sampling on multiple data streams. Multimedia, IEEE Transactions on 8(5), 947–955 (2006)

  23. Kira, K., Rendell, L.A.: A practical approach to feature selection. In: Proceedings of the 9th International Workshop on Machine Learning, ML ’92, pp. 249–256. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1992). http://dl.acm.org/citation.cfm?id=645525.656966

  24. Kittler, J.: Feature set search algorithms. In: Chen, C.H. (ed.) Pattern Recognition and Signal Processing, pp. 41–60. Sijthoff & Noordhoff International Publishers B.V., Alphen aan den Rijn, The Netherlands (1978)

  25. Kittler, J., Hatef, M., Duin, R.P.W., Matas, J.: On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20, 226–239 (1998)

  26. Kludas, J., Bruno, E., Marchand-Maillet, S.: Information fusion in multimedia information retrieval. In: Proceedings of the 5th International Workshop on Adaptive Multimedia Retrieval (AMR). Paris, France (2007)

  27. Kludas, J., Bruno, E., Marchand-Maillet, S.: Can feature information interaction help for information fusion in multimedia problems? Multimedia Tools Appl. 42, 57–71 (2009)

  28. Kong, D., Ding, C., Huang, H., Zhao, H.: Multi-label relieff and f-statistic feature selections for image annotation. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference, pp. 2352–2359 (2012). doi:10.1109/CVPR.2012.6247947

  29. Kononenko, I.: Estimating attributes: analysis and extensions of relief. In: Proceedings of the European Conference on Machine Learning, pp. 171–182. Springer, New York, Inc., Secaucus, NJ, USA (1994). http://dl.acm.org/citation.cfm?id=188408.188427

  30. Liu, H., Motoda, H., Yu, L.: A selective sampling approach to active feature selection. Artif. Intell. 159, 49–74 (2004). doi:10.1016/j.artint.2004.05.009. http://dl.acm.org/citation.cfm?id=1039211.1039214

  31. Atrey, P., Hossain, M., Saddik, A.E., Kankanhalli, M.: Multimodal fusion for multimedia analysis: a survey. Multimedia Systems 16, 345–379 (2010)

  32. Moulin, C., Largeron, C., Ducottet, C., Géry, M., Barat, C.: Fisher linear discriminant analysis for text-image combination in multimedia information retrieval. Pattern Recognit. 47(1), 260–269 (2014). doi:10.1016/j.patcog.2013.06.003. http://www.sciencedirect.com/science/article/pii/S0031320313002550

  33. MPEG: Mpeg-7 reference software experimentation model (2003). http://standards.iso.org/ittf/PubliclyAvailableStandards/c035364_ISO_IEC_15938-6(E)_Reference_Software.zip

  34. Natarajan, P., Manohar, V., Wu, S., Tsakalidis, S., Vitaladevuni, S.N., Zhuang, X., Prasad, R., Ye, G., Liu, D., Jhuo, I., Chang, S., Izadinia, H., Saleemi, I., Shah, M., White, B., Yeh, T., Davis, L.: Bbn viser trecvid 2011 multimedia event detection system. In: NIST TRECVID Workshop. Gaithersburg, MD (2011)

  35. Over, P., Awad, G., Kraaij, W., Smeaton, A.F.: Trecvid 2007—overview. In: Over, P., Awad, G., Kraaij, W., Smeaton, A.F. (eds.) TRECVID. National Institute of Standards and Technology (NIST), Gaithersburg, MD (2007)

  36. Over, P., Awad, G., Rose, R.T., Fiscus, J.G., Kraaij, W., Smeaton, A.F.: Trecvid 2008—goals, tasks, data, evaluation mechanisms and metrics. In: Over, P., Awad, G., Rose, R.T., Fiscus, J.G., Kraaij, W., Smeaton, A.F. (eds.) TRECVID. National Institute of Standards and Technology (NIST), Gaithersburg, MD (2008)

  37. Poh, N., Kittler, J.: Multimodal Information Fusion: Theory and Applications for Human-Computer Interaction, chap. 8, pp. 153–169. Academic Press (2010)

  38. Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1, 81–106 (1986). doi:10.1023/A:1022643204877. http://dl.acm.org/citation.cfm?id=637962.637969

  39. Rahman, M., You, D., Simpson, M., Antani, S., Demner-Fushman, D., Thoma, G.: Multimodal biomedical image retrieval using hierarchical classification and modality fusion. Int. J. Multimedia Info. Retr. 2(3), 159–173 (2013). doi:10.1007/s13735-013-0038-4

  40. Robnik-Sikonja, M., Kononenko, I.: An adaptation of relief for attribute estimation in regression. In: Fisher, D.H. (ed.) ICML, pp. 296–304. Morgan Kaufmann, San Francisco (1997)

  41. Robnik-Šikonja, M., Kononenko, I.: Theoretical and empirical analysis of relieff and rrelieff. Mach. Learn. 53, 23–69 (2003). doi:10.1023/A:1025667309714. http://dl.acm.org/citation.cfm?id=940854.940876

  42. Saeys, Y., Inza, I., Larrañaga, P.: A review of feature selection techniques in bioinformatics. Bioinformatics 23, 2507–2517 (2007). doi:10.1093/bioinformatics/btm344. http://dl.acm.org/citation.cfm?id=1349154.1349169

  43. Sikonja, M.R.: Speeding up relief algorithm with k-d trees. In: Proceedings of Electrotechnical and Computer Science Conference (ERK’98), pp. 137–140 (1998)

  44. Snidaro, L., Niu, R., Foresti, G., Varshney, P.: Quality-based fusion of multiple video sensors for video surveillance. SMC-B: Cybernetics, IEEE Trans. on 37(4), 1044–1051 (2007)

  45. Snoek, C.G.M., Worring, M.: Multimodal video indexing: A review of the state-of-the-art. Multimedia Tools and Applications 25(1), 5–35 (2005)

  46. Sun, Y.: Iterative relief for feature weighting: Algorithms, theories, and applications. IEEE Trans. Pattern Anal. Mach. Intell. 29(6), 1035–1051 (2007)

  47. Temko, A., Macho, D., Nadeu, C.: Fuzzy integral based information fusion for classification of highly confusable non-speech sounds. Pattern Recognit. 41(5), 1814–1823 (2008). doi:10.1016/j.patcog.2007.10.026. http://www.sciencedirect.com/science/article/pii/S003132030700489X

  48. Tsoumakas, G., Katakis, I., Vlahavas, I.P.: Mining multi-label data. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, pp. 667–685. Springer US, Berlin (2010)

  49. Tumer, K., Ghosh, J.: Linear and order statistics combiners for pattern classification. CoRR cs.NE/9905012 (1999). http://dblp.uni-trier.de

  50. Wang, L., Zhou, N., Chu, F.: A general wrapper approach to selection of class-dependent features. IEEE Transactions on Neural Networks 19(7), 1267–1278 (2008)

  51. Wu, Q., Wang, Z., Deng, F., Chi, Z., Feng, D.: Realistic human action recognition with multimodal feature selection and fusion. Syst. Man Cybern. Syst. IEEE Trans. 43(4), 875–885 (2013). doi:10.1109/TSMCA.2012.2226575

  52. Wu, Y., Chang, E.Y., Chang, K.C.C., Smith, J.R.: Optimal multimodal fusion for multimedia data analysis. In: Proceedings of the 12th ACM Multimedia, pp. 572–579. ACM, New York, NY, USA (2004)

  53. Yan, R., Hauptmann, A.G.: The combination limit in multimedia retrieval. In: Proceedings of the 11th ACM International Conference on Multimedia, MULTIMEDIA ’03, pp. 339–342. ACM, New York, NY, USA (2003)

  54. Yilmaz, T., Gulen, E., Yazici, A., Kitsuregawa, M.: A relief-based modality weighting approach for multimodal information retrieval. In: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ICMR ’12, pp. 54:1–54:8. ACM, New York, NY, USA (2012). doi:10.1145/2324796.2324858

  55. Yilmaz, T., Yazici, A., Yildirim, Y.: Exploiting class-specific features in multi-feature dissimilarity space for efficient querying of images. In: Christiansen, H., Tré, G., Yazici, A., Zadrozny, S., Andreasen, T., Larsen, H. (eds.) Flexible Query Answering Systems, Lecture Notes in Computer Science, vol. 7022, pp. 149–161. Springer, Berlin, Heidelberg (2011). doi:10.1007/978-3-642-24764-4_14

Author information

Corresponding author

Correspondence to Turgay Yilmaz.

Additional information

Communicated by L. Zhang.

This work is supported in part by a research grant from TÜBİTAK EEEAG (Grant Number 109E014).

About this article

Cite this article

Yilmaz, T., Yazici, A. & Kitsuregawa, M. RELIEF-MM: effective modality weighting for multimedia information retrieval. Multimedia Systems 20, 389–413 (2014). https://doi.org/10.1007/s00530-014-0360-6

  • DOI: https://doi.org/10.1007/s00530-014-0360-6
