RELIEF-MM: effective modality weighting for multimedia information retrieval

Yilmaz, Turgay; Yazici, Adnan; Kitsuregawa, Masaru

doi:10.1007/s00530-014-0360-6

Regular Paper
Published: 16 February 2014

Volume 20, pages 389–413, (2014)

Multimedia Systems

Turgay Yilmaz^1,2,
Adnan Yazici^1 &
Masaru Kitsuregawa^2,3

Abstract

Fusing multimodal information in multimedia data usually improves retrieval performance. One of the major issues in multimodal fusion is how to determine the best modalities. To combine the modalities more effectively, we propose a RELIEF-based modality weighting approach named RELIEF-MM. The original RELIEF algorithm is extended to address its weaknesses in several areas: class-specific feature selection, complexities with multi-labeled data and noise, handling unbalanced datasets, and using the algorithm with classifier predictions. RELIEF-MM employs an improved weight estimation function, which exploits the representation and reliability capabilities of modalities in addition to their discrimination capability, without any increase in computational complexity. Comprehensive experiments conducted on the TRECVID 2007, TRECVID 2008 and CCV datasets validate RELIEF-MM as an efficient, accurate and robust way of modality weighting for multimedia data.
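
For orientation, the baseline that RELIEF-MM extends can be sketched as follows. This is a minimal illustration of the classic two-class RELIEF weight update (nearest hit and nearest miss), not the RELIEF-MM estimation function, which is not given on this page. The function name `relief_weights`, the L1 distance, and the min-max scaling are assumptions made for the sketch.

```python
import numpy as np

def relief_weights(X, y, n_iter=None, seed=0):
    """Classic two-class RELIEF feature weights (illustrative sketch).

    Assumes every class has at least two samples, so that each sampled
    instance has both a nearest hit and a nearest miss.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape

    # Min-max scale each feature so per-feature differences lie in [0, 1].
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    Xs = (X - lo) / span

    rng = np.random.default_rng(seed)
    sampled = rng.permutation(n)[: n_iter or n]

    w = np.zeros(d)
    for i in sampled:
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # L1 distance to every sample
        dist[i] = np.inf                       # never pick the instance itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class sample
        # Reward features that differ on the miss, penalise those that differ on the hit.
        w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return w / len(sampled)

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X_demo = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.8, 0.2], [0.9, 0.8], [1.0, 0.4]]
y_demo = [0, 0, 0, 1, 1, 1]
w_demo = relief_weights(X_demo, y_demo)
```

On the toy data the discriminative feature receives a positive weight and the noisy one a negative weight, which is the behaviour a modality weighting scheme built on RELIEF relies on.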

Notes

This paper is a revised and extended version of [54].
The final goal of this study is to select the effective modalities by weighting the available ones, where each modality is a multi-dimensional feature. Thus, from now on, the phrases ‘modality selection’, ‘modality weighting’ and ‘multimodal feature selection’ are used interchangeably.
This two-step process is applied to the TRECVID 2007 and 2008 datasets, where the number of modalities makes an exhaustive search inefficient. For the CCV dataset, an exhaustive weight search is performed with 0.01 precision.
The measurements were taken on a machine with an “Intel(R) Xeon(R) CPU E5530 @2.40GHz”. The values in the graph and table were obtained without a parallel programming approach.
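
The exhaustive weight search with 0.01 precision mentioned in the notes can be sketched roughly as below. The page does not describe the fusion rule or the evaluation measure, so this sketch assumes a linear weighted sum of per-modality classifier scores, non-negative weights that sum to 1, and average precision as the objective. All function names (`average_precision`, `weight_grid`, `exhaustive_weight_search`) are hypothetical.

```python
import itertools
import numpy as np

def average_precision(labels, scores):
    """Average precision of a ranking induced by `scores` (binary labels)."""
    order = np.argsort(-np.asarray(scores), kind="stable")
    rel = np.asarray(labels)[order]
    hits = np.cumsum(rel)
    ranks = np.arange(1, len(rel) + 1)
    return float((hits / ranks * rel).sum() / max(rel.sum(), 1))

def weight_grid(n_modalities, steps):
    """Yield every weight vector on the simplex with resolution 1/steps."""
    for combo in itertools.product(range(steps + 1), repeat=n_modalities - 1):
        rest = steps - sum(combo)
        if rest >= 0:  # the last weight takes whatever mass is left
            yield np.array(combo + (rest,)) / steps

def exhaustive_weight_search(scores, labels, steps=100):
    """Try all grid weights; steps=100 corresponds to 0.01 precision.

    `scores` has shape (n_modalities, n_samples). Assumed fusion rule:
    fused score = weighted sum of the per-modality scores.
    """
    scores = np.asarray(scores, dtype=float)
    best_w, best_ap = None, -1.0
    for w in weight_grid(scores.shape[0], steps):
        ap = average_precision(labels, w @ scores)
        if ap > best_ap:
            best_w, best_ap = w, ap
    return best_w, best_ap

# Hypothetical toy scores: modality 0 ranks the positives first, modality 1 last.
labels_demo = [1, 1, 0, 0]
scores_demo = [[0.9, 0.8, 0.2, 0.1],
               [0.1, 0.2, 0.8, 0.9]]
best_w_demo, best_ap_demo = exhaustive_weight_search(scores_demo, labels_demo, steps=10)
```

The grid size grows combinatorially with the number of modalities (5151 weight vectors for three modalities at 0.01 precision), which is consistent with the note that an exhaustive search is only practical when the number of modalities is small.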

References

Atrey, P.K., Kankanhalli, M.S., Oommen, J.B.: Goal-oriented optimal subset selection of correlated multimedia streams. ACM Trans. Multimedia Comput. Commun. Appl. 3(1) (2007). doi:10.1145/1198302.1198304
Mathieu, B., Essid, S., Fillon, T., Prado, J., Richard, G.: Yaafe, an easy to use and efficient audio feature extraction software. In: Proceedings of the 11th ISMIR Conference, Utrecht, Netherlands (2010)
Chang, C.C., Lin, C.J.: LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Chawla, N.V., Japkowicz, N., Kotcz, A.: Editorial: special issue on learning from imbalanced data sets. SIGKDD Explor. Newsl. 6(1), 1–6 (2004). doi:10.1145/1007730.1007733
Chen, Y.Y., Hsu, W., Liao, H.Y.: Automatic training image acquisition and effective feature selection from community-contributed photos for facial attribute detection. IEEE Trans. Multimedia 15(6), 1388–1399 (2013). doi:10.1109/TMM.2013.2250492
Dietterich, T.G.: Machine-learning research: Four current directions. The AI Magazine 18(4), 97–136 (1998)
Doquire, G., Verleysen, M.: Feature selection for multi-label classification problems. In: Proceedings of the 11th International Conference on Artificial Neural Networks Conference on Advances in Computational Intelligence-vol. Part I, IWANN’11, pp. 9–16. Springer, Berlin, Heidelberg (2011). http://dl.acm.org/citation.cfm?id=2023252.2023255
Ferri, F.J., Pudil, P., Hatef, M., Kittler, J.: Comparative study of techniques for large-scale feature selection. In: Gelsema, E.S., Kanal, L.N. (eds.) Pattern Recognition in Practice IV, Multiple Paradigms, Comparative Studies and Hybrid Systems, pp. 403–413. Elsevier, Amsterdam (1994)
Fumera, G., Roli, F.: A theoretical and experimental analysis of linear combiners for multiple classifier systems. IEEE TPAMI 27(6), 942–956 (2005). doi:10.1109/TPAMI.2005.109
Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003). http://dl.acm.org/citation.cfm?id=944919.944968
Hall, M.A.: Correlation-based Feature Subset Selection for Machine Learning. Ph.D. thesis, Department of Computer Science, University of Waikato, New Zealand (1999)
Huang, K.C., Lin, H.Y.S., Chan, J.C., Kuo, Y.H.: Learning collaborative decision-making parameters for multimodal emotion recognition. In: Multimedia and Expo (ICME), 2013 IEEE International Conference, pp. 1–6 (2013). doi:10.1109/ICME.2013.6607472
Hunt, E.B., Stone, P.J., Marin, J.: Experiments in Induction. Academic Press, New York (1966)
Inoue, N., Kamishima, Y., Wada, T., Shinoda, K., Sato, S.: TokyoTech+Canon at TRECVID 2011. In: NIST TRECVID Workshop. Gaithersburg, MD (2011)
Jain, A., Nandakumar, K., Ross, A.: Score normalization in multimodal biometric systems. Pattern Recognition 38(12), 2270–2285 (2005)
Jain, A.K., Duin, R.P., Mao, J.: Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 4–37 (2000)
Jiang, Y.G., Bhattacharya, S., Chang, S.F., Shah, M.: High-level event recognition in unconstrained videos. Int. J. Multimedia Info. Retr. 1–29 (2012). doi:10.1007/s13735-012-0024-2
Jiang, Y.G., Yanagawa, A., Chang, S.F., Ngo, C.W.: CU-VIREO374: Fusing Columbia374 and VIREO374 for Large Scale Semantic Concept Detection. Tech. rep., Columbia University ADVENT #223-2008-1 (2008)
Jiang, Y.G., Ye, G., Chang, S.F., Ellis, D., Loui, A.C.: Consumer video understanding: a benchmark database and an evaluation of human and machine performance. In: Proceedings of the 1st ACM International Conference on Multimedia Retrieval, ICMR ’11, pp. 29:1–29:8. ACM, New York, NY, USA (2011). doi:10.1145/1991996.1992025
Jiang, Y.G., Zeng, X., Ye, G., Ellis, D., Chang, S.F., Bhattacharya, S., Shah, M.: Columbia-UCF TRECVID2010 multimedia event detection: Combining multiple modalities, contextual concepts, and temporal matching. In: Over, P., Awad, G., Fiscus, J.G., Antonishek, B., Michel, M., Kraaij, W., Smeaton, A.F., Quénot, G. (eds.) TRECVID. National Institute of Standards and Technology (NIST), Gaithersburg, MD (2010)
Kalamaras, I., Mademlis, A., Malassiotis, S., Tzovaras, D.: A novel framework for retrieval and interactive visualization of multimodal data. Electron. Lett. Comput. Vis. Image Anal. 12(2) (2013). http://elcvia.cvc.uab.es/article/view/518
Kankanhalli, M., Wang, J., Jain, R.: Experiential sampling on multiple data streams. IEEE Trans. Multimedia 8(5), 947–955 (2006)
Kira, K., Rendell, L.A.: A practical approach to feature selection. In: Proceedings of the 9th International Workshop on Machine Learning, ML ’92, pp. 249–256. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1992). http://dl.acm.org/citation.cfm?id=645525.656966
Kittler, J.: Feature set search algorithms. In: Chen, C.H. (ed.) Pattern Recognition and Signal Processing, pp. 41–60. Sijthoff & Noordhoff International Publishers B.V., Alphen aan den Rijn, The Netherlands (1978)
Kittler, J., Hatef, M., Duin, R.P.W., Matas, J.: On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20, 226–239 (1998)
Kludas, J., Bruno, E., Marchand-Maillet, S.: Information fusion in multimedia information retrieval. In: Proceedings of the 5th International Workshop on Adaptive Multimedia Retrieval (AMR). Paris, France (2007)
Kludas, J., Bruno, E., Marchand-Maillet, S.: Can feature information interaction help for information fusion in multimedia problems? Multimedia Tools Appl. 42, 57–71 (2009)
Kong, D., Ding, C., Huang, H., Zhao, H.: Multi-label ReliefF and F-statistic feature selections for image annotation. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference, pp. 2352–2359 (2012). doi:10.1109/CVPR.2012.6247947
Kononenko, I.: Estimating attributes: analysis and extensions of RELIEF. In: Proceedings of the European Conference on Machine Learning, pp. 171–182. Springer-Verlag New York, Inc., Secaucus, NJ, USA (1994). http://dl.acm.org/citation.cfm?id=188408.188427
Liu, H., Motoda, H., Yu, L.: A selective sampling approach to active feature selection. Artif. Intell. 159, 49–74 (2004). doi:10.1016/j.artint.2004.05.009. http://dl.acm.org/citation.cfm?id=1039211.1039214
Atrey, P., Hossain, M., Saddik, A.E., Kankanhalli, M.: Multimodal fusion for multimedia analysis: a survey. Multimedia Systems 16, 345–379 (2010)
Moulin, C., Largeron, C., Ducottet, C., Géry, M., Barat, C.: Fisher linear discriminant analysis for text-image combination in multimedia information retrieval. Pattern Recognit. 47(1), 260–269 (2014). doi:10.1016/j.patcog.2013.06.003. http://www.sciencedirect.com/science/article/pii/S0031320313002550
MPEG: Mpeg-7 reference software experimentation model (2003). http://standards.iso.org/ittf/PubliclyAvailableStandards/c035364_ISO_IEC_15938-6(E)_Reference_Software.zip
Natarajan, P., Manohar, V., Wu, S., Tsakalidis, S., Vitaladevuni, S.N., Zhuang, X., Prasad, R., Ye, G., Liu, D., Jhuo, I., Chang, S., Izadinia, H., Saleemi, I., Shah, M., White, B., Yeh, T., Davis, L.: BBN VISER TRECVID 2011 multimedia event detection system. In: NIST TRECVID Workshop. Gaithersburg, MD (2011)
Over, P., Awad, G., Kraaij, W., Smeaton, A.F.: TRECVID 2007 - overview. In: Over, P., Awad, G., Kraaij, W., Smeaton, A.F. (eds.) TRECVID. National Institute of Standards and Technology (NIST), Gaithersburg, MD (2007)
Over, P., Awad, G., Rose, R.T., Fiscus, J.G., Kraaij, W., Smeaton, A.F.: TRECVID 2008 - goals, tasks, data, evaluation mechanisms and metrics. In: Over, P., Awad, G., Rose, R.T., Fiscus, J.G., Kraaij, W., Smeaton, A.F. (eds.) TRECVID. National Institute of Standards and Technology (NIST), Gaithersburg, MD (2008)
Poh, N., Kittler, J.: Multimodal Information Fusion: Theory and Applications for Human-Computer Interaction, chap. 8, pp. 153–169. Academic Press (2010)
Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1, 81–106 (1986). doi:10.1023/A:1022643204877. http://dl.acm.org/citation.cfm?id=637962.637969
Rahman, M., You, D., Simpson, M., Antani, S., Demner-Fushman, D., Thoma, G.: Multimodal biomedical image retrieval using hierarchical classification and modality fusion. Int. J. Multimedia Info. Retr. 2(3), 159–173 (2013). doi:10.1007/s13735-013-0038-4
Robnik-Šikonja, M., Kononenko, I.: An adaptation of RELIEF for attribute estimation in regression. In: Fisher, D.H. (ed.) ICML, pp. 296–304. Morgan Kaufmann, San Francisco (1997)
Robnik-Šikonja, M., Kononenko, I.: Theoretical and empirical analysis of ReliefF and RReliefF. Mach. Learn. 53, 23–69 (2003). doi:10.1023/A:1025667309714. http://dl.acm.org/citation.cfm?id=940854.940876
Saeys, Y., Inza, I., Larrañaga, P.: A review of feature selection techniques in bioinformatics. Bioinformatics 23, 2507–2517 (2007). doi:10.1093/bioinformatics/btm344. http://dl.acm.org/citation.cfm?id=1349154.1349169
Sikonja, M.R.: Speeding up RELIEF algorithm with k-d trees. In: Proceedings of Electrotechnical and Computer Science Conference (ERK’98), pp. 137–140 (1998)
Snidaro, L., Niu, R., Foresti, G., Varshney, P.: Quality-based fusion of multiple video sensors for video surveillance. IEEE Trans. Syst. Man Cybern. B Cybern. 37(4), 1044–1051 (2007)
Snoek, C.G.M., Worring, M.: Multimodal video indexing: A review of the state-of-the-art. Multimedia Tools and Applications 25(1), 5–35 (2005)
Sun, Y.: Iterative RELIEF for feature weighting: Algorithms, theories, and applications. IEEE Trans. Pattern Anal. Mach. Intell. 29(6), 1035–1051 (2007)
Temko, A., Macho, D., Nadeu, C.: Fuzzy integral based information fusion for classification of highly confusable non-speech sounds. Pattern Recognit. 41(5), 1814–1823 (2008). doi:10.1016/j.patcog.2007.10.026. http://www.sciencedirect.com/science/article/pii/S003132030700489X
Tsoumakas, G., Katakis, I., Vlahavas, I.P.: Mining multi-label data. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, pp. 667–685. Springer US, Berlin (2010)
Tumer, K., Ghosh, J.: Linear and order statistics combiners for pattern classification. CoRR cs.NE/9905012 (1999). http://dblp.uni-trier.de
Wang, L., Zhou, N., Chu, F.: A general wrapper approach to selection of class-dependent features. IEEE Transactions on Neural Networks 19(7), 1267–1278 (2008)
Wu, Q., Wang, Z., Deng, F., Chi, Z., Feng, D.: Realistic human action recognition with multimodal feature selection and fusion. IEEE Trans. Syst. Man Cybern. Syst. 43(4), 875–885 (2013). doi:10.1109/TSMCA.2012.2226575
Wu, Y., Chang, E.Y., Chang, K.C.C., Smith, J.R.: Optimal multimodal fusion for multimedia data analysis. In: Proceedings of the 12th ACM Multimedia, pp. 572–579. ACM, New York, NY, USA (2004)
Yan, R., Hauptmann, A.G.: The combination limit in multimedia retrieval. In: Proceedings of the 11th ACM International Conference on Multimedia, MULTIMEDIA ’03, pp. 339–342. ACM, New York, NY, USA (2003)
Yilmaz, T., Gulen, E., Yazici, A., Kitsuregawa, M.: A RELIEF-based modality weighting approach for multimodal information retrieval. In: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ICMR ’12, pp. 54:1–54:8. ACM, New York, NY, USA (2012). doi:10.1145/2324796.2324858
Yilmaz, T., Yazici, A., Yildirim, Y.: Exploiting class-specific features in multi-feature dissimilarity space for efficient querying of images. In: Christiansen, H., Tré, G., Yazici, A., Zadrozny, S., Andreasen, T., Larsen, H. (eds.) Flexible Query Answering Systems, Lecture Notes in Computer Science, vol. 7022, pp. 149–161. Springer, Berlin, Heidelberg (2011). doi:10.1007/978-3-642-24764-4_14

Author information

Authors and Affiliations

Computer Engineering Department, Middle East Technical University, 06531, Ankara, Turkey
Turgay Yilmaz & Adnan Yazici
Institute of Industrial Science, The University of Tokyo, Tokyo, 153-8505, Japan
Turgay Yilmaz & Masaru Kitsuregawa
National Institute of Informatics, Chiyoda-ku, Tokyo, 101-8430, Japan
Masaru Kitsuregawa

Corresponding author

Correspondence to Turgay Yilmaz.

Additional information

Communicated by L. Zhang.

This work is supported in part by a research grant from TÜBİTAK EEEAG (Grant Number 109E014).

About this article

Cite this article

Yilmaz, T., Yazici, A. & Kitsuregawa, M. RELIEF-MM: effective modality weighting for multimedia information retrieval. Multimedia Systems 20, 389–413 (2014). https://doi.org/10.1007/s00530-014-0360-6

Received: 31 May 2013
Accepted: 17 January 2014
Published: 16 February 2014
Issue Date: July 2014
DOI: https://doi.org/10.1007/s00530-014-0360-6
