Abstract
Fusing multimodal information in multimedia data usually improves retrieval performance. One of the major issues in multimodal fusion is how to determine the best modalities. To combine the modalities more effectively, we propose a RELIEF-based modality weighting approach named RELIEF-MM. It extends the original RELIEF algorithm to address weaknesses in several areas: class-specific feature selection, complexities arising from multi-labeled data and noise, handling of unbalanced datasets, and use of the algorithm with classifier predictions. RELIEF-MM employs an improved weight estimation function that exploits the representation and reliability capabilities of modalities in addition to their discrimination capability, without any increase in computational complexity. Comprehensive experiments conducted on the TRECVID 2007, TRECVID 2008 and CCV datasets validate RELIEF-MM as an efficient, accurate and robust method of modality weighting for multimedia data.
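For orientation, the baseline that RELIEF-MM builds on can be sketched as follows. This is a minimal illustration of the classic two-class RELIEF weight estimate in the commonly used formulation with range-normalised differences; it is not the RELIEF-MM weight estimation function, which the abstract does not spell out. The function name `relief_weights`, the Manhattan distance and the toy data are assumptions made for this sketch.

```python
def relief_weights(X, y):
    """Classic two-class RELIEF weight estimate (NOT RELIEF-MM).

    X is a list of feature vectors, y the matching class labels.
    Every instance is used once; each class needs at least two instances.
    """
    n, d = len(X), len(X[0])

    # Normalise per-feature differences to [0, 1] by the feature's value range.
    spans = []
    for j in range(d):
        column = [row[j] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        # Manhattan distance over normalised features (an assumption).
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            # Reward features that separate the nearest miss,
            # penalise features that separate the nearest hit.
            weights[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n
    return weights


# Toy data: feature 0 separates the classes, feature 1 does not.
X_toy = [[0.0, 0.2], [0.1, 0.9], [0.9, 0.1], [1.0, 0.8]]
y_toy = [0, 0, 1, 1]
w_toy = relief_weights(X_toy, y_toy)
```

On the toy data the discriminative feature receives a positive weight and the uninformative one a negative weight, which is the discrimination capability that the abstract says RELIEF-MM complements with representation and reliability.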












Notes
This paper is a revised and extended version of [54].
The final goal of this study is to select the effective modalities by weighting the available ones, where each modality is a multi-dimensional feature. Thus, from now on, the phrases ‘modality selection’, ‘modality weighting’ and ‘multimodal feature selection’ are used interchangeably.
This two-step process is applied to the TRECVID 2007 and 2008 datasets, where the number of modalities leads to inefficient situations. For the CCV dataset, an exhaustive weight search is performed with 0.01 precision.
The measurements were taken on a machine with an “Intel(R) Xeon(R) CPU E5530 @2.40GHz”. The values in the graph and table were obtained without a parallel programming approach.
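The exhaustive weight search with 0.01 precision mentioned in the notes for the CCV dataset can be pictured with the following minimal sketch. It assumes that the weights are non-negative and sum to 1, and that the caller supplies a quality function. The names `weight_grid`, `exhaustive_search` and `toy_quality` are hypothetical, and the toy objective only stands in for whatever retrieval metric is actually being optimised.

```python
from itertools import product


def weight_grid(n_modalities, steps=100):
    """Yield every weight vector with resolution 1/steps whose entries sum to 1.

    Integer counts are enumerated first so that no floating-point drift
    accumulates; the sum-to-1 constraint is an assumption of this sketch.
    """
    for head in product(range(steps + 1), repeat=n_modalities - 1):
        rest = steps - sum(head)
        if rest >= 0:
            yield tuple(k / steps for k in head + (rest,))


def exhaustive_search(quality, n_modalities, steps=100):
    """Return the grid point that maximises the caller-supplied quality function."""
    return max(weight_grid(n_modalities, steps), key=quality)


def toy_quality(weights):
    # Placeholder objective: peaks at (0.25, 0.75). A real run would instead
    # score the fused ranking on validation data.
    return -((weights[0] - 0.25) ** 2 + (weights[1] - 0.75) ** 2)


best = exhaustive_search(toy_quality, n_modalities=2)
n_candidates_three = sum(1 for _ in weight_grid(3))
```

With a step of 0.01 the grid holds 101 candidates for two modalities and 5151 for three, and it grows quickly with every further modality. That growth is consistent with the notes reserving the exhaustive search for CCV.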
References
Atrey, P.K., Kankanhalli, M.S., Oommen, J.B.: Goal-oriented optimal subset selection of correlated multimedia streams. ACM Trans. Multimedia Comput. Commun. Appl. 3(1) (2007). doi:10.1145/1198302.1198304
Mathieu, B., Essid, S., Fillon, T., Prado, J., Richard, G.: Yaafe, an easy to use and efficient audio feature extraction software. In: Proceedings of the 11th ISMIR Conference, Utrecht, Netherlands (2010)
Chang, C.C., Lin, C.J.: LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Chawla, N.V., Japkowicz, N., Kotcz, A.: Editorial: special issue on learning from imbalanced data sets. SIGKDD Explor. Newsl. 6(1), 1–6 (2004). doi:10.1145/1007730.1007733
Chen, Y.Y., Hsu, W., Liao, H.Y.: Automatic training image acquisition and effective feature selection from community-contributed photos for facial attribute detection. Multimedia, IEEE Transactions on 15(6), 1388–1399 (2013). doi:10.1109/TMM.2013.2250492
Dietterich, T.G.: Machine-learning research: Four current directions. The AI Magazine 18(4), 97–136 (1998)
Doquire, G., Verleysen, M.: Feature selection for multi-label classification problems. In: Proceedings of the 11th International Conference on Artificial Neural Networks Conference on Advances in Computational Intelligence-vol. Part I, IWANN’11, pp. 9–16. Springer, Berlin, Heidelberg (2011). http://dl.acm.org/citation.cfm?id=2023252.2023255
Ferri, F.J., Pudil, P., Hatef, M., Kittler, J.: Comparative study of techniques for large-scale feature selection. In: Gelsema, E.S., Kamal, L.N. (eds.) Pattern Recognition in Practice IV, Multiple Paradigms, Comparative Studies and Hybrid Systems, pp. 403–413. Elsevier, Amsterdam (1994)
Fumera, G., Roli, F.: A theoretical and experimental analysis of linear combiners for multiple classifier systems. IEEE TPAMI 27(6), 942–956 (2005). doi:10.1109/TPAMI.2005.109
Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003). http://dl.acm.org/citation.cfm?id=944919.944968
Hall, M.A.: Correlation-based Feature Subset Selection for Machine Learning. Ph.D. thesis, Department of Computer Science, University of Waikato, New Zealand (1999)
Huang, K.C., Lin, H.Y.S., Chan, J.C., Kuo, Y.H.: Learning collaborative decision-making parameters for multimodal emotion recognition. In: Multimedia and Expo (ICME), 2013 IEEE International Conference, pp. 1–6 (2013). doi:10.1109/ICME.2013.6607472
Hunt, E.B., Marin, J., Stone, P.J.: Experiments in Induction. Academic Press, New York (1966)
Inoue, N., Kamishima, Y., Wada, T., Shinoda, K., Sato, S.: Tokyotech+canon at trecvid 2011. In: NIST TRECVID Workshop. Gaithersburg, MD (2011)
Jain, A., Nandakumar, K., Ross, A.: Score normalization in multimodal biometric systems. Pattern Recognition 38(12), 2270–2285 (2005)
Jain, A.K., Duin, R.P., Mao, J.: Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 4–37 (2000)
Jiang, Y.G., Bhattacharya, S., Chang, S.F., Shah, M.: High-level event recognition in unconstrained videos. Int. J. Multimedia Info. Retr. 1–29 (2012). doi:10.1007/s13735-012-0024-2
Jiang, Y.G., Yanagawa, A., Chang, S.F., Ngo, C.W.: CU-VIREO374: Fusing Columbia374 and VIREO374 for Large Scale Semantic Concept Detection. Tech. rep., Columbia University ADVENT #223-2008-1 (2008)
Jiang, Y.G., Ye, G., Chang, S.F., Ellis, D., Loui, A.C.: Consumer video understanding: a benchmark database and an evaluation of human and machine performance. In: Proceedings of the 1st ACM International Conference on Multimedia Retrieval, ICMR ’11, pp. 29:1–29:8. ACM, New York, NY, USA (2011). doi:10.1145/1991996.1992025
Jiang, Y.G., Zeng, X., Ye, G., Ellis, D., Chang, S.F., Bhattacharya, S., Shah, M.: Columbia-ucf trecvid2010 multimedia event detection: Combining multiple modalities, contextual concepts, and temporal matching. In: P. Over, G. Awad, J.G. Fiscus, B. Antonishek, M. Michel, W. Kraaij, A.F. Smeaton, G. Quénot (eds.) TRECVID. National Institute of Standards and Technology (NIST), Gaithersburg, MD (2010)
Kalamaras, I., Mademlis, A., Malassiotis, S., Tzovaras, D.: A novel framework for retrieval and interactive visualization of multimodal data. Electron. Lett. Comput. Vis. Image Anal. 12(2) (2013). http://elcvia.cvc.uab.es/article/view/518
Kankanhalli, M., Wang, J., Jain, R.: Experiential sampling on multiple data streams. Multimedia, IEEE Transactions on 8(5), 947–955 (2006)
Kira, K., Rendell, L.A.: A practical approach to feature selection. In: Proceedings of the 9th International Workshop on Machine Learning, ML ’92, pp. 249–256. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1992). http://dl.acm.org/citation.cfm?id=645525.656966
Kittler, J.: Feature set search algorithms. In: Chen, C.H. (ed.) Pattern Recognition and Signal Processing, pp. 41–60. Sijthoff & Noordhoff International Publishers B.V., Alphen aan den Rijn, The Netherlands (1978)
Kittler, J., Hatef, M., Duin, R.P.W., Matas, J.: On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20, 226–239 (1998)
Kludas, J., Bruno, E., Marchand-Maillet, S.: Information fusion in multimedia information retrieval. In: Proceedings of the 5th International Workshop on Adaptive Multimedia Retrieval (AMR). Paris, France (2007)
Kludas, J., Bruno, E., Marchand-Maillet, S.: Can feature information interaction help for information fusion in multimedia problems? Multimedia Tools Appl. 42, 57–71 (2009)
Kong, D., Ding, C., Huang, H., Zhao, H.: Multi-label relieff and f-statistic feature selections for image annotation. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference, pp. 2352–2359 (2012). doi:10.1109/CVPR.2012.6247947
Kononenko, I.: Estimating attributes: analysis and extensions of relief. In: Proceedings of the European Conference on Machine Learning, pp. 171–182. Springer-Verlag New York, Inc., Secaucus, NJ, USA (1994). http://dl.acm.org/citation.cfm?id=188408.188427
Liu, H., Motoda, H., Yu, L.: A selective sampling approach to active feature selection. Artif. Intell. 159, 49–74 (2004). doi:10.1016/j.artint.2004.05.009. http://dl.acm.org/citation.cfm?id=1039211.1039214
Atrey, P., Hossain, M., Saddik, A.E., Kankanhalli, M.: Multimodal fusion for multimedia analysis: a survey. Multimedia Systems 16, 345–379 (2010)
Moulin, C., Largeron, C., Ducottet, C., Géry, M., Barat, C.: Fisher linear discriminant analysis for text-image combination in multimedia information retrieval. Pattern Recognit. 47(1), 260–269 (2014). doi:10.1016/j.patcog.2013.06.003. http://www.sciencedirect.com/science/article/pii/S0031320313002550
MPEG: Mpeg-7 reference software experimentation model (2003). http://standards.iso.org/ittf/PubliclyAvailableStandards/c035364_ISO_IEC_15938-6(E)_Reference_Software.zip
Natarajan, P., Manohar, V., Wu, S., Tsakalidis, S., Vitaladevuni, S.N., Zhuang, X., Prasad, R., Ye, G., Liu, D., Jhuo, I., Chang, S., Izadinia, H., Saleemi, I., Shah, M., White, B., Yeh, T., Davis, L.: Bbn viser trecvid 2011 multimedia event detection system. In: NIST TRECVID Workshop. Gaithersburg, MD (2011)
Over, P., Awad, G., Kraaij, W., Smeaton, A.F.: Trecvid 2007—overview. In: Over, P., Awad, G., Kraaij, W., Smeaton, A.F. (eds.) TRECVID. National Institute of Standards and Technology (NIST), Gaithersburg, MD (2007)
Over, P., Awad, G., Rose, R.T., Fiscus, J.G., Kraaij, W., Smeaton, A.F.: Trecvid 2008—goals, tasks, data, evaluation mechanisms and metrics. In: Over, P., Awad, G., Rose, R.T., Fiscus, J.G., Kraaij, W., Smeaton, A.F. (eds.) TRECVID. National Institute of Standards and Technology (NIST), Gaithersburg, MD (2008)
Poh, N., Kittler, J.: Multimodal Information Fusion: Theory and Applications for Human-Computer Interaction, chap. 8, pp. 153–169. Academic Press (2010)
Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1, 81–106 (1986). doi:10.1023/A:1022643204877. http://dl.acm.org/citation.cfm?id=637962.637969
Rahman, M., You, D., Simpson, M., Antani, S., Demner-Fushman, D., Thoma, G.: Multimodal biomedical image retrieval using hierarchical classification and modality fusion. Int. J. Multimedia Info. Retr. 2(3), 159–173 (2013). doi:10.1007/s13735-013-0038-4
Robnik-Šikonja, M., Kononenko, I.: An adaptation of relief for attribute estimation in regression. In: Fisher, D.H. (ed.) ICML, pp. 296–304. Morgan Kaufmann, San Francisco (1997)
Robnik-Šikonja, M., Kononenko, I.: Theoretical and empirical analysis of relieff and rrelieff. Mach. Learn. 53, 23–69 (2003). doi:10.1023/A:1025667309714. http://dl.acm.org/citation.cfm?id=940854.940876
Saeys, Y., Inza, I., Larrañaga, P.: A review of feature selection techniques in bioinformatics. Bioinformatics 23, 2507–2517 (2007). doi:10.1093/bioinformatics/btm344. http://dl.acm.org/citation.cfm?id=1349154.1349169
Sikonja, M.R.: Speeding up relief algorithm with k-d trees. In: Proceedings of Electrotechnical and Computer Science Conference (ERK’98), pp. 137–140 (1998)
Snidaro, L., Niu, R., Foresti, G., Varshney, P.: Quality-based fusion of multiple video sensors for video surveillance. SMC-B: Cybernetics, IEEE Trans. on 37(4), 1044–1051 (2007)
Snoek, C.G.M., Worring, M.: Multimodal video indexing: A review of the state-of-the-art. Multimedia Tools and Applications 25(1), 5–35 (2005)
Sun, Y.: Iterative relief for feature weighting: Algorithms, theories, and applications. IEEE Trans. Pattern Anal. Mach. Intell. 29(6), 1035–1051 (2007)
Temko, A., Macho, D., Nadeu, C.: Fuzzy integral based information fusion for classification of highly confusable non-speech sounds. Pattern Recognit. 41(5), 1814–1823 (2008). doi:10.1016/j.patcog.2007.10.026. http://www.sciencedirect.com/science/article/pii/S003132030700489X
Tsoumakas, G., Katakis, I., Vlahavas, I.P.: Mining multi-label data. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, pp. 667–685. Springer US, Berlin (2010)
Tumer, K., Ghosh, J.: Linear and order statistics combiners for pattern classification. CoRR cs.NE/9905012 (1999). http://dblp.uni-trier.de
Wang, L., Zhou, N., Chu, F.: A general wrapper approach to selection of class-dependent features. IEEE Transactions on Neural Networks 19(7), 1267–1278 (2008)
Wu, Q., Wang, Z., Deng, F., Chi, Z., Feng, D.: Realistic human action recognition with multimodal feature selection and fusion. Syst. Man Cybern. Syst. IEEE Trans. 43(4), 875–885 (2013). doi:10.1109/TSMCA.2012.2226575
Wu, Y., Chang, E.Y., Chang, K.C.C., Smith, J.R.: Optimal multimodal fusion for multimedia data analysis. In: Proceedings of the 12th ACM Multimedia, pp. 572–579. ACM, New York, NY, USA (2004)
Yan, R., Hauptmann, A.G.: The combination limit in multimedia retrieval. In: Proceedings of the 11th ACM International Conference on Multimedia, MULTIMEDIA ’03, pp. 339–342. ACM, New York, NY, USA (2003)
Yilmaz, T., Gulen, E., Yazici, A., Kitsuregawa, M.: A relief-based modality weighting approach for multimodal information retrieval. In: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ICMR ’12, pp. 54:1–54:8. ACM, New York, NY, USA (2012). doi:10.1145/2324796.2324858
Yilmaz, T., Yazici, A., Yildirim, Y.: Exploiting class-specific features in multi-feature dissimilarity space for efficient querying of images. In: Christiansen, H., Tré, G., Yazici, A., Zadrozny, S., Andreasen, T., Larsen, H. (eds.) Flexible Query Answering Systems, Lecture Notes in Computer Science, vol. 7022, pp. 149–161. Springer, Berlin, Heidelberg (2011). doi:10.1007/978-3-642-24764-4_14
Additional information
Communicated by L. Zhang.
This work is supported in part by a research grant from TÜBİTAK EEEAG (Grant Number 109E014).
Cite this article
Yilmaz, T., Yazici, A. & Kitsuregawa, M. RELIEF-MM: effective modality weighting for multimedia information retrieval. Multimedia Systems 20, 389–413 (2014). https://doi.org/10.1007/s00530-014-0360-6
DOI: https://doi.org/10.1007/s00530-014-0360-6