Abstract
Recent advances in digital camera technology and the popularity of social network websites such as Facebook and YouTube have created huge amounts of multimedia data. Multimedia information is ubiquitous and essential in many applications. To bridge the gap between the data and application requirements (the so-called semantic gap), advanced methods and tools are needed to automatically mine and annotate high-level concepts, which helps associate low-level features directly with those concepts. Concept-concept association has been shown to be effective in bridging the semantic gap in multimedia data. In this paper, a concept-concept association information integration and multi-model collaboration framework is proposed to enhance high-level semantic concept detection from multimedia data. Experimental comparisons show that the proposed framework outperforms the other approaches evaluated in terms of Mean Average Precision (MAP).
Cite this article
Meng, T., Shyu, ML. Concept-concept association information integration and multi-model collaboration for multimedia semantic concept detection. Inf Syst Front 16, 787–799 (2014). https://doi.org/10.1007/s10796-013-9427-8
DOI: https://doi.org/10.1007/s10796-013-9427-8