ABSTRACT
To better understand multimedia content, much research effort has been devoted to learning from multi-modal features. In this paper, we study the problem from a graph point of view: the features of each modality are represented as an independent graph, and the learning task is formulated as inference from the constraints in every graph together with supervision information (if available). For semi-supervised learning, we propose two fusion schemes, a linear form and a sequential form. Each scheme is derived from an optimization point of view and further justified from two perspectives: similarity propagation and Bayesian interpretation. In doing so, we reveal the regular optimization nature, the transductive learning nature, and the prior fusion nature of the proposed schemes, respectively. Moreover, the proposed method extends easily to unsupervised learning, including clustering and embedding. Systematic experiments validate the effectiveness of the proposed method.
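The abstract does not state the objective functions, so the following Python sketch is only a rough illustration under stated assumptions. It assumes each modality's graph is a Gaussian-kernel affinity matrix normalized as S = D^{-1/2} W D^{-1/2} (as in label propagation with local and global consistency), and it assumes hypothetical objectives for the two schemes: the linear form minimizes a weighted sum of per-graph smoothness terms plus a fit-to-labels term, and the sequential form propagates labels over one graph at a time, feeding each result in as the prior for the next graph. The names `normalized_affinity`, `linear_fusion`, `sequential_fusion`, `sigma` and `mus` are illustrative and do not come from the paper.

```python
import numpy as np


def normalized_affinity(X, sigma=1.0):
    """Gaussian affinity graph for one modality, normalized as S = D^{-1/2} W D^{-1/2}.

    X is an (n, d) feature matrix; sigma is an assumed kernel width.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                   # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))  # diagonal of D^{-1/2} as a vector
    return W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def linear_fusion(S_list, Y, mus):
    """Hypothetical linear form (assumption, not the paper's exact objective):
        minimize  sum_k mu_k * tr(F^T (I - S_k) F) + ||F - Y||_F^2.
    Setting the gradient to zero gives (I + sum_k mu_k (I - S_k)) F = Y.
    """
    n = Y.shape[0]
    I = np.eye(n)
    A = I + sum(mu * (I - S) for mu, S in zip(mus, S_list))
    return np.linalg.solve(A, Y)


def sequential_fusion(S_list, Y, mus):
    """Hypothetical sequential form (assumption): propagate over one graph at a
    time, using the previous graph's output as the prior (target) for the next."""
    n = Y.shape[0]
    I = np.eye(n)
    F = Y.astype(float)
    for mu, S in zip(mus, S_list):
        F = np.linalg.solve(I + mu * (I - S), F)
    return F


if __name__ == "__main__":
    # Toy data: six items, two classes, two modalities (1-D and 2-D features).
    X1 = np.array([[0.0], [0.3], [0.6], [5.0], [5.3], [5.6]])
    X2 = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                   [4.0, 4.0], [4.2, 3.9], [3.9, 4.3]])
    S_list = [normalized_affinity(X1), normalized_affinity(X2)]

    # One labeled item per class; all other rows are zero (unlabeled).
    Y = np.zeros((6, 2))
    Y[0, 0] = 1.0
    Y[3, 1] = 1.0

    print(linear_fusion(S_list, Y, mus=[1.0, 1.0]).argmax(axis=1))      # -> [0 0 0 1 1 1]
    print(sequential_fusion(S_list, Y, mus=[1.0, 1.0]).argmax(axis=1))  # -> [0 0 0 1 1 1]
```

Because every I - S_k is positive semidefinite, the system matrix in both functions is positive definite, so `np.linalg.solve` always returns a unique solution. The dense solve is only practical for small n; the weights `mus` play the role of per-modality trust and would need tuning in practice.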
Index Terms
- Graph based multi-modality learning