Article

Graph based multi-modality learning

Authors:
Hanghang Tong, Tsinghua University, Beijing, China
Jingrui He, Tsinghua University, Beijing, China
Mingjing Li, Microsoft Research Asia, Beijing, China
Changshui Zhang, Tsinghua University, Beijing, China
Wei-Ying Ma, Microsoft Research Asia, Beijing, China

MULTIMEDIA '05: Proceedings of the 13th annual ACM international conference on Multimedia, November 2005, Pages 862–871. https://doi.org/10.1145/1101149.1101337

Published: 06 November 2005

ABSTRACT

To better understand multimedia content, much research has addressed how to learn from multi-modal features. This paper studies the problem from a graph point of view: each kind of feature from one modality is represented as an independent graph, and the learning task is formulated as inference from the constraints in every graph together with supervision information (if available). For semi-supervised learning, two fusion schemes are proposed: a linear form and a sequential form. Each scheme is derived from an optimization point of view and further justified from two sides: similarity propagation and Bayesian interpretation. These derivations reveal, respectively, the regularized optimization nature, the transductive learning nature, and the prior fusion nature of the proposed schemes. Moreover, the proposed method extends readily to unsupervised learning, including clustering and embedding. Systematic experimental results validate the effectiveness of the proposed method.
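The abstract does not give the paper's equations, so the following is only an illustrative sketch of what "linear" and "sequential" fusion over per-modality graphs could look like. It assumes a standard label-propagation update, F ← αSF + (1 − α)Y, on symmetrically normalised affinity matrices; the function names, the weights, the value of α, and the toy graphs are all hypothetical and not taken from the paper.

```python
import math

def normalize(W):
    """Symmetric normalisation S = D^{-1/2} W D^{-1/2} of an affinity matrix."""
    n = len(W)
    deg = [sum(row) for row in W]
    return [[W[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]

def propagate(S, Y, alpha=0.5, iters=200):
    """Iterate F <- alpha * S F + (1 - alpha) * Y (assumed propagation rule)."""
    n, c = len(Y), len(Y[0])
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(S[i][k] * F[k][j] for k in range(n)) + (1 - alpha) * Y[i][j]
              for j in range(c)] for i in range(n)]
    return F

def linear_fusion(graphs, Y, weights, alpha=0.5):
    """Assumed linear form: propagate over a weighted sum of normalised graphs."""
    S_list = [normalize(W) for W in graphs]
    n = len(Y)
    S = [[sum(w * Sk[i][j] for w, Sk in zip(weights, S_list)) for j in range(n)]
         for i in range(n)]
    return propagate(S, Y, alpha)

def sequential_fusion(graphs, Y, alpha=0.5):
    """Assumed sequential form: the output on one graph seeds the next graph."""
    F = Y
    for W in graphs:
        F = propagate(normalize(W), F, alpha)
    return F

def predict(F):
    """Assign each node the class with the largest score."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in F]

# Toy data (hypothetical): four objects on a chain, one graph per modality.
W_visual = [[0, 1, 0, 0], [1, 0, 0.1, 0], [0, 0.1, 0, 1], [0, 0, 1, 0]]
W_text = [[0, 0.8, 0, 0], [0.8, 0, 0.3, 0], [0, 0.3, 0, 0.8], [0, 0, 0.8, 0]]
# Node 0 is labelled class 0, node 3 is labelled class 1; nodes 1 and 2 are unlabelled.
Y = [[1, 0], [0, 0], [0, 0], [0, 1]]

linear_labels = predict(linear_fusion([W_visual, W_text], Y, weights=[0.5, 0.5]))
sequential_labels = predict(sequential_fusion([W_visual, W_text], Y))
```

In this sketch the linear form mixes the graphs before propagating, while the sequential form treats the scores from one modality as the starting information for the next. Whether the paper weights, orders, or normalises the graphs this way is not stated in the abstract.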

Index Terms

Graph based multi-modality learning
1. Information systems
  1. Information retrieval
    1. Document representation

Published in
MULTIMEDIA '05: Proceedings of the 13th annual ACM international conference on Multimedia
November 2005
1110 pages
ISBN:1595930442
DOI:10.1145/1101149
General Chairs:
Hongjiang Zhang, Microsoft Research Asia, China
Tat-Seng Chua, National University of Singapore, Singapore
Program Chairs:
Ralf Steinmetz, Technische Universität Darmstadt, Germany
Mohan Kankanhalli, National University of Singapore, Singapore
Lynn Wilcox, FXPAL
Copyright © 2005 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Bayesian interpretation
graph model
multi-modality analysis
regularized optimization
similarity propagation
Qualifiers
- Article
Conference

Acceptance Rates
MULTIMEDIA '05 Paper Acceptance Rate: 49 of 312 submissions, 16%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%
Article Metrics
- Total Citations: 72
- Total Downloads: 1,115
- Downloads (Last 12 months): 55
- Downloads (Last 6 weeks): 12