ABSTRACT
This architecture paper presents EXTENT, a probabilistic framework that uses influence diagrams to fuse multimodal metadata for photo annotation. EXTENT synergistically fuses contextual information (location, time, and camera parameters), photo content (perceptual features), and a semantic ontology. It uses causal strengths to encode the causal relationships between variables, and between variables and semantic labels. Through a landmark-recognition case study, we show that EXTENT provides high-quality annotation, substantially better than traditional unimodal methods.
EXTENT: Fusing Context, Content, and Semantic Ontology for Photo Annotation