ABSTRACT
We describe an efficient and scalable system for automatic image categorization. Our approach combines scalable "model-free" neighborhood-based annotation with accurate boosting-based per-tag modeling. To accelerate neighborhood-based classification, we use a set of spatial data structures as weak classifiers for an arbitrary number of categories. We employ standard edge and color features and an approximation scheme that scales to large training sets. The weak classifier outputs are combined in a tag-dependent fashion via boosting to improve accuracy. The method performs competitively with standard SVM-based per-tag classification while substantially reducing computational requirements. We present multi-label image annotation experiments using data sets of more than two million photos.
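The idea of shared spatial structures acting as weak classifiers, combined per tag by boosting, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the uniform grid over feature subspaces (standing in for whatever spatial data structures the authors use), the majority-vote rule, discrete AdaBoost as the boosting variant, and the names `GridWeakClassifier`, `boost_tag`, `score`, and `make_toy_data`.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


class GridWeakClassifier:
    """A uniform grid over a few feature dimensions (a stand-in spatial structure).

    Each cell stores per-tag counts, so one structure serves every tag.
    """

    def __init__(self, X, Y, n_tags, dims, cell=0.25):
        self.dims, self.cell, self.n_tags = dims, cell, n_tags
        # Per cell: a count for each tag, plus the total item count in the last slot.
        self.cells = defaultdict(lambda: [0] * (n_tags + 1))
        for x, tags in zip(X, Y):
            counts = self.cells[self._key(x)]
            counts[n_tags] += 1
            for t in tags:
                counts[t] += 1

    def _key(self, x):
        return tuple(math.floor(x[d] / self.cell) for d in self.dims)

    def predict(self, x, tag):
        """+1 if most training items in x's cell carry the tag, else -1."""
        counts = self.cells.get(self._key(x))
        if counts is None:
            return -1
        return 1 if 2 * counts[tag] > counts[self.n_tags] else -1


def boost_tag(weak, X, Y, tag, rounds=5):
    """Discrete AdaBoost over a fixed pool of weak classifiers, for one tag."""
    labels = [1 if tag in tags else -1 for tags in Y]
    preds = [[h.predict(x, tag) for x in X] for h in weak]  # cached once per tag
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        errs = [sum(wi for wi, p, y in zip(w, ps, labels) if p != y) for ps in preds]
        best = min(range(len(weak)), key=errs.__getitem__)
        err = min(max(errs[best], 1e-9), 1 - 1e-9)  # clamp to keep the log finite
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, best))
        if errs[best] == 0:
            break  # a perfect weak classifier leaves nothing to reweight
        w = [wi * math.exp(-alpha * p * y) for wi, p, y in zip(w, preds[best], labels)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def score(ensemble, weak, x, tag):
    """Tag-dependent weighted vote of the selected weak classifiers."""
    return sum(alpha * weak[i].predict(x, tag) for alpha, i in ensemble)


def make_toy_data(n=400, seed=0):
    """Hypothetical data: tag 0 iff x[0] > 0.5, tag 1 iff x[1] > 0.5."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(4)] for _ in range(n)]
    Y = [{t for t in (0, 1) if x[t] > 0.5} for x in X]
    return X, Y


X, Y = make_toy_data()
weak = [GridWeakClassifier(X, Y, 2, dims) for dims in combinations(range(4), 2)]
models = {tag: boost_tag(weak, X, Y, tag) for tag in (0, 1)}
query = [0.9, 0.1, 0.5, 0.5]
print({tag: score(models[tag], weak, query, tag) > 0 for tag in (0, 1)})
```

The point of the sketch is the cost structure: the grids are built once and shared by all tags, and only the boosting weights are tag-specific. For brevity, the weak classifiers are evaluated on the same items that populate the grids, which is optimistic; a held-out split would be the more careful choice. The edge and color features and the approximation scheme mentioned in the abstract are not modeled here.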
Index Terms
- Image categorization combining neighborhood methods and boosting