OPTIMOL: Automatic Online Picture Collection via Incremental Model Learning

International Journal of Computer Vision

Abstract

The explosion of the Internet provides us with a tremendous resource of images shared online. It also confronts vision researchers with the problem of finding effective methods to navigate this vast amount of visual information. Semantic image understanding plays a vital role in solving this problem. One important task in image understanding is object recognition, in particular generic object categorization. Critical to this problem are the issues of learning and datasets. Abundant data helps to train a robust recognition system, while a good object classifier can help to collect a large number of images. This paper presents a novel object recognition algorithm that performs automatic dataset collection and incremental model learning simultaneously. The goal of this work is to use the tremendous resources of the web to learn robust object category models for detecting and searching for objects in real-world cluttered scenes. Humans continuously update their knowledge of objects when new examples are observed. Our framework emulates this human learning process by iteratively accumulating model knowledge and image examples. We adapt a non-parametric latent topic model and propose an incremental learning framework. Our algorithm is capable of automatically collecting much larger object category datasets for 22 randomly selected classes from the Caltech 101 dataset. Furthermore, our system offers not only more images in each object category but also a robust object category model and meaningful image annotation. Our experiments show that OPTIMOL is capable of collecting image datasets that are superior to the well-known manually collected object datasets Caltech 101 and LabelMe.
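
The framework sketched in this abstract is an iterative collect-and-learn loop: download candidate images returned by a keyword search, classify them with the current category model, add the confidently accepted ones to the growing dataset, and update the model incrementally with only those new examples. The short Python sketch below illustrates that loop in miniature. It is an assumption-laden toy rather than the paper's method: the diagonal-Gaussian models, the synthetic crawl_batch "crawler", the seed set sizes, and the fixed log-likelihood-ratio threshold are all hypothetical stand-ins for the nonparametric latent topic model and the classification step the paper describes.

    import numpy as np

    rng = np.random.default_rng(0)
    DIM = 8  # dimensionality of the toy image feature vectors


    class RunningGaussian:
        """Diagonal Gaussian updated online (Welford's algorithm); a stand-in
        for the incrementally updated category or background model."""

        def __init__(self, dim):
            self.n = 0
            self.mean = np.zeros(dim)
            self.m2 = np.full(dim, 1e-3)  # running sum of squared deviations

        def update(self, x):
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

        def log_likelihood(self, x):
            var = self.m2 / max(self.n, 1) + 1e-6
            return float(-0.5 * np.sum((x - self.mean) ** 2 / var
                                       + np.log(2 * np.pi * var)))


    def crawl_batch(n):
        """Hypothetical stand-in for a keyword image search: roughly half of
        the returned feature vectors belong to the category, half are clutter."""
        relevant = rng.random(n) < 0.5
        feats = np.where(relevant[:, None],
                         rng.normal(2.0, 1.0, (n, DIM)),
                         rng.normal(-2.0, 1.0, (n, DIM)))
        return feats, relevant


    # Seed both models with a handful of labeled examples.
    category, background = RunningGaussian(DIM), RunningGaussian(DIM)
    for x in rng.normal(2.0, 1.0, (15, DIM)):
        category.update(x)
    for x in rng.normal(-2.0, 1.0, (15, DIM)):
        background.update(x)

    dataset = []  # the automatically collected image set
    for it in range(5):
        feats, _truth = crawl_batch(200)
        accepted = 0
        for x in feats:
            # Accept an image only if the current category model is confident.
            margin = category.log_likelihood(x) - background.log_likelihood(x)
            if margin > 5.0:  # arbitrary confidence threshold for this toy
                dataset.append(x)
                category.update(x)  # incremental update with the accepted example
                accepted += 1
        print(f"iteration {it}: accepted {accepted}, dataset size {len(dataset)}")

The one design point the toy does preserve is that each iteration updates the category model with only the newly accepted images, so that dataset collection and incremental model learning reinforce each other, as described above.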

References

  • Agarwal, S., Awan, A., & Roth, D. (2004). Learning to detect objects in images via a sparse, part-based representation. PAMI, 26(11), 1475–1490.

  • Barnard, K., Duygulu, P., Forsyth, D., de Freitas, N., Blei, D. M., & Jordan, M. I. (2003). Matching words and pictures. The Journal of Machine Learning Research, 3.

  • Barnard, K., & Forsyth, D. (2001). Learning the semantics of words and pictures. In Eighth IEEE international conference on computer vision (pp. 408–415).

  • Belhumeur, P. N., Hespanha, J. P., & Kriegman, D. J. (1997). Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), 711–720.

  • Berg, T. L., & Forsyth, D. A. (2006). Animals on the web. In Proc. computer vision and pattern recognition.

  • Besemer, J., Lomsadze, A., & Borodovsky, M. (2001). GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Research, 29(12), 2607.

  • Bishop, C. M. (2006). Pattern recognition and machine learning. New York: Springer.

  • Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.

  • Bosch, A., Zisserman, A., & Munoz, X. (2006). Scene classification via pLSA. Proc. ECCV, 4, 517–530.

  • Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the fifth annual workshop on computational learning theory (pp. 144–152).

  • Carson, C., Thomas, M., Belongie, S., Hellerstein, J.M., & Malik, J. (1999). Blobworld: A system for region-based image indexing and retrieval. In Third international conference on visual information systems (pp. 509–516).

  • Chen, Y., Wang, J. Z., & Krovetz, R. (2003). Content-based image retrieval by clustering. In Proceedings of the 5th ACM SIGMM international workshop on multimedia information retrieval (pp. 193–200).

  • Collins, B., Deng, J., Li, K., & Fei-Fei, L. (2008). Toward scalable dataset construction: An active learning approach. In Proc. ECCV.

  • Csurka, G., Bray, C., Dance, C., & Fan, L. (2004). Visual categorization with bags of keypoints. In Workshop on statistical learning in computer vision, ECCV (pp. 1–22).

  • Deng, Y., Manjunath, B. S., Kenney, C., Moore, M. S., & Shin, H. (2001). An efficient color representation for image retrieval. IEEE Transactions on Image Processing, 10(1), 140–147.

  • Fei-Fei, L., & Perona, P. (2005). A Bayesian hierarchical model for learning natural scene categories. Computer Vision and Pattern Recognition.

  • Fei-Fei, L., Fergus, R., & Perona, P. (2003). A Bayesian approach to unsupervised one-shot learning of object categories. In Proceedings of the 9th international conference on computer vision (pp. 1134–1141). Nice, France.

  • Fei-Fei, L., Fergus, R., & Perona, P. (2004). Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In Workshop on generative-model based vision.

  • Fei-Fei, L., Fergus, R., & Perona, P. (2006). One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  • Fei-Fei, L., Fergus, R., & Torralba, A. (2007). Recognizing and learning object categories. Short course CVPR. http://people.csail.mit.edu/torralba/shortCourseRLOC/index.html.

  • Felzenszwalb, P., & Huttenlocher, D. (2005). Pictorial structures for object recognition. International Journal of Computer Vision, 1, 55–79.

  • Feng, H. M., & Chua, T. S. (2003). A bootstrapping approach to annotating large image collection. In Proceedings of the 5th ACM SIGMM international workshop on multimedia information retrieval (pp. 55–62).

  • Fergus, R., Perona, P., & Zisserman, A. (2003). Object class recognition by unsupervised scale-invariant learning. In Proc. computer vision and pattern recognition (pp. 264–271).

  • Fergus, R., Perona, P., & Zisserman, A. (2004). A visual category filter for Google images. In Proc. 8th European conf. on computer vision.

  • Fergus, R., Perona, P., & Zisserman, A. (2005a). A sparse object category model for efficient learning and exhaustive recognition. In Proc. computer vision and pattern recognition.

  • Fergus, R., Fei-Fei, L., Perona, P., & Zisserman, A. (2005b). Learning object categories from Google image search. In Computer vision, 2005. ICCV 2005. Tenth IEEE international conference.

  • Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1(2), 209–230.

  • Fink, M., & Ullman, S. (2007). From Aardvark to Zorro: A benchmark of mammal images.

  • Freund, Y., & Schapire, R. E. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. In Computational learning theory: Second European conference, EuroCOLT’95, proceedings (p. 23). Barcelona, Spain, 13–15 March 1995. Berlin: Springer.

  • Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Machine learning—international workshop then conference (pp. 148–156). San Mateo: Morgan Kaufmann.

  • Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 721–741.

  • Griffin, G., Holub, A., & Perona, P. (2007). Caltech-256 object category dataset.

  • Hofmann, T. (1999). Probabilistic latent semantic indexing. In Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval (pp. 50–57).

  • Jain, A. K., & Vailaya, A. (1996). Image retrieval using color and shape. Pattern Recognition, 29(8), 1233–1244.

  • Jeon, J., Lavrenko, V., & Manmatha, R. (2003). Automatic image annotation and retrieval using cross-media relevance models. In Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval (pp. 119–126). New York: ACM.

  • Kadir, T., & Brady, M. (2001). Scale, saliency and image description. International Journal of Computer Vision, 45(2), 83–105.

  • Krempp, S., Geman, D., & Amit, Y. (2002). Sequential learning with reusable parts for object detection (Technical report). Johns Hopkins University.

  • LeCun, Y., Huang, F., & Bottou, L. (2004). Learning methods for generic object recognition with invariance to pose and lighting. In Proc. CVPR.

  • Leibe, B., & Schiele, B. (2004). Combined object categorization and segmentation with an implicit shape model. In Proc. workshop on statistical learning in computer vision, Prague, Czech Republic.

  • Li, J., Wang, J. Z., & Wiederhold, G. (2000). IRM: integrated region matching for image retrieval. In Proceedings of the eighth ACM international conference on multimedia (pp. 147–156).

  • Lowe, D. (1999). Object recognition from local scale-invariant features. In Proc. international conference on computer vision.

  • McCallum, A., Freitag, D., & Pereira, F. (2000). Maximum entropy Markov models for information extraction and segmentation. In Proc. 17th international conf. on machine learning (pp. 591–598).

  • McClosky, D., Charniak, E., & Johnson, M. (2006). Effective self-training for parsing. In Proceedings of the main conference on human language technology conference of the North American Chapter of the Association of Computational Linguistics (pp. 152–159). Morristown: Association for Computational Linguistics.

  • Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39.

  • Neal, R., & Hinton, G. (1998). A view of the EM algorithm that justifies incremental, sparse and other variants. In M.I. Jordan (Ed.), Learning in graphical models (pp. 355–368). Norwell: Kluwer Academic.

  • PASCAL (2006). The PASCAL object recognition database collection. http://www.pascal-network.org/challenges/VOC/databases.html.

  • Pawan Kumar, M., Torr, P. H. S., & Zisserman, A. (2005). OBJ CUT. In Proceedings of the 2005 IEEE computer society conference on computer vision and pattern recognition (Vol. 1, pp. 18–25). Washington, DC, USA, 2005. Los Alamitos: IEEE Computer Society.

  • Rosenberg, C., Hebert, M., & Schneiderman, H. (2005). Semi-supervised self-training of object detection models. In Seventh IEEE workshop on applications of computer vision.

  • Russell, B. C., Torralba, A., Murphy, K. P., & Freeman, W. T. (2005). LabelMe: a database and web-based tool for image annotation.

  • Schroff, F., Criminisi, A., & Zisserman, A. (2007). Harvesting image databases from the web. In Computer vision, 2007. ICCV 2007. IEEE 11th international conference.

  • Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica, 4(2), 639–650.

  • Sivic, J., Russell, B. C., Efros, A., Zisserman, A., & Freeman, W. T. (2005). Discovering object categories in image collections. In Proc. international conference on computer vision.

  • Sivic, J., & Zisserman, A. (2003). Video Google: A text retrieval approach to object matching in videos. In Ninth IEEE international conference on computer vision, 2003. Proceedings (pp. 1470–1477).

  • Sudderth, E., Torralba, A., Freeman, W., Willsky, A. (2005a). Describing visual scenes using transformed Dirichlet processes. Advances in Neural Information Processing Systems, 18, 1297–1304.

  • Sudderth, E., Torralba, A., Freeman, W. T., & Willsky, A. (2005b). Learning hierarchical models of scenes, objects, and parts. In Proc. international conference on computer vision.

  • Teh, Y. W., Jordan, M. I., Beal, M. J., & Blei, D. M. (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association.

  • Wang, G., Zhang, Y., & Fei-Fei, L. (2006). Using dependent regions for object categorization in a generative framework. Computer Vision and Pattern Recognition.

  • Weber, M., Welling, M., & Perona, P. (2000). Unsupervised learning of models for recognition. In Proc. European conference on computer vision (Vol. 2, pp. 101–108).

  • Yanai, K., & Barnard, K. (2005). Probabilistic web image gathering. In ACM SIGMM workshop on multimedia information retrieval (pp. 57–64).

  • Yao, Z.-Y., Yang, X., & Zhu, S.C. (2007). Introduction to a large scale general purpose groundtruth dataset: methodology, annotation tool, and benchmarks. In 6th int. conf. on EMMCVPR.

  • Zhou, X. S., & Huang, T. S. (2002). Unifying keywords and visual contents in image retrieval. IEEE Multimedia, 9(2), 23–33.

  • Zhu, X. (2006). Semi-supervised learning literature survey. Computer Science, University of Wisconsin—Madison.

Author information

Corresponding author

Correspondence to Li-Jia Li.

About this article

Cite this article

Li, LJ., Fei-Fei, L. OPTIMOL: Automatic Online Picture Collection via Incremental Model Learning. Int J Comput Vis 88, 147–168 (2010). https://doi.org/10.1007/s11263-009-0265-6
