OPTIMOL: Automatic Online Picture Collection via Incremental Model Learning

International Journal of Computer Vision

Abstract

The explosion of the Internet provides us with a tremendous resource of images shared online. It also confronts vision researchers with the problem of finding effective methods to navigate this vast amount of visual information. Semantic image understanding plays a vital role in solving this problem. One important task in image understanding is object recognition, in particular generic object categorization. Critical to this problem are the issues of learning and datasets. Abundant data helps to train a robust recognition system, while a good object classifier can help to collect a large number of images. This paper presents a novel object recognition algorithm that performs automatic dataset collection and incremental model learning simultaneously. The goal of this work is to use the tremendous resources of the web to learn robust object category models for detecting and searching for objects in real-world cluttered scenes. Humans continuously update their knowledge of objects when new examples are observed. Our framework emulates this human learning process by iteratively accumulating model knowledge and image examples. We adapt a non-parametric latent topic model and propose an incremental learning framework. Our algorithm is capable of automatically collecting much larger object category datasets for 22 randomly selected classes from the Caltech 101 dataset. Furthermore, our system offers not only more images in each object category but also a robust object category model and meaningful image annotation. Our experiments show that OPTIMOL is capable of collecting image datasets that are superior to the well-known manually collected object datasets Caltech 101 and LabelMe.
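
The framework sketched in this abstract is an iterative collect-and-learn loop: download candidate images returned by a keyword search, classify them with the current category model, add the confidently accepted ones to the growing dataset, and update the model incrementally with only those new examples. The short Python sketch below illustrates that loop in miniature. It is an assumption-laden toy rather than the paper's method: the diagonal-Gaussian models, the synthetic crawl_batch "crawler", the seed set sizes, and the fixed log-likelihood-ratio threshold are all hypothetical stand-ins for the nonparametric latent topic model and the classification step the paper describes.

    import numpy as np

    rng = np.random.default_rng(0)
    DIM = 8  # dimensionality of the toy image feature vectors


    class RunningGaussian:
        """Diagonal Gaussian updated online (Welford's algorithm); a stand-in
        for the incrementally updated category or background model."""

        def __init__(self, dim):
            self.n = 0
            self.mean = np.zeros(dim)
            self.m2 = np.full(dim, 1e-3)  # running sum of squared deviations

        def update(self, x):
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

        def log_likelihood(self, x):
            var = self.m2 / max(self.n, 1) + 1e-6
            return float(-0.5 * np.sum((x - self.mean) ** 2 / var
                                       + np.log(2 * np.pi * var)))


    def crawl_batch(n):
        """Hypothetical stand-in for a keyword image search: roughly half of
        the returned feature vectors belong to the category, half are clutter."""
        relevant = rng.random(n) < 0.5
        feats = np.where(relevant[:, None],
                         rng.normal(2.0, 1.0, (n, DIM)),
                         rng.normal(-2.0, 1.0, (n, DIM)))
        return feats, relevant


    # Seed both models with a handful of labeled examples.
    category, background = RunningGaussian(DIM), RunningGaussian(DIM)
    for x in rng.normal(2.0, 1.0, (15, DIM)):
        category.update(x)
    for x in rng.normal(-2.0, 1.0, (15, DIM)):
        background.update(x)

    dataset = []  # the automatically collected image set
    for it in range(5):
        feats, _truth = crawl_batch(200)
        accepted = 0
        for x in feats:
            # Accept an image only if the current category model is confident.
            margin = category.log_likelihood(x) - background.log_likelihood(x)
            if margin > 5.0:  # arbitrary confidence threshold for this toy
                dataset.append(x)
                category.update(x)  # incremental update with the accepted example
                accepted += 1
        print(f"iteration {it}: accepted {accepted}, dataset size {len(dataset)}")

The one design point the toy does preserve is that each iteration updates the category model with only the newly accepted images, so that dataset collection and incremental model learning reinforce each other, as described above.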

References

  • Agarwal, S., Awan, A., & Roth, D. (2004). Learning to detect objects in images via a sparse, part-based representation. PAMI, 26(11), 1475–1490.

  • Barnard, K., Duygulu, P., Forsyth, D., de Freitas, N., Blei, D. M., & Jordan, M. I. (2003). Matching words and pictures. The Journal of Machine Learning Research, 3.

  • Barnard, K., & Forsyth, D. (2001). Learning the semantics of words and pictures. In Eighth IEEE international conference on computer vision (pp. 408–415).

  • Belhumeur, P. N., Hespanha, J. P., & Kriegman, D. J. (1997). Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), 711–720.

  • Berg, T. L., & Forsyth, D. A. (2006). Animals on the web. In Proc. computer vision and pattern recognition.

  • Besemer, J., Lomsadze, A., & Borodovsky, M. (2001). GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Research, 29(12), 2607.

  • Bishop, C. M. (2006). Pattern recognition and machine learning. New York: Springer.

  • Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.

  • Bosch, A., Zisserman, A., & Munoz, X. (2006). Scene classification via pLSA. Proc. ECCV, 4, 517–530.

  • Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the fifth annual workshop on computational learning theory (pp. 144–152).

  • Carson, C., Thomas, M., Belongie, S., Hellerstein, J.M., & Malik, J. (1999). Blobworld: A system for region-based image indexing and retrieval. In Third international conference on visual information systems (pp. 509–516).

  • Chen, Y., Wang, J. Z., & Krovetz, R. (2003). Content-based image retrieval by clustering. In Proceedings of the 5th ACM SIGMM international workshop on multimedia information retrieval (pp. 193–200).

  • Collins, B., Deng, J., Li, K., & Fei-Fei, L. (2008). Toward scalable dataset construction: An active learning approach. In Proc. ECCV.

  • Csurka, G., Bray, C., Dance, C., & Fan, L. (2004). Visual categorization with bags of keypoints. In Workshop on statistical learning in computer vision, ECCV (pp. 1–22).

  • Deng, Y., Manjunath, B. S., Kenney, C., Moore, M. S., & Shin, H. (2001). An efficient color representation for image retrieval. IEEE Transactions on Image Processing, 10(1), 140–147.

  • Fei-Fei, L., & Perona, P. (2005). A Bayesian hierarchical model for learning natural scene categories. Computer Vision and Pattern Recognition.

  • Fei-Fei, L., Fergus, R., & Perona, P. (2003). A Bayesian approach to unsupervised one-shot learning of object categories. In Proceedings of the 9th international conference on computer vision (pp. 1134–1141). Nice, France.

  • Fei-Fei, L., Fergus, R., & Perona, P. (2004). Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In Workshop on generative-model based vision.

  • Fei-Fei, L., Fergus, R., & Perona, P. (2006). One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  • Fei-Fei, L., Fergus, R., & Torralba, A. (2007). Recognizing and learning object categories. Short course CVPR. http://people.csail.mit.edu/torralba/shortCourseRLOC/index.html.

  • Felzenszwalb, P., & Huttenlocher, D. (2005). Pictorial structures for object recognition. International Journal of Computer Vision, 1, 55–79.

  • Feng, H. M., & Chua, T. S. (2003). A bootstrapping approach to annotating large image collection. In Proceedings of the 5th ACM SIGMM international workshop on multimedia information retrieval (pp. 55–62).

  • Fergus, R., Perona, P., & Zisserman, A. (2003). Object class recognition by unsupervised scale-invariant learning. In Proc. computer vision and pattern recognition (pp. 264–271).

  • Fergus, R., Perona, P., & Zisserman, A. (2004). A visual category filter for Google images. In Proc. 8th European conf. on computer vision.

  • Fergus, R., Perona, P., & Zisserman, A. (2005a). A sparse object category model for efficient learning and exhaustive recognition. In Proc. computer vision and pattern recognition.

  • Fergus, R., Fei-Fei, L., Perona, P., & Zisserman, A. (2005b). Learning object categories from Google image search. In Computer vision, 2005. ICCV 2005. Tenth IEEE international conference.

  • Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1(2), 209–230.

  • Fink, M., & Ullman, S. (2007). From Aardvark to Zorro: A benchmark of mammal images.

  • Freund, Y., & Schapire, R. E. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. In Computational learning theory: Second European conference, EuroCOLT’95, proceedings (p. 23). Barcelona, Spain, 13–15 March 1995. Berlin: Springer.

  • Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Machine learning—international workshop then conference (pp. 148–156). San Mateo: Morgan Kaufmann.

  • Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 721–741.

  • Griffin, G., Holub, A., & Perona, P. (2007). Caltech-256 object category dataset.

  • Hofmann, T. (1999). Probabilistic latent semantic indexing. In Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval (pp. 50–57).

  • Jain, A. K., & Vailaya, A. (1996). Image retrieval using color and shape. Pattern Recognition, 29(8), 1233–1244.

  • Jeon, J., Lavrenko, V., & Manmatha, R. (2003). Automatic image annotation and retrieval using cross-media relevance models. In Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval (pp. 119–126). New York: ACM.

  • Kadir, T., & Brady, M. (2001). Scale, saliency and image description. International Journal of Computer Vision, 45(2), 83–105.

  • Krempp, S., Geman, D., & Amit, Y. (2002). Sequential learning with reusable parts for object detection (Technical report). Johns Hopkins University.

  • LeCun, Y., Huang, F., & Bottou, L. (2004). Learning methods for generic object recognition with invariance to pose and lighting. In Proc. CVPR.

  • Leibe, B., & Schiele, B. (2004). Combined object categorization and segmentation with an implicit shape model. In Proc. workshop on statistical learning in computer vision, Prague, Czech Republic.

  • Li, J., Wang, J. Z., & Wiederhold, G. (2000). IRM: integrated region matching for image retrieval. In Proceedings of the eighth ACM international conference on multimedia (pp. 147–156).

  • Lowe, D. (1999). Object recognition from local scale-invariant features. In Proc. international conference on computer vision.

  • McCallum, A., Freitag, D., & Pereira, F. (2000). Maximum entropy Markov models for information extraction and segmentation. In Proc. 17th international conf. on machine learning (pp. 591–598).

  • McClosky, D., Charniak, E., & Johnson, M. (2006). Effective self-training for parsing. In Proceedings of the main conference on human language technology conference of the North American Chapter of the Association of Computational Linguistics (pp. 152–159). Morristown: Association for Computational Linguistics.

  • Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39.

  • Neal, R., & Hinton, G. (1998). A view of the EM algorithm that justifies incremental, sparse and other variants. In M.I. Jordan (Ed.), Learning in graphical models (pp. 355–368). Norwell: Kluwer Academic.

  • PASCAL (2006). The PASCAL object recognition database collection. http://www.pascal-network.org/challenges/VOC/databases.html.

  • Pawan Kumar, M., Torr, P. H. S., & Zisserman, A. (2005). OBJ CUT. In Proceedings of the 2005 IEEE computer society conference on computer vision and pattern recognition (Vol. 1, pp. 18–25). Washington, DC, USA, 2005. Los Alamitos: IEEE Computer Society.

  • Rosenberg, C., Hebert, M., & Schneiderman, H. (2005). Semi-supervised self-training of object detection models. In Seventh IEEE workshop on applications of computer vision.

  • Russell, B. C., Torralba, A., Murphy, K. P., & Freeman, W. T. (2005). LabelMe: a database and web-based tool for image annotation.

  • Schroff, F., Criminisi, A., & Zisserman, A. (2007). Harvesting image databases from the web. In Computer vision, 2007. ICCV 2007. IEEE 11th international conference.

  • Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica, 4(2), 639–650.

  • Sivic, J., Russell, B. C., Efros, A., Zisserman, A., & Freeman, W. T. (2005). Discovering object categories in image collections. In Proc. international conference on computer vision.

  • Sivic, J., & Zisserman, A. (2003). Video Google: A text retrieval approach to object matching in videos. In Ninth IEEE international conference on computer vision, 2003. Proceedings (pp. 1470–1477).

  • Sudderth, E., Torralba, A., Freeman, W., Willsky, A. (2005a). Describing visual scenes using transformed Dirichlet processes. Advances in Neural Information Processing Systems, 18, 1297–1304.

  • Sudderth, E., Torralba, A., Freeman, W. T., & Willsky, A. (2005b). Learning hierarchical models of scenes, objects, and parts. In Proc. international conference on computer vision.

  • Teh, Y. W., Jordan, M. I., Beal, M. J., & Blei, D. M. (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association.

  • Wang, G., Zhang, Y., & Fei-Fei, L. (2006). Using dependent regions for object categorization in a generative framework. Computer Vision and Pattern Recognition.

  • Weber, M., Welling, M., & Perona, P. (2000). Unsupervised learning of models for recognition. In Proc. European conference on computer vision (Vol. 2, pp. 101–108).

  • Yanai, K., & Barnard, K. (2005). Probabilistic web image gathering. In ACM SIGMM workshop on multimedia information retrieval (pp. 57–64).

  • Yao, Z.-Y., Yang, X., & Zhu, S.C. (2007). Introduction to a large scale general purpose groundtruth dataset: methodology, annotation tool, and benchmarks. In 6th int. conf. on EMMCVPR.

  • Zhou, X. S., & Huang, T. S. (2002). Unifying keywords and visual contents in image retrieval. IEEE Multimedia, 9(2), 23–33.

  • Zhu, X. (2006). Semi-supervised learning literature survey. Computer Science, University of Wisconsin—Madison.

Author information

Corresponding author

Correspondence to Li-Jia Li.

About this article

Cite this article

Li, LJ., Fei-Fei, L. OPTIMOL: Automatic Online Picture Collection via Incremental Model Learning. Int J Comput Vis 88, 147–168 (2010). https://doi.org/10.1007/s11263-009-0265-6
