
Dynamic learning, retrieval, and tracking to augment hundreds of photographs

  • Original Article
  • Journal: Virtual Reality

Abstract

Tracking is a major issue for virtual and augmented reality applications. Single-object tracking on monocular video streams is fairly well understood. However, when it comes to multiple objects, existing methods lack scalability and can recognize only a limited number of objects. Thanks to recent progress in feature matching, state-of-the-art image retrieval techniques can deal with millions of images. However, these methods do not focus on real-time video processing and cannot track retrieved objects. In this paper, we present a method that combines the speed and accuracy of tracking with the scalability of image retrieval. At the heart of our approach is a bi-layer clustering process that allows our system to index and retrieve objects based on tracks of features, thereby effectively summarizing the information available in multiple video frames. Dynamic learning of new viewpoints as the camera moves naturally yields the robustness and reliability expected from an augmented reality engine. As a result, our system tracks multiple objects in real time, recognizing them with low delay from a database of more than 300 entries. We released the source code of our system in a package called Polyora.
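
To illustrate the general idea of indexing tracks of features rather than single frames, here is a minimal sketch in Python. It is not Polyora's implementation (the released system, https://github.com/jpilet/polyora, relies on a bi-layer clustering process): the random visual vocabulary, the `quantize` helper, the `TrackIndex` class, and the simple voting scheme below are illustrative assumptions only.

```python
# Minimal sketch (not Polyora's actual pipeline): retrieve a database object
# by quantizing each track's descriptors into visual words and voting through
# an inverted index. Vocabulary size, descriptor dimension and the voting
# scheme are assumptions made for illustration.
import numpy as np
from collections import defaultdict

RNG = np.random.default_rng(0)
VOCAB = RNG.normal(size=(1000, 64))      # assumed: 1000 visual words, 64-D descriptors


def quantize(descriptor):
    """Return the index of the nearest visual word (brute force, for clarity)."""
    return int(np.argmin(np.linalg.norm(VOCAB - descriptor, axis=1)))


class TrackIndex:
    """Toy inverted index mapping visual words to database objects.

    A track is summarized by the set of words its descriptors quantize to,
    which captures the spirit of indexing tracks of features instead of
    individual frames."""

    def __init__(self):
        self.inverted = defaultdict(set)   # word id -> {object ids}

    def add_object(self, object_id, tracks):
        for track_descriptors in tracks:
            for word in {quantize(d) for d in track_descriptors}:
                self.inverted[word].add(object_id)

    def query(self, tracks):
        votes = defaultdict(int)
        for track_descriptors in tracks:
            for word in {quantize(d) for d in track_descriptors}:
                for object_id in self.inverted[word]:
                    votes[object_id] += 1
        return sorted(votes.items(), key=lambda kv: -kv[1])  # best match first


# Usage: index one object observed over a few frames, then query with new tracks.
index = TrackIndex()
index.add_object("photo_42", tracks=[RNG.normal(size=(5, 64)) for _ in range(20)])
print(index.query([RNG.normal(size=(3, 64)) for _ in range(10)])[:3])
```

A real system would use approximate quantization (e.g. a vocabulary tree) and geometric verification of the votes, but the data flow is the same: the descriptors of one track are merged before voting, so several frames contribute a single, more reliable piece of evidence.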


Notes

  1. In this text, we define a keypoint as a location of interest on an image, a descriptor as a vector describing a keypoint neighborhood, and a feature as both a keypoint and its descriptor (see the sketch after these notes).

  2. https://github.com/jpilet/polyora.
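
To make the terminology of note 1 concrete, the sketch below pairs a keypoint with its descriptor into a feature and chains the features of one physical point across frames into a track. The `Feature` and `Track` classes are our own illustrative names, not part of Polyora's API.

```python
# Illustrative data structures for note 1 (names are ours, not Polyora's API):
# a keypoint is an image location, a descriptor encodes its neighborhood,
# a feature is the pair, and a track chains one point's features over frames.
from dataclasses import dataclass, field
from typing import List
import numpy as np


@dataclass
class Feature:
    x: float                 # keypoint location (pixels)
    y: float
    descriptor: np.ndarray   # e.g. a 64-D vector describing the neighborhood


@dataclass
class Track:
    features: List[Feature] = field(default_factory=list)  # one entry per frame

    def append(self, feature: Feature) -> None:
        self.features.append(feature)


# Usage: one feature observed in one frame, appended to its track.
track = Track()
track.append(Feature(x=120.5, y=64.0, descriptor=np.zeros(64)))
```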


Author information


Corresponding author

Correspondence to Julien Pilet.


About this article

Cite this article

Pilet, J., Saito, H. Dynamic learning, retrieval, and tracking to augment hundreds of photographs. Virtual Reality 18, 89–100 (2014). https://doi.org/10.1007/s10055-013-0228-7
