Unified framework for recognition, localization and mapping using wearable cameras

Short Report · Published in Cognitive Processing

Abstract

Monocular approaches to simultaneous localization and mapping (SLAM) have recently succeeded in tackling the challenging problem of quickly computing dense reconstructions from a single moving camera. While these approaches initially relied on detecting a reduced set of interest points to estimate the camera position and the map, they can now reconstruct dense maps from a handheld camera while the camera coordinates are simultaneously computed. However, such maps of three-dimensional points usually remain meaningless: they contain no memorable items and provide no way of encoding spatial relationships between objects and paths. For humans and in mobile robotics, landmarks play a key role in internalizing a spatial representation of an environment. They are memorable cues that can serve to define a region of space or the location of other objects. In a topological representation of space, landmarks can be identified and located according to their structural, perceptual or semantic significance and distinctiveness. In a metric representation of space, on the other hand, landmarks may be difficult to locate. Restricted to the domain of visual landmarks, this work describes an approach in which the map resulting from a point-based, monocular SLAM is annotated with the semantic information provided by a set of distinguished landmarks. Both kinds of features are obtained from the image; hence, they can be linked by associating to each landmark all the point-based features that are superimposed on the landmark in a given image (keyframe). Visual landmarks are obtained by means of an object-based, bottom-up attention mechanism, which extracts a set of proto-objects from the image. These proto-objects cannot always be associated with natural objects, but they typically constitute significant parts of scene objects and can be appropriately annotated with semantic information. Moreover, they are affine covariant regions, that is, regions invariant to affine transformations that can be detected under different viewing conditions (viewpoint angle, rotation, scale, etc.). Monocular SLAM is solved using the accurate parallel tracking and mapping (PTAM) framework of Klein and Murray (2007).
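As an illustration of the association step described above (linking each proto-object landmark to the map points whose projections fall inside its image region in a keyframe), the following is a minimal sketch in Python. It assumes a standard pinhole projection model and represents each proto-object region as a binary image mask; the function names and data layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def project_points(points_3d, R, t, K):
    """Project world-frame 3D map points into a keyframe image.

    points_3d: (N, 3) map points in world coordinates.
    R (3x3), t (3,): keyframe camera rotation and translation.
    K (3x3): camera intrinsic matrix.
    Returns (N, 2) pixel coordinates and a mask of points lying
    in front of the camera.
    """
    cam = points_3d @ R.T + t           # world frame -> camera frame
    in_front = cam[:, 2] > 0            # discard points behind the camera
    pix_h = cam @ K.T                   # homogeneous pixel coordinates
    pix = pix_h[:, :2] / pix_h[:, 2:3]  # perspective division
    return pix, in_front

def associate_landmarks(map_points, regions, R, t, K):
    """Link each landmark to the map points whose projections fall
    inside its proto-object region in this keyframe.

    regions: dict of landmark id -> binary mask (H x W) marking the
    proto-object's pixels in the keyframe image.
    Returns dict of landmark id -> list of map-point indices.
    """
    pix, in_front = project_points(map_points, R, t, K)
    links = {lid: [] for lid in regions}
    for i, ((u, v), ok) in enumerate(zip(pix, in_front)):
        if not ok:
            continue
        col, row = int(round(u)), int(round(v))
        for lid, mask in regions.items():
            h, w = mask.shape
            if 0 <= row < h and 0 <= col < w and mask[row, col]:
                links[lid].append(i)  # point i is superimposed on landmark lid
    return links
```

In the actual pipeline, the keyframe pose, intrinsics and map points would come from PTAM and the region masks from the bottom-up attention mechanism; here all inputs are plain NumPy placeholders.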

References

  • Ahn S, Choi M, Choi J, Chung W (2006) Data association using visual object recognition for EKF-SLAM in home environment. In: Proceedings of IEEE/RSJ international conference on intelligent robots and systems

  • Bandera A, Marfil R, Vázquez-Martín R (2010) Incremental learning of visual landmarks for mobile robotics. In: Proceedings of international conference on pattern recognition, pp 4255–4258

  • Caduff D, Timpf S (2008) On the assessment of landmark salience for human navigation. Cogn Process 9:249–267

  • Castle RO, Gawley DJ, Klein G, Murray DW (2007) Towards simultaneous recognition, localization and mapping for hand-held and wearable cameras. In: Proceedings of IEEE international conference on robotics and automation

  • Davison AJ (2003) Real-time simultaneous localization and mapping with a single camera. In: Proceedings of international conference on computer vision, pp 1403–1410

  • Davison AJ, Murray D (2002) Simultaneous localization and map-building using active vision. IEEE Trans Pattern Anal Mach Intell 24(7):865–880

  • Davison AJ, Mayol WW, Murray DW (2003) Real-time localization and mapping with wearable active vision. In: Proceedings of IEEE/ACM international symposium on mixed and augmented reality

  • Heth CD, Cornell EH, Alberts DM (1997) Differential use of landmarks by 8- and 12-year-old children during route reversal navigation. J Environ Psychol 17:199–213

  • Klein G, Murray DW (2007) Parallel tracking and mapping for small AR workspaces. In: Proceedings of IEEE/ACM international symposium on mixed and augmented reality

  • Mikolajczyk K, Tuytelaars T, Schmid C, Zisserman A, Matas J, Schaffalitzky F, Kadir T, Gool LV (2006) A comparison of affine region detectors. Int J Comput Vis 65:43–72

  • Newcombe R, Lovegrove S, Davison AJ (2011a) DTAM: dense tracking and mapping in real-time. In: Proceedings of international conference on computer vision

  • Newcombe R et al (2011b) KinectFusion: real-time dense surface mapping and tracking. In: Proceedings of IEEE/ACM international symposium on mixed and augmented reality

  • Rensink R (2000) The dynamic representation of scenes. Vis Cogn 7:17–42

  • Rosten E, Porter R, Drummond T (2010) Faster and better: a machine learning approach to corner detection. IEEE Trans Pattern Anal Mach Intell 32:105–119

  • Stühmer J, Gumhold S, Cremers D (2010) Real-time dense geometry from a handheld camera. In: Proceedings of the 32nd DAGM conference on pattern recognition, pp 11–20

  • Vázquez-Martín R, Marfil R, Núñez P, Bandera A, Sandoval F (2009) A novel approach for salient image regions detection and description. Pattern Recognit Lett 30(16):1464–1476

Acknowledgments

This work has been partially funded by the Spanish Government project TIN2011-27512-C05-01 and by the Junta de Andalucía project P07-TIC-03106. The source code of the PTAM approach was downloaded from Georg Klein's home page (http://www.robots.ox.ac.uk/~gk/PTAM/).

Conflict of interest

This supplement was not sponsored by outside commercial interests. It was funded entirely by ECONA, Via dei Marsi, 78, 00185 Roma, Italy.

Author information

Corresponding author

Correspondence to Ricardo Vázquez-Martín.

Cite this article

Vázquez-Martín, R., Bandera, A. Unified framework for recognition, localization and mapping using wearable cameras. Cogn Process 13 (Suppl 1), 351–354 (2012). https://doi.org/10.1007/s10339-012-0496-2

Keywords

Navigation