
Interactive 3D content insertion in images for multimedia applications

Published in: Multimedia Tools and Applications

Abstract

This article addresses the problem of creating interactive mixed reality applications in which virtual objects interact with images of real-world scenarios. This is relevant for games and for architectural or space-planning applications that must react to visual elements in the images, such as walls, floors and empty spaces. The scenarios are intended to be captured by users with ordinary cameras or taken from existing photographs. Introducing virtual objects into photographs raises several challenges, such as pose estimation and producing a visually correct interaction between the virtual objects and the boundaries of the scene. The article addresses two main research questions: whether it is feasible to create interactive augmented reality (AR) applications in which virtual objects interact with a real-world scenario using high-level features detected in the image, and whether untrained users are capable of, and motivated enough for, performing the AR initialization steps. The proposed system detects the scene automatically from an image, complemented by basic annotations provided by the user; this step is kept deliberately simple to accommodate non-expert users. The system analyzes one or more photographs captured by the user and detects high-level features such as vanishing points, the floor and the scene orientation. With these features it is possible to create mixed and augmented reality applications in which the user interactively introduces virtual objects that blend with the picture in real time and respond to the physical environment. To validate the solution, several system tests are described and compared using publicly available external image datasets.
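As an illustration of the kind of high-level feature extraction described in the abstract, the sketch below (not the authors' implementation) shows how straight line segments and a dominant vanishing point might be estimated from a single photograph using OpenCV. The edge and Hough parameters, the RANSAC-style voting scheme and the input file name room.jpg are assumptions made for this example only.

# Minimal sketch, assuming OpenCV and NumPy are available: detect line segments
# with Canny + probabilistic Hough, then vote for a dominant vanishing point.
import numpy as np
import cv2

def detect_lines(image_path, canny_lo=50, canny_hi=150):
    """Detect straight line segments in a grayscale image."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)
    edges = cv2.Canny(gray, canny_lo, canny_hi)
    segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                               threshold=80, minLineLength=40, maxLineGap=5)
    return [] if segments is None else [s[0] for s in segments]

def line_homogeneous(seg):
    """Homogeneous line through a segment's endpoints (cross product of points)."""
    x1, y1, x2, y2 = seg
    return np.cross([x1, y1, 1.0], [x2, y2, 1.0])

def estimate_vanishing_point(segments, iterations=500, tol_deg=2.0):
    """RANSAC-style search: intersect random segment pairs and keep the candidate
    vanishing point supported by the most segments (angular tolerance in degrees)."""
    best_vp, best_support = None, -1
    for _ in range(iterations):
        i, j = np.random.choice(len(segments), 2, replace=False)
        vp = np.cross(line_homogeneous(segments[i]), line_homogeneous(segments[j]))
        if abs(vp[2]) < 1e-9:        # segments nearly parallel; intersection at infinity
            continue
        vp = vp[:2] / vp[2]
        support = 0
        for x1, y1, x2, y2 in segments:
            mid = np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])
            direction = np.array([x2 - x1, y2 - y1], dtype=float)
            to_vp = vp - mid
            cos = abs(np.dot(direction, to_vp)) / (
                np.linalg.norm(direction) * np.linalg.norm(to_vp) + 1e-9)
            if np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))) < tol_deg:
                support += 1
        if support > best_support:
            best_vp, best_support = vp, support
    return best_vp, best_support

if __name__ == "__main__":
    segs = detect_lines("room.jpg")          # hypothetical input photograph
    vp, inliers = estimate_vanishing_point(segs)
    print("vanishing point:", vp, "supported by", inliers, "segments")

In a full pipeline along the lines the abstract describes, such a vanishing point, together with floor and orientation estimates, would drive the pose used to insert and render the virtual objects.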





Acknowledgments

The authors would like to thank everyone at IMG and CITI for their support. This work was funded by the Portuguese Science and Technology Foundation, FCT/MEC, through grants SFRH/BD/47511/2008 and PEst-OE/EEI/UI0527/2011 (CITI/FCT/UNL, now NOVA-LINCS), and through the MAT project. The Media Arts and Technologies project (MAT), NORTE-07-0124-FEDER-000061, is financed by the North Portugal Regional Operational Programme (ON.2 O Novo Norte), under the National Strategic Reference Framework (NSRF), through the European Regional Development Fund (ERDF), and by national funds through the Portuguese funding agency, Fundação para a Ciência e a Tecnologia (FCT). The authors also thank the project I-City for Future Mobility, NORTE-07-0124-FEDER-000064, and the European project FP7 Future Cities, FP7-REGPOT-2012-2013-1.

Author information


Corresponding author

Correspondence to Rui Nóbrega.


About this article


Cite this article

Nóbrega, R., Correia, N. Interactive 3D content insertion in images for multimedia applications. Multimed Tools Appl 76, 163–197 (2017). https://doi.org/10.1007/s11042-015-3031-5


