Bio-Inspired Architecture for Deriving 3D Models from Video Sequences

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 10117))

Abstract

In an everyday context, automatic or interactive 3D reconstruction of objects from one or several videos is not yet possible. Humans, by contrast, can recognize the 3D shape of objects even in complex video sequences. To enable machines to do the same, we propose a bio-inspired processing architecture, motivated by the human visual system, which converts video data into 3D representations. Similar to the hierarchy of the ventral stream, our process reduces the influence of position information in the video sequences through object recognition and represents the object of interest as multiple pictorial representations. These pictorial representations show 2D projections of the object of interest from different perspectives, so a 3D point cloud can be obtained by multiple view geometry algorithms. In the course of a detailed presentation of this architecture, we additionally highlight existing analogies to the view-combination scheme. The potential of our architecture is demonstrated by reconstructing a car from two video sequences. If the automatic processing cannot complete the task, the user is put in the loop to solve the problem interactively. This human-machine interaction enables a prototype implementation of the architecture that can reconstruct 3D objects from one or several videos. In conclusion, the strengths and limitations of our approach are discussed, followed by an outlook on future work to improve the architecture.
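The last step of the pipeline described above, obtaining 3D points from 2D projections taken from different perspectives, can be sketched with linear (DLT) triangulation, a standard multiple view geometry technique. This is a minimal illustration in plain NumPy, not the authors' implementation; the camera projection matrices `P1`, `P2` and the test point are synthetic assumptions made for the example:

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Triangulate one 3D point from its 2D projections in two views.

    Each view contributes two rows to a homogeneous system A @ X = 0,
    built from the 3x4 projection matrices (linear DLT method).
    """
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of A with the
    # smallest singular value (last row of Vt).
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenise

# Two synthetic cameras: one at the origin, one shifted along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

# Project a known 3D point into both views, then recover it.
X_true = np.array([1.0, 2.0, 10.0])
x1 = P1 @ np.append(X_true, 1); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1); x2 = x2[:2] / x2[2]
print(triangulate_point(P1, P2, x1, x2))  # recovers approximately [1, 2, 10]
```

With noisy real-world correspondences the same linear system is solved in a least-squares sense, and many views simply contribute additional rows to `A`.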


Notes

  1.

    https://ikw.uos.de/%7Ecv/publications/3DMA16.

Author information

Correspondence to Julius Schöning.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 27452 KB)


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Schöning, J., Heidemann, G. (2017). Bio-Inspired Architecture for Deriving 3D Models from Video Sequences. In: Chen, CS., Lu, J., Ma, KK. (eds) Computer Vision – ACCV 2016 Workshops. ACCV 2016. Lecture Notes in Computer Science(), vol 10117. Springer, Cham. https://doi.org/10.1007/978-3-319-54427-4_5

  • DOI: https://doi.org/10.1007/978-3-319-54427-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54426-7

  • Online ISBN: 978-3-319-54427-4

  • eBook Packages: Computer Science (R0)
