Abstract
In everyday settings, automatic or interactive 3D reconstruction of objects from one or several videos is not yet feasible. Humans, by contrast, can recognize the 3D shape of objects even in complex video sequences. To enable machines to do the same, we propose a bio-inspired processing architecture, motivated by the human visual system, that converts video data into 3D representations. Analogous to the hierarchy of the ventral stream, our process reduces the influence of positional information in the video sequences through object recognition and represents the object of interest as multiple pictorial representations. These pictorial representations are 2D projections of the object of interest from different perspectives, so a 3D point cloud can be obtained with multiple-view geometry algorithms. In a detailed presentation of this architecture, we additionally highlight existing analogies to the view-combination scheme. We demonstrate the capabilities of the architecture by reconstructing a car from two video sequences. Whenever the automatic processing cannot complete a task, the user is put in the loop to solve the problem interactively. This human-machine interaction enables a prototype implementation of the architecture that can reconstruct 3D objects from one or several videos. In conclusion, we discuss the strengths and limitations of our approach and outline future work to improve the architecture.
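The final stage the abstract describes, obtaining a 3D point cloud from 2D projections of the object seen from different perspectives, rests on standard multiple-view geometry. As a minimal illustration of that step (not the paper's actual pipeline, which builds on full structure-from-motion tooling), the sketch below triangulates one 3D point from two views via the linear DLT method; the camera matrices and point correspondences are hypothetical toy values.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Recover a 3D point from two 2D observations via linear
    triangulation (DLT): stack the constraints u*P[2] - P[0] and
    v*P[2] - P[1] from both views and take the null space of the
    resulting 4x4 homogeneous system."""
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]           # homogeneous 3D point, defined up to scale
    return X[:3] / X[3]  # dehomogenize

# Two hypothetical pinhole cameras; the second is translated by one
# unit along x (a stand-in for two views cut from the input videos).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

# Projections of the 3D point (0, 0, 5) into the two views.
x1 = (0.0, 0.0)
x2 = (-0.2, 0.0)

print(triangulate_dlt(P1, P2, x1, x2))  # ~ [0. 0. 5.]
```

Repeating this over many matched image points across the recognized views yields the point cloud; in practice one would use an SfM system such as VisualSfM or OpenCV's triangulation routines rather than this toy code.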
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Schöning, J., Heidemann, G. (2017). Bio-Inspired Architecture for Deriving 3D Models from Video Sequences. In: Chen, C.S., Lu, J., Ma, K.K. (eds.) Computer Vision – ACCV 2016 Workshops. ACCV 2016. Lecture Notes in Computer Science, vol. 10117. Springer, Cham. https://doi.org/10.1007/978-3-319-54427-4_5
Print ISBN: 978-3-319-54426-7
Online ISBN: 978-3-319-54427-4