Bringing 3D Models Together: Mining Video Liaisons in Crowdsourced Reconstructions

  • Conference paper
  • Conference: Computer Vision – ACCV 2016 (ACCV 2016)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10114)

Abstract

Recent advances in large-scale scene modeling have enabled the automatic 3D reconstruction of landmark sites from crowdsourced photo collections. Here, we address the challenge of leveraging crowdsourced video collections to identify connecting visual observations that enable the alignment and subsequent aggregation of disjoint 3D models. We denote these connecting image sequences as video liaisons and develop a data-driven framework for their fully unsupervised extraction and exploitation. Towards this end, we represent video contents as a histogram over the iconic imagery contained within existing 3D models obtained from a photo collection. We then use this representation to efficiently identify and prioritize the analysis of individual videos within a large-scale video collection, in an effort to determine camera motion trajectories connecting different landmarks. Results on crowdsourced data illustrate the efficiency and effectiveness of our proposed approach.
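
To make the prioritization idea in the abstract concrete, the sketch below scores a single video by matching sampled frame descriptors against the iconic images of the existing 3D models and accumulating a histogram over those iconics. Everything here is an illustrative assumption rather than the authors' implementation: the function name, the use of L2-normalized global descriptors, the similarity threshold, and the two-model scoring rule are hypothetical, and the paper's pipeline may instead rely on retrieval structures such as vocabulary trees.

```python
import numpy as np

def video_liaison_score(frame_descriptors, iconic_descriptors, model_of_iconic,
                        sim_threshold=0.7):
    """Summarize a video as a histogram over iconic images and score its
    potential to connect two disjoint 3D models (illustrative sketch only).

    frame_descriptors:  (F, D) array, one L2-normalized descriptor per sampled frame
    iconic_descriptors: (K, D) array, one L2-normalized descriptor per iconic image
    model_of_iconic:    length-K list mapping each iconic image to its 3D model id
    """
    histogram = np.zeros(len(iconic_descriptors))

    for f in frame_descriptors:
        sims = iconic_descriptors @ f          # cosine similarities to all iconics
        best = int(np.argmax(sims))
        if sims[best] > sim_threshold:         # count frame toward its best-matching iconic
            histogram[best] += 1.0

    # Aggregate per-iconic counts into per-model support.
    support = {}
    for idx, model_id in enumerate(model_of_iconic):
        support[model_id] = support.get(model_id, 0.0) + histogram[idx]

    covered = [m for m, s in support.items() if s > 0]
    if len(covered) < 2:
        return histogram, support, 0.0         # video sees at most one model: no liaison

    # Score by the weaker of the two best-supported models, since a liaison is
    # only useful if both endpoints are actually observed in the video.
    top_two = sorted(support.values(), reverse=True)[:2]
    return histogram, support, min(top_two)
```

Under this sketch, videos would be processed in decreasing order of the returned score, reserving frame-accurate tracking and trajectory estimation for the candidates most likely to bridge two reconstructions.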


Acknowledgement

Supported in part by NSF grants No. IIS-1349074 and No. CNS-1405847, and partially funded by MITRE Corp.

Author information

Corresponding author

Correspondence to Ke Wang.


Electronic supplementary material

Below are the links to the electronic supplementary material.

Supplementary material 1 (pdf 597 KB)

Supplementary material 2 (mp4 9781 KB)

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Wang, K., Dunn, E., Rodriguez, M., Frahm, J.-M. (2017). Bringing 3D Models Together: Mining Video Liaisons in Crowdsourced Reconstructions. In: Lai, S.-H., Lepetit, V., Nishino, K., Sato, Y. (eds) Computer Vision – ACCV 2016. ACCV 2016. Lecture Notes in Computer Science, vol 10114. Springer, Cham. https://doi.org/10.1007/978-3-319-54190-7_25

  • DOI: https://doi.org/10.1007/978-3-319-54190-7_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54189-1

  • Online ISBN: 978-3-319-54190-7

  • eBook Packages: Computer Science; Computer Science (R0)
