Bringing 3D Models Together: Mining Video Liaisons in Crowdsourced Reconstructions

  • Conference paper
  • Conference: Computer Vision – ACCV 2016 (ACCV 2016)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10114)

Abstract

Recent advances in large-scale scene modeling have enabled the automatic 3D reconstruction of landmark sites from crowdsourced photo collections. Here, we address the challenge of leveraging crowdsourced video collections to identify connecting visual observations that enable the alignment and subsequent aggregation of disjoint 3D models. We denote these connecting image sequences as video liaisons and develop a data-driven framework for their fully unsupervised extraction and exploitation. Towards this end, we represent video contents as a histogram over the iconic imagery contained within existing 3D models obtained from a photo collection. We then use this representation to efficiently identify and prioritize the analysis of individual videos within a large-scale video collection, in an effort to determine camera motion trajectories connecting different landmarks. Results on crowdsourced data illustrate the efficiency and effectiveness of our proposed approach.
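
To make the prioritization idea in the abstract concrete, the sketch below scores a single video by matching sampled frame descriptors against the iconic images of the existing 3D models and accumulating a histogram over those iconics. Everything here is an illustrative assumption rather than the authors' implementation: the function name, the use of L2-normalized global descriptors, the similarity threshold, and the two-model scoring rule are hypothetical, and the paper's pipeline may instead rely on retrieval structures such as vocabulary trees.

```python
import numpy as np

def video_liaison_score(frame_descriptors, iconic_descriptors, model_of_iconic,
                        sim_threshold=0.7):
    """Summarize a video as a histogram over iconic images and score its
    potential to connect two disjoint 3D models (illustrative sketch only).

    frame_descriptors:  (F, D) array, one L2-normalized descriptor per sampled frame
    iconic_descriptors: (K, D) array, one L2-normalized descriptor per iconic image
    model_of_iconic:    length-K list mapping each iconic image to its 3D model id
    """
    histogram = np.zeros(len(iconic_descriptors))

    for f in frame_descriptors:
        sims = iconic_descriptors @ f          # cosine similarities to all iconics
        best = int(np.argmax(sims))
        if sims[best] > sim_threshold:         # count frame toward its best-matching iconic
            histogram[best] += 1.0

    # Aggregate per-iconic counts into per-model support.
    support = {}
    for idx, model_id in enumerate(model_of_iconic):
        support[model_id] = support.get(model_id, 0.0) + histogram[idx]

    covered = [m for m, s in support.items() if s > 0]
    if len(covered) < 2:
        return histogram, support, 0.0         # video sees at most one model: no liaison

    # Score by the weaker of the two best-supported models, since a liaison is
    # only useful if both endpoints are actually observed in the video.
    top_two = sorted(support.values(), reverse=True)[:2]
    return histogram, support, min(top_two)
```

Under this sketch, videos would be processed in decreasing order of the returned score, reserving frame-accurate tracking and trajectory estimation for the candidates most likely to bridge two reconstructions.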


Acknowledgement

Supported in part by NSF grants No. IIS-1349074 and No. CNS-1405847, and partially funded by MITRE Corp.

Author information

Corresponding author

Correspondence to Ke Wang.


Electronic supplementary material

Below are the links to the electronic supplementary material.

Supplementary material 1 (pdf 597 KB)

Supplementary material 2 (mp4 9781 KB)

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Wang, K., Dunn, E., Rodriguez, M., Frahm, J.-M. (2017). Bringing 3D Models Together: Mining Video Liaisons in Crowdsourced Reconstructions. In: Lai, S.-H., Lepetit, V., Nishino, K., Sato, Y. (eds) Computer Vision – ACCV 2016. ACCV 2016. Lecture Notes in Computer Science, vol 10114. Springer, Cham. https://doi.org/10.1007/978-3-319-54190-7_25

  • DOI: https://doi.org/10.1007/978-3-319-54190-7_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54189-1

  • Online ISBN: 978-3-319-54190-7

  • eBook Packages: Computer Science; Computer Science (R0)
