
Application of Large-Scale Classification Techniques for Simple Location Estimation Experiments

Multimodal Location Estimation of Videos and Images

Abstract

This chapter describes an application of established classification techniques to simple multimodal location estimation experiments. It demonstrates Gaussian Mixture Model (GMM)-based and language-model-based approaches for verifying the cities in which Flickr videos were recorded, using the videos' audio and textual metadata. The methods used in most of the approaches are described in detail, so that readers with no background in location estimation can perform simple experiments. The city-verification results are modest, but they are above chance and leave room for future work on better approaches. The techniques may also suit class projects for students who wish to gain hands-on experience in location estimation.
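To make the idea of language-model-based city verification concrete, the following is a minimal sketch and not the chapter's implementation. It assumes a unigram model over a video's textual tags with add-one smoothing, and it scores a claimed city by a per-tag log-likelihood ratio against a background model trained on all cities. The function names, the smoothing choice, and the toy tag lists are all hypothetical.

```python
import math
from collections import Counter


def train_unigram(tag_lists):
    """Count tag occurrences over all training videos of one class."""
    counts = Counter()
    for tags in tag_lists:
        counts.update(tags)
    return counts


def log_prob(tags, counts, vocab_size):
    """Add-one-smoothed unigram log-probability of a list of tags."""
    total = sum(counts.values())
    return sum(
        math.log((counts[t] + 1) / (total + vocab_size)) for t in tags
    )


def verification_score(tags, city_counts, background_counts, vocab_size):
    """Per-tag log-likelihood ratio: claimed-city model vs. background model."""
    if not tags:
        return 0.0
    llr = log_prob(tags, city_counts, vocab_size) - log_prob(
        tags, background_counts, vocab_size
    )
    return llr / len(tags)


# Toy, made-up tag lists (assumption: NOT data from the chapter).
paris_train = [["eiffel", "paris", "seine"], ["paris", "louvre"]]
other_train = [["tokyo", "shibuya"], ["london", "thames", "bridge"]]

city_model = train_unigram(paris_train)
background_model = train_unigram(paris_train + other_train)
vocab_size = len(set(background_model)) + 1  # +1 reserves mass for unseen tags

score_match = verification_score(
    ["paris", "louvre"], city_model, background_model, vocab_size
)
score_mismatch = verification_score(
    ["tokyo", "bridge"], city_model, background_model, vocab_size
)
print(score_match, score_mismatch)
```

A positive score favors the claimed city and a negative score favors the background. In a verification experiment the score would be compared against a threshold tuned on held-out data. The same likelihood-ratio pattern could apply to GMMs trained on audio features, with the unigram models replaced by acoustic models.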



Acknowledgments

The experiments described in this work were supported by NGA NURI grant number HM11582-10-1-0008, NSF EAGER grant IIS-1138599, and NSF Award CNS-1065240. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the sponsors.

Corresponding author

Correspondence to Howard Lei.


Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Lei, H., Choi, J., Friedland, G. (2015). Application of Large-Scale Classification Techniques for Simple Location Estimation Experiments. In: Choi, J., Friedland, G. (eds) Multimodal Location Estimation of Videos and Images. Springer, Cham. https://doi.org/10.1007/978-3-319-09861-6_6


  • DOI: https://doi.org/10.1007/978-3-319-09861-6_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09860-9

  • Online ISBN: 978-3-319-09861-6

  • eBook Packages: Engineering, Engineering (R0)
