Abstract
This chapter describes an application that uses established classification techniques to perform simple multimodal location estimation experiments. It demonstrates Gaussian Mixture Model (GMM)-based and language model-based approaches for verifying the cities in which Flickr videos were recorded, using the videos' audio and textual metadata. The methods behind most of the approaches are described in detail, so that readers with no background in location estimation can perform simple experiments. The city-verification results are modest, but they are better than chance and leave room for future work on stronger approaches. The techniques may also suit class projects for students who want hands-on experience with location estimation.
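To make the idea of language model-based city verification concrete, here is a minimal sketch. It is not the chapter's implementation: it assumes an add-one-smoothed unigram model over tags, scores a video by the average log-likelihood ratio between a city model and a background model, and uses made-up toy tags. The names `train_unigram`, `llr_score`, and the `threshold` value are hypothetical.

```python
import math
from collections import Counter


def train_unigram(tag_lists, vocab):
    """Estimate add-one-smoothed unigram log-probabilities over `vocab`."""
    counts = Counter(tag for tags in tag_lists for tag in tags if tag in vocab)
    total = sum(counts.values()) + len(vocab)
    return {tag: math.log((counts[tag] + 1) / total) for tag in vocab}


def llr_score(tags, city_model, background_model):
    """Average per-tag log-likelihood ratio; out-of-vocabulary tags are skipped."""
    known = [t for t in tags if t in city_model]
    if not known:
        return 0.0
    return sum(city_model[t] - background_model[t] for t in known) / len(known)


# Toy, made-up training tags (assumption: not data from the chapter).
city_videos = [["goldengate", "bridge", "fog"], ["bridge", "bay", "fog"]]
other_videos = [["eiffel", "tower", "seine"], ["tower", "louvre", "bridge"]]

vocab = {t for tags in city_videos + other_videos for t in tags}
city_model = train_unigram(city_videos, vocab)
background_model = train_unigram(city_videos + other_videos, vocab)

threshold = 0.0  # hypothetical operating point; tune on held-out data
score_match = llr_score(["fog", "bridge"], city_model, background_model)
score_mismatch = llr_score(["eiffel", "louvre"], city_model, background_model)
print(score_match > threshold, score_mismatch > threshold)
```

A video is accepted as coming from the claimed city when its score exceeds the threshold. The same accept/reject structure applies to an audio-based system, with GMM log-likelihoods of acoustic features taking the place of the unigram log-probabilities.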
Acknowledgments
The experiments described in this work were supported by NGA NURI grant number HM11582-10-1-0008, NSF EAGER grant IIS-1138599, and NSF Award CNS-1065240. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the sponsors.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Lei, H., Choi, J., Friedland, G. (2015). Application of Large-Scale Classification Techniques for Simple Location Estimation Experiments. In: Choi, J., Friedland, G. (eds) Multimodal Location Estimation of Videos and Images. Springer, Cham. https://doi.org/10.1007/978-3-319-09861-6_6
DOI: https://doi.org/10.1007/978-3-319-09861-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09860-9
Online ISBN: 978-3-319-09861-6
eBook Packages: Engineering, Engineering (R0)