Abstract
Geographical Information System (GIS) databases contain information about many objects in urban areas, such as traffic signals, road signs, and fire hydrants. This wealth of information can be used to assist various computer vision tasks. In this paper, we propose a method for improving object detection using a set of priors acquired from GIS databases. Given a database of object locations from GIS and a query image with metadata, we compute the expected spatial location of the visible objects in the image. We also perform object detection in the query image (e.g., using DPM) and obtain a set of candidate bounding boxes for the objects. Then, we fuse the GIS priors with the candidate detections to find the final object bounding boxes. To cope with various inaccuracies and practical complications, such as noisy metadata, occlusion, inaccuracies in GIS, and poor candidate detections, we formulate our fusion as a higher-order graph matching problem, which we solve robustly using RANSAC. We demonstrate that this approach outperforms well-established object detectors, such as DPM, by a large margin.
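The fusion step can be illustrated with a deliberately simplified sketch. It is not the higher-order graph matching formulation of the paper: it assumes the GIS priors have already been projected into the image and that metadata noise shows up as a single 2D translation, and it replaces RANSAC's random sampling with exhaustive enumeration of one-pair hypotheses (a hypothesize-and-verify loop in the same spirit). The names `Box`, `fuse`, and `tol` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    label: str          # object class, e.g. "signal" or "hydrant"
    x: float            # box centre, pixels
    y: float
    score: float = 0.0  # detector confidence (unused for GIS priors)


def fuse(priors, detections, tol=30.0):
    """Find the image shift that aligns the most GIS priors with detections.

    Assumption: the prior-to-image error is a pure translation. Each
    same-class (prior, detection) pair proposes a shift; the shift that
    explains the most priors wins, and its matches are returned.
    """
    best_shift, best_matches = (0.0, 0.0), []
    for p in priors:
        for d in detections:
            if p.label != d.label:
                continue
            dx, dy = d.x - p.x, d.y - p.y  # hypothesis from one pair
            matches, used = [], set()
            for q in priors:
                # Unused same-class detections near the shifted prior.
                cands = [
                    (e.score, i)
                    for i, e in enumerate(detections)
                    if i not in used
                    and e.label == q.label
                    and abs(e.x - (q.x + dx)) <= tol
                    and abs(e.y - (q.y + dy)) <= tol
                ]
                if cands:
                    _, i = max(cands)  # keep the highest-scoring candidate
                    used.add(i)
                    matches.append((q, detections[i]))
            if len(matches) > len(best_matches):
                best_shift, best_matches = (dx, dy), matches
    return best_shift, best_matches


# Made-up example: three priors, two true detections, two false positives.
example_priors = [
    Box("signal", 100.0, 50.0),
    Box("hydrant", 300.0, 200.0),
    Box("sign", 500.0, 80.0),
]
example_detections = [
    Box("signal", 140.0, 60.0, 0.9),
    Box("hydrant", 342.0, 209.0, 0.5),
    Box("sign", 50.0, 300.0, 0.8),     # false positive
    Box("signal", 400.0, 400.0, 0.7),  # false positive
]
shift, matches = fuse(example_priors, example_detections)
```

In this toy setup the false positives propose shifts that explain only themselves, so they lose to the shift supported by two priors. Detections left unmatched under the winning shift are the ones a GIS prior would help reject.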
Furthermore, we propose that the GIS objects can be used as cues for discovering the location where an image was taken. Our hypothesis is that the objects visible in an image, along with their relative spatial locations, provide distinctive cues for the geo-location. To estimate the geo-location from these generic objects, we perform a search on a dense grid of locations over the covered area. We assign a score to each location based on the similarity between its GIS objects and the imperfect object detections in the image. We demonstrate that over a broad urban area of more than 10 square kilometers, this semantic approach can significantly narrow down the localization search space and occasionally even find the correct location.
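The grid search described above can be sketched roughly as follows. Everything specific here is an assumption made for illustration, not the scoring used in the paper: GIS objects are given in local metric coordinates (east, north), the camera heading and field of view are taken as known, only the horizontal image position is compared, and the score is a simple fraction of explained objects. The function names are hypothetical.

```python
import math


def expected_columns(cam_e, cam_n, heading_deg, fov_deg, width, objects,
                     max_range=100.0):
    """Project GIS objects (label, east, north) to image columns (pinhole)."""
    focal = (width / 2) / math.tan(math.radians(fov_deg) / 2)
    out = []
    for label, e, n in objects:
        de, dn = e - cam_e, n - cam_n
        dist = math.hypot(de, dn)
        if dist == 0 or dist > max_range:
            continue
        bearing = math.degrees(math.atan2(de, dn))  # clockwise from north
        rel = (bearing - heading_deg + 180) % 360 - 180
        if abs(rel) < fov_deg / 2:  # inside the horizontal field of view
            out.append((label, width / 2 + focal * math.tan(math.radians(rel))))
    return out


def location_score(expected, detections, tol=40.0):
    """Fraction of objects explained; penalises both misses and extras."""
    hits = sum(
        1 for label, x in expected
        if any(dl == label and abs(dx - x) <= tol for dl, dx in detections)
    )
    denom = max(len(expected), len(detections))
    return hits / denom if denom else 0.0


def grid_search(objects, detections, heading_deg, fov_deg, width,
                east_values, north_values):
    """Score every grid cell; return (score, east, north), best first."""
    scored = []
    for e in east_values:
        for n in north_values:
            exp = expected_columns(e, n, heading_deg, fov_deg, width, objects)
            scored.append((location_score(exp, detections), e, n))
    return sorted(scored, reverse=True)


# Made-up scene: the photo is taken at (0, 0) facing north; the detector
# finds the signal and the hydrant (with pixel noise) but misses the sign.
gis_objects = [("signal", 0, 20), ("hydrant", 10, 20), ("sign", 50, 60)]
image_detections = [("signal", 505.0), ("hydrant", 740.0)]
ranking = grid_search(gis_objects, image_detections, heading_deg=0,
                      fov_deg=90, width=1000,
                      east_values=range(-20, 21, 10),
                      north_values=range(-20, 21, 10))
best_score, best_east, best_north = ranking[0]
```

Returning the full ranking rather than only the top cell reflects the claim in the text: even when the best cell is wrong, the high-scoring cells form a much smaller search space than the whole grid.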
Keywords
- Geographical Information System
- Object Detection
- Query Image
- Graph Match
- Geographical Information System Database
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Ardeshir, S., Zamir, A.R., Torroella, A., Shah, M. (2014). GIS-Assisted Object Detection and Geospatial Localization. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8694. Springer, Cham. https://doi.org/10.1007/978-3-319-10599-4_39
DOI: https://doi.org/10.1007/978-3-319-10599-4_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10598-7
Online ISBN: 978-3-319-10599-4
eBook Packages: Computer Science, Computer Science (R0)