Abstract

Maps are among the most valuable documents for gathering geospatial information about a region. Yet finding a collection of diverse, high-quality maps is a significant challenge because little content-specific metadata is available to distinguish them from other images on the Web. For this reason, it is desirable to analyze the content of each image. The problem is further complicated by the variations between different types of maps, such as street maps and contour maps, and by the fact that many high-quality maps are embedded within other documents such as PDF reports. In this paper, we present an automatic method to find high-quality maps for a given geographic region. Our method finds not only documents that are themselves maps, but also maps that are embedded within other documents. We have developed a Content-Based Image Retrieval (CBIR) approach that uses a new set of features for classification in order to capture the defining characteristics of a map. This approach is able to identify all types of maps irrespective of their subject, scale, and color in a highly scalable and accurate way. Our classifier achieves an F1-measure of 74%, which is an 18% improvement over the previous work in the area.
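For readers unfamiliar with the metric reported above, the F1-measure is the harmonic mean of precision and recall. The following is a minimal sketch of that standard definition, not the evaluation code used in the paper; the function name `f1_measure` and the counts passed to it are hypothetical and chosen only for illustration.

```python
def f1_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1-measure, the harmonic mean of precision and recall."""
    predicted = true_positives + false_positives  # images the classifier labeled as maps
    actual = true_positives + false_negatives     # images that really are maps
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only; these are NOT results from the paper.
example_f1 = f1_measure(true_positives=80, false_positives=20, false_negatives=20)
print(f"F1-measure: {example_f1:.2f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a classifier cannot reach a high F1-measure by labeling every image as a map (high recall, low precision) or by labeling almost none (high precision, low recall).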


Author information

Corresponding author

Correspondence to Aman Goel.

About this article

Cite this article

Goel, A., Michelson, M. & Knoblock, C.A. Harvesting maps on the web. IJDAR 14, 349–372 (2011). https://doi.org/10.1007/s10032-010-0136-2
