Abstract
Maps are among the most valuable documents for gathering geospatial information about a region. Yet finding a collection of diverse, high-quality maps is a significant challenge, because little content-specific metadata is available to distinguish them from other images on the Web. For this reason, it is desirable to analyze the content of each image. The problem is further complicated by the variation between different types of maps, such as street maps and contour maps, and by the fact that many high-quality maps are embedded within other documents, such as PDF reports. In this paper, we present an automatic method to find high-quality maps for a given geographic region. Our method finds not only documents that are themselves maps, but also maps that are embedded within other documents. We have developed a Content-Based Image Retrieval (CBIR) approach that uses a new set of features for classification in order to capture the defining characteristics of a map. This approach is able to identify all types of maps, irrespective of their subject, scale, and color, in a highly scalable and accurate way. Our classifier achieves an F1-measure of 74%, which is an 18% improvement over the previous work in the area.
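For readers unfamiliar with the metric reported above, the F1-measure is the harmonic mean of precision and recall. The following is a minimal sketch of that standard definition, not the evaluation code used in the paper; the function name `f1_measure` and the counts in the usage lines are hypothetical and chosen only for illustration.

```python
def f1_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1-measure (harmonic mean of precision and recall).

    Returns 0.0 when precision or recall is undefined or both are zero.
    """
    predicted = true_positives + false_positives   # images labeled "map" by the classifier
    actual = true_positives + false_negatives      # images that really are maps
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, not taken from the paper:
# 8 maps correctly found, 2 non-maps wrongly labeled as maps, 4 maps missed.
score = f1_measure(true_positives=8, false_positives=2, false_negatives=4)
print(f"F1 = {score:.3f}")
```

Because the harmonic mean is pulled toward the lower of the two values, a classifier cannot reach a high F1-measure by trading away recall for precision or the reverse.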
Cite this article
Goel, A., Michelson, M. & Knoblock, C.A. Harvesting maps on the web. IJDAR 14, 349–372 (2011). https://doi.org/10.1007/s10032-010-0136-2
DOI: https://doi.org/10.1007/s10032-010-0136-2