Abstract
Text labels in maps provide valuable geographic information by associating place names with locations. This information is especially important in historical maps, which are very often the only source of past information about the earth. Recognizing the text labels is challenging because heterogeneous raster maps have varying image quality and complex map contents. In addition, the labels within a map do not follow a fixed orientation and can have various font types and sizes. Previous approaches typically handle a specific type of map or require intensive manual work. This paper presents a general approach that requires a small amount of user effort to semi-automatically recognize text labels in heterogeneous raster maps. Our approach exploits a few examples of text areas to extract text pixels and employs cartographic labeling principles to locate individual text labels. Each text label is then rotated automatically to horizontal and processed by conventional OCR software for character recognition. We compared our approach to a state-of-the-art commercial OCR product using 15 raster maps from 10 sources. Our evaluation shows that our approach enabled the commercial OCR product to handle raster maps, and that the two together produced significantly higher text recognition accuracy than the commercial OCR product alone.
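To make the rotation step concrete, the following is a minimal sketch of how the foreground pixels of a single text label might be brought to horizontal before being passed to an OCR engine. It is not the method described in the paper: it swaps in a generic principal-axis estimate computed from second-order moments, and the function names `principal_angle` and `rotate_to_horizontal` are hypothetical. It assumes the label has already been isolated as a list of (x, y) pixel coordinates.

```python
import math


def principal_angle(pixels):
    """Estimate the main-axis orientation of a pixel set, in radians.

    Assumption: orientation is taken from second-order central moments,
    a generic stand-in and not the paper's own detection method.
    The result lies in [-pi/2, pi/2].
    """
    n = len(pixels)
    mean_x = sum(x for x, _ in pixels) / n
    mean_y = sum(y for _, y in pixels) / n
    # Second-order central moments of the pixel cloud.
    sxx = sum((x - mean_x) ** 2 for x, _ in pixels) / n
    syy = sum((y - mean_y) ** 2 for _, y in pixels) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pixels) / n
    # Closed-form principal-axis angle from the moments.
    return 0.5 * math.atan2(2 * sxy, sxx - syy)


def rotate_to_horizontal(pixels):
    """Rotate pixels about their centroid so the main axis lies along x."""
    theta = principal_angle(pixels)
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Rotating by -theta cancels the estimated orientation.
    cos_t, sin_t = math.cos(-theta), math.sin(-theta)
    return [
        (cx + (x - cx) * cos_t - (y - cy) * sin_t,
         cy + (x - cx) * sin_t + (y - cy) * cos_t)
        for x, y in pixels
    ]
```

A moment-based estimate cannot tell a label from its upside-down copy, so a full pipeline would still need some way to choose between the two candidate orientations, for example by running OCR on both and comparing the results.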
Notes
Information on obtaining the test maps can be found at: http://www.isi.edu/integration/data/maps/prj_map_extract_data.html
Acknowledgment
This research is based upon work supported in part by the University of Southern California under the Viterbi School of Engineering Doctoral Fellowship.
Cite this article
Chiang, YY., Knoblock, C.A. Recognizing text in raster maps. Geoinformatica 19, 1–27 (2015). https://doi.org/10.1007/s10707-014-0203-9
DOI: https://doi.org/10.1007/s10707-014-0203-9