Recognizing text in raster maps

GeoInformatica

Abstract

Text labels in maps provide valuable geographic information by associating place names with locations. This information is especially important for historical maps, which are very often the only source of information about the earth's past. Recognizing the text labels is challenging because heterogeneous raster maps vary in image quality and have complex contents. In addition, the labels within a map do not follow a fixed orientation and can appear in various font types and sizes. Previous approaches typically handle a specific type of map or require intensive manual work. This paper presents a general approach that requires only a small amount of user effort to semi-automatically recognize text labels in heterogeneous raster maps. Our approach exploits a few examples of text areas to extract text pixels and employs cartographic labeling principles to locate individual text labels. Each text label is then automatically rotated to the horizontal direction and processed by conventional OCR software for character recognition. We compared our approach to a state-of-the-art commercial OCR product using 15 raster maps from 10 sources. Our evaluation shows that our approach enables the commercial OCR product to handle raster maps and that the two together produce significantly higher text recognition accuracy than the commercial OCR product alone.
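To make the "rotate each label to horizontal" step concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method of the paper: the abstract does not say how label orientation is detected, so this sketch substitutes a simple principal-axis estimate computed from the coordinates of a label's text pixels. The function names `estimate_orientation` and `rotate_to_horizontal` are hypothetical, and the input is assumed to be a list of (x, y) pixel coordinates that already belong to a single label.

```python
import math


def estimate_orientation(pixels):
    """Angle (radians) of the principal axis of a set of (x, y) points.

    Assumption: the dominant direction of a label's text pixels
    approximates the label's baseline. The result lies in
    (-pi/2, pi/2], so it cannot tell a label from its upside-down
    version.
    """
    n = len(pixels)
    if n < 2:
        raise ValueError("need at least two pixels")
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    # Second-order central moments of the point set.
    sxx = sum((x - mx) ** 2 for x, _ in pixels)
    syy = sum((y - my) ** 2 for _, y in pixels)
    sxy = sum((x - mx) * (y - my) for x, y in pixels)
    # Closed-form direction of the largest-variance axis.
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


def rotate_to_horizontal(pixels):
    """Rotate points about their centroid so the principal axis is horizontal."""
    theta = estimate_orientation(pixels)
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    c, s = math.cos(-theta), math.sin(-theta)
    return [
        ((x - mx) * c - (y - my) * s, (x - mx) * s + (y - my) * c)
        for x, y in pixels
    ]


if __name__ == "__main__":
    # Synthetic "label": points along a line tilted by 30 degrees.
    tilt = math.radians(30)
    label = [(t * math.cos(tilt), t * math.sin(tilt)) for t in range(10)]
    print(math.degrees(estimate_orientation(label)))
    print(rotate_to_horizontal(label)[:2])
```

Because the estimated angle is ambiguous by 180 degrees, a real pipeline would still need some way to choose between a label and its upside-down version, for example by running OCR on both and keeping the more plausible result. That choice is left open here.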

Notes

  1. Information on obtaining the test maps can be found at: http://www.isi.edu/integration/data/maps/prj_map_extract_data.html

Acknowledgment

This research is based upon work supported in part by the University of Southern California under the Viterbi School of Engineering Doctoral Fellowship.

Author information

Corresponding author

Correspondence to Yao-Yi Chiang.

About this article

Cite this article

Chiang, YY., Knoblock, C.A. Recognizing text in raster maps. Geoinformatica 19, 1–27 (2015). https://doi.org/10.1007/s10707-014-0203-9

  • DOI: https://doi.org/10.1007/s10707-014-0203-9
