Graphic-based character grouping in topographic maps
Introduction
Text recognition is an active topic in both academic research and commercial software development, and effective text recognition techniques are widely used. In classic text recognition systems, most methods aim to extract and recognize single characters, which fails to capture the meaning of whole words [1]. In topographic maps especially, only the entire text string expresses the accurate meaning of a geographic element. Moreover, recognizing grouped characters can take advantage of word context. Character grouping is therefore helpful for both text understanding and text recognition.
Several works address character grouping. When most text lies on horizontal or straight lines against a simple background, morphological operations [2] or clustering methods [3], [4] are sufficient. These methods apply to homogeneous text or to specific cases, such as straight text lines or multi-oriented but similar-sized characters. In other images, however, and in topographic maps in particular, text is distributed in complex ways: character color and size vary, the spacing and orientation of different strings differ, and many non-text elements remain after text extraction. For example, the map shown in Fig. 1 contains multi-oriented, multi-sized, and curved text. All of these factors adversely affect character grouping, and with most previous methods some characters may be mistakenly grouped into other text strings or left out, typically when text strings lie on curved lines.
To handle text with complex distribution, Chiang has done extensive work on text processing in topographic maps [2]. In 2011 he proposed a conditional dilation algorithm for character grouping [1] that can deal with multi-oriented, curved, and straight text lines of multi-sized characters. It still has limitations: it cannot handle the beginning or end characters of two adjacent text strings because of weaknesses in the string curvature condition, and it does not consider color information. Chiang also noted that his method and other previous methods typically handle characters with narrow spacing, whereas text strings with widely spaced characters cannot be identified correctly [1], such as the string “Hindu Kush” shown in Fig. 2. Characters with wide spacing appear in most topographic maps, so new methods are needed to handle such text.
To solve these problems, we carefully analyze character features in topographic maps, such as character distribution, color, size, and orientation, together with the relative merits of previous methods. Inspired by the construction and simplification of undirected graphs, we present a novel character grouping method that introduces a graph model into the grouping process, named graphic-based character grouping (GCG). GCG deals effectively with characters in various complex cases, especially characters with significantly wide spacing.
Section snippets
Related work
Text recognition of non-homogeneous text from topographic maps is a difficult task, and hence much of the previous research works only on specific cases. In 1988, Fletcher and Kasturi [5] presented a text string separation algorithm that uses the Hough transform to group components into logical character strings. This method is robust to changes in text font style, size, and orientation, but the Hough transform detects straight lines only, so their method cannot be applied on
The feasibility analysis of introducing graph model
A graph model, composed of nodes and edges, is a topological structure that describes the relationships between things. To solve a problem, we take advantage of the properties of graphs to construct a corresponding model. This idea provides efficient approaches for studying various things, especially complex systems made up of many things. In the graph model, each node carries its own properties, and each edge is given a weight, producing a weighted graph. Various weight values can
The overview of our character grouping
According to the aforementioned analysis, character grouping can be transformed into the construction and simplification of an undirected graph. The algorithm therefore contains two main stages: ① An undirected graph is constructed on the basis of the properties of the nodes. Here, each node represents a single character, whose properties are the primary color features and the size of the character. ② The constructed graph is simplified according to the weights of the edges. Here, each edge
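The two stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the excerpt does not give the actual similarity criteria or weight rules, so the `Char` record, the color and size thresholds, the use of center-to-center distance as the only edge weight, and the gap factor are all assumptions chosen for illustration. Groups are read off as the connected components of the simplified graph.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Char:
    """Hypothetical node record: one extracted character."""
    label: str
    center: Tuple[float, float]   # pixel coordinates of the character center
    color: Tuple[int, int, int]   # primary color as RGB
    size: float                   # e.g. bounding-box height in pixels


def build_graph(chars: List[Char],
                max_color_dist: float = 60.0,    # assumed threshold
                max_size_ratio: float = 1.5      # assumed threshold
                ) -> Dict[Tuple[int, int], float]:
    """Stage 1: connect characters whose node properties (color, size) are similar.

    Each edge (i, j) is weighted by the distance between character centers.
    """
    edges = {}
    for i, j in combinations(range(len(chars)), 2):
        a, b = chars[i], chars[j]
        similar_color = dist(a.color, b.color) <= max_color_dist
        similar_size = max(a.size, b.size) / min(a.size, b.size) <= max_size_ratio
        if similar_color and similar_size:
            edges[(i, j)] = dist(a.center, b.center)
    return edges


def simplify(chars: List[Char],
             edges: Dict[Tuple[int, int], float],
             max_gap_factor: float = 3.0         # assumed threshold
             ) -> Dict[Tuple[int, int], float]:
    """Stage 2: drop edges whose weight is large relative to character size."""
    return {
        (i, j): w
        for (i, j), w in edges.items()
        if w <= max_gap_factor * max(chars[i].size, chars[j].size)
    }


def groups(n: int, edges: Dict[Tuple[int, int], float]) -> List[List[int]]:
    """Return connected components of the simplified graph (union-find)."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)

    components: Dict[int, List[int]] = {}
    for k in range(n):
        components.setdefault(find(k), []).append(k)
    return sorted(components.values())
```

Calling `groups(len(chars), simplify(chars, build_graph(chars)))` returns lists of character indices, one list per candidate text string. A pure distance threshold like this one would still split widely spaced strings, which is the case the paper says GCG is designed to handle, so the real simplification rule must be richer than this sketch.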
Experiment and analysis
In this section, several experiments on five topographic maps are reported to evaluate the performance of GCG. As shown in Fig. 5, the characters in these maps have different features and a complex distribution, mainly in the following respects. ① The characters have different colors and the strings have different orientations, as shown in Fig. 5(a) and (b). ② The character size varies, the spacing between strings varies, and some strings are curved, as shown in Fig. 5
Conclusion
This paper presents a character grouping method based on a graph model. The characters in topographic maps are regarded as the nodes of an undirected graph, whose properties are the character color and size. The relationships between characters are regarded as the edges of the graph, whose weights are the distances between nodes and the angles between edges. Character grouping is accomplished by constructing and simplifying the undirected graph. This method can
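The angle weight mentioned above can be made concrete with a small Python sketch. It computes the angle between two edges that share a node, using the standard dot-product formula. How GCG actually uses this angle is not stated in the excerpt, so the helper `is_roughly_collinear` and its 150° threshold are hypothetical and only illustrate one plausible use: an angle near 180° means the three characters lie on a nearly straight or gently curving line.

```python
from math import acos, degrees, hypot
from typing import Tuple

Point = Tuple[float, float]


def angle_between_edges(shared: Point, p: Point, q: Point) -> float:
    """Angle in degrees at `shared` between edges shared-p and shared-q."""
    v1 = (p[0] - shared[0], p[1] - shared[1])
    v2 = (q[0] - shared[0], q[1] - shared[1])
    n1, n2 = hypot(*v1), hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("edge endpoints must differ from the shared node")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return degrees(acos(cos_theta))


def is_roughly_collinear(shared: Point, p: Point, q: Point,
                         min_angle: float = 150.0) -> bool:
    """Hypothetical test: do the two edges continue in a similar direction?"""
    return angle_between_edges(shared, p, q) >= min_angle
```

For example, `angle_between_edges((0, 0), (-1, 0), (1, 0))` is 180° for three characters on a straight line, while a right-angle turn gives 90° and would fail the hypothetical collinearity test.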
Acknowledgments
The work was jointly supported by the National Natural Science Foundation of China under Grant nos. 61472302, 61272280, U1404620, 41271447, 61373177, 61502387, 61501372 and 61272195; the Program for New Century Excellent Talents in University under Grant no. NCET-12-0919; the Fundamental Research Funds for the Central Universities under Grant nos. K5051203020, K50513100006, K5051303018, JB150313, and BDY081422; the Natural Science Foundation of Shaanxi Province under Grant no. 2014JM8310; The
References (23)
- Y.Y. Chiang, C.A. Knoblock, Recognition of multi-oriented, multi-sized, and curved text. 2011 International Conference...
- et al., Harvesting Geographic Features From Heterogeneous Raster Maps. Ph.D. thesis (2010)
- et al., Clustering-guided sparse structural learning for unsupervised feature selection, IEEE Trans. Knowl. Data Eng. (2014)
- et al., Robust structured subspace learning for data representation, IEEE Trans. Pattern Anal. Mach. Intell. (2015)
- et al., A robust algorithm for text string separation from mixed text/graphics images, IEEE Trans. Pattern Anal. Mach. Intell. (1988)
- L. Li, G. Nagy, A. Samal, et al., Cooperative text and line-art extraction from a topographic map, in: Proceedings of...
- et al., Integrated text and line-art extraction from a topographic map, Int. J. Doc. Anal. Recognit. (2000)
- Detection of text regions from digital engineering drawings, IEEE Trans. Pattern Anal. Mach. Intell. (1998)
- M. Caprioli, P. Gamba, Detecting and grouping words in topographic maps by means of perceptual concepts, European Signal...
- et al., Extracting curved text lines using local linearity of the text line, Int. J. Doc. Anal. Recognit. (1999)
Pengfei Xu is a lecturer at the School of Information Science and Technology, Northwest University, China. His main research interests include image processing and pattern recognition.
Qiguang Miao is a professor at the School of Computer Science and Technology, Xidian University. His research interests include image processing and multiscale geometric representations for images.
Tian'ge Liu is currently pursuing a doctoral degree in Computer Application Technology at Xidian University, China. His main research interests include intelligent image processing.
Xiaojiang Chen is a professor at the School of Information Science and Technology, Northwest University, China. His research interests include image processing and mobile internet.
Weike Nie received the B.S., M.S., and Ph.D. degrees from Xidian University. He is now an associate professor at Northwest University. His research interests include array signal processing and wireless sensor networks.