Farsi font recognition based on Sobel–Roberts features

https://doi.org/10.1016/j.patrec.2009.09.002

Abstract

A new approach for the recognition of Farsi fonts is proposed. The font type of individual text lines of any font size is recognized using a new feature. Previous font recognition methods are mostly based on Gabor filters and recognize the font type of a block of text rather than a line or a phrase. However, the text lines of a block or paragraph do not always share the same font; titles, for example, usually use a different font. Moreover, although the Gabor filter performs this task fairly well, it is very time consuming: feature extraction from a texture of size 128 × 128 takes about 178 ms on a 2.4 GHz PC. In this paper we perform font recognition at the line level using a new feature, called SRF, based on Sobel and Roberts gradients in 16 directions. We break each text line into several small parts and construct a texture from them. SRF is then extracted as the texture feature used for recognition. This feature requires much less computation and can therefore be extracted much faster than common textural features such as Gabor filters, wavelet transforms or moment features. Our experiments show that it is about 50 times faster than an 8-channel Gabor filter. At the same time, SRF represents font characteristics very well: we achieved a recognition rate of 94.16% on a dataset of 10 popular Farsi fonts, which is about 14% better than an 8-channel Gabor filter. If errors between very similar fonts are ignored, the recognition rate rises to about 96.5%.

Introduction

Today, with advances in character recognition technology, optical character recognition (OCR) systems are well known to many people. Several commercial OCR products are available for the world's popular languages, and anyone who buys a scanner usually receives an OCR product as well. An OCR system consists of several modules, including image acquisition, preprocessing, layout analysis, character recognition and document regeneration. Character recognition is usually the most difficult part of an OCR system, and OCR of multi-font documents is clearly more difficult than OCR of single-font documents. Designing an OCR engine that recognizes characters independently of their font is not impossible, but it is very difficult and inefficient, because characters take different shapes in different fonts. Here a font recognition module can help the OCR system by recognizing the fonts used in the document. Font recognition benefits both the end user and the system developer. The user sees a regenerated document that closely resembles the input document image, and the developer can rely on the result of the font recognition module to train a dedicated engine for each font and so improve OCR accuracy.

There are two common approaches to font recognition: recognition based on typographical features (Chaudhuri and Garain, 1998, Zramdini and Ingold, 1998, Jeong et al., 2003) and recognition based on textural features (Zhu et al., 2001, Ma and Doermann, 2003, Avilés-Cruz et al., 2005, Rashedi et al., 2007).

Typographical features, such as character skew, character weight, space width and projections in the upper, center and lower zones of the line (Fig. 1), usually represent font characteristics very well, but extracting them requires the document to be free of noise and scanned at high resolution, e.g. 300 dpi (Zramdini and Ingold, 1998, Avilés-Cruz et al., 2005, Villegas-Cortez and Avilés-Cruz, 2005).

The features more commonly used for font recognition are textural features, which can be extracted using Gabor filters, wavelet transforms or other techniques. In this case, font recognition is basically performed on a block of text rather than a line or a word. A block of text is processed and, after extra whitespace is removed, a uniform texture is constructed (Fig. 2). Textural features are then extracted and the font of the text block is recognized.

In recent years, a few other works that take different approaches have also been reported (Abuhaiba, 2005, Sun, 2006, Yang et al., 2006, Ben Moussa et al., 2008). However, the field of font recognition is still young and needs more attention, especially for the Farsi language, for which there are very few papers (Borji and Hamidi, 2004, Rashedi et al., 2007). Except for Rashedi et al. (2007), which used correlation coefficients, the two other works are based on Gabor filters, with which we compare our work in this paper. In fact, Rashedi et al. (2007) used first, second and third order moments of the input image, as described in Avilés-Cruz et al. (2005). Their work has several practical limitations, as follows:

  1. The resolution of input images is assumed to be 300 dpi (as in Avilés-Cruz et al., 2005) or 400 dpi, which is too high a requirement for font recognition.

  2. Feature extraction is too slow: they perform several convolutions on an input image of size 512 × 512, which is very time consuming.

  3. The feature vector is too long: 644 features per image.

  4. The dataset is small and clean: they prepared 50 samples of each class for training and 50 samples for testing. Samples were generated simply by printing with a laser printer and then scanning.

Moreover, as they report themselves, the Gabor filter outperforms their method (98.7% vs. 95.7%). We therefore compare our method only with the Gabor filter.

Except for typography-based methods, which can recognize the font type at the line or phrase level, all other works recognize the font type of a block of text, i.e. a paragraph or several lines together.

Another important issue in font recognition is computational complexity, which directly affects the speed of OCR systems. Except for Yang et al. (2006), who briefly discuss computational complexity, no authors have reported the speed of their methods. Some textural methods, such as the Gabor filter, which is frequently used in the literature (Zhu et al., 2001, Allier and Emptoz, 2003, Ma and Doermann, 2003, Rashedi et al., 2007), are very time consuming, and using them in an OCR system decreases the overall speed significantly. For example, we implemented an 8-channel Gabor filter in C++ with attention to optimization, yet extracting features from a single block of 128 × 128 pixels took 178 ms. If this feature were used to recognize the font of individual lines, it would take about 5 s for a typical A4 document.

In this paper we propose a font recognition algorithm that works at the line level and uses a gradient-based feature. This feature, a combination of Sobel and Roberts gradients, is named SRF (Sobel–Roberts features). It is extremely fast: it can be extracted in 3.78 ms for a texture of size 128 × 128, which is about 50 times faster than an 8-channel Gabor filter. Furthermore, it represents font characteristics very well, recognizing the fonts of individual lines with a recognition rate of 94.16%.
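
The speed figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the timings stated in the text (178 ms and 3.78 ms per texture, about 5 s per A4 page with the Gabor filter); the number of textures per page and the per-page SRF time are derived from those figures under the assumption of one texture per line, and are not values reported in the paper.

```python
# Timings quoted in the text, in milliseconds per 128 x 128 texture.
GABOR_MS = 178.0
SRF_MS = 3.78
# Per-page Gabor time quoted in the text, in seconds.
GABOR_PAGE_S = 5.0


def speedup(slow_ms: float, fast_ms: float) -> float:
    """Ratio of the slower extraction time to the faster one."""
    return slow_ms / fast_ms


def textures_per_page(page_seconds: float, texture_ms: float) -> float:
    """Number of textures implied by a per-page and a per-texture time."""
    return page_seconds * 1000.0 / texture_ms


ratio = speedup(GABOR_MS, SRF_MS)
# Derived, assuming one texture per text line: lines on a "typical" page.
lines = textures_per_page(GABOR_PAGE_S, GABOR_MS)
# Derived: time SRF would need for the same number of textures.
srf_page_ms = lines * SRF_MS

print(f"speed-up: {ratio:.1f}x")
print(f"implied textures per page: {lines:.1f}")
print(f"implied SRF time per page: {srf_page_ms:.0f} ms")
```

The ratio comes out slightly below 50, consistent with the "about 50 times" stated in the text.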

We focus on the Farsi language, which uses a right-to-left script. Some fonts are especially popular in Farsi and are used frequently in books, journals and official letters: Lotus, Mitra, Nazanin, Traffic, Yaghut, Zar, Homa and Titr. Farsi blogs usually use Tahoma, a system font that exists on any Windows-based PC. Some novice computer users, and those who do not have Farsi fonts on their computers, use Times New Roman, the default font of the Windows OS. We therefore focus on these 10 fonts in this paper (Fig. 6).

Some of these fonts are very similar, while others are completely different in nature. If OCR is performed independently of the font type, its performance therefore decreases. If we know which font is used in each line of text, however, the recognition rate improves significantly, because a dedicated engine can be trained for each font.

The rest of this paper is organized as follows. Section 2 describes the process of line detection and normalization, which removes the effect of font size on feature extraction. Texture construction from text lines is discussed in Section 3. Section 4 describes the dataset prepared to test our algorithm. The font recognition procedure is explained in Section 5, and the conclusion appears in Section 6.

Text line detection and normalization

To find text lines, we used horizontal projection. In this way, every peak in projection corresponds to a text line (Fig. 3). Here we assumed that the document has been deskewed and text lines are fairly horizontal.
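
A minimal sketch of line detection by horizontal projection is given below. It assumes a deskewed, binarized page stored as a NumPy array in which nonzero pixels are ink. The function name `find_text_lines` and the `min_ink` threshold are hypothetical, and treating each run of non-empty rows as one line is a simplification of the peak-based description above.

```python
import numpy as np


def find_text_lines(binary, min_ink=1):
    """Return (top, bottom) row ranges of text lines, bottom exclusive.

    binary: 2-D array of a deskewed page, nonzero where there is ink.
    min_ink: minimum ink pixels for a row to count as part of a line
             (hypothetical parameter, not taken from the paper).
    """
    # Horizontal projection: number of ink pixels in every row.
    profile = (np.asarray(binary) > 0).sum(axis=1)
    is_text = profile >= min_ink

    # Pad with False so that lines touching the page border are closed,
    # then locate the rows where the text/non-text state changes.
    padded = np.concatenate(([False], is_text, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])

    # Changes alternate between the start and the end of a run of text rows.
    return [(int(top), int(bottom))
            for top, bottom in zip(changes[0::2], changes[1::2])]


# Synthetic page with two "text lines".
page = np.zeros((20, 30), dtype=np.uint8)
page[2:5, 3:20] = 1
page[9:14, 5:25] = 1
print(find_text_lines(page))
```

On noisy scans a larger `min_ink`, or smoothing of the projection profile, would probably be needed before the runs are extracted.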

Once text lines have been detected, they should be normalized. If font size and image resolution are fixed, normalization is not required, but in a typical Farsi document the font size may vary from 9 to 20 points and the image resolution may also vary. To diminish the effect of these

Texture construction

To construct a texture from normalized lines, we first remove large whitespace between the words of each line, using vertical projection. We then break the input line into several parts 128 pixels wide. These line segments are placed into a 128 × 128 texture bitmap, sequentially from top to bottom. If we reach the end of the line before the window is completely filled, we repeat the algorithm starting from the beginning of the line. Fig. 5 shows some lines of different fonts
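
The texture construction described above can be sketched as follows. This is an illustration under several assumptions rather than the implementation used in the paper: the line is a NumPy array with nonzero ink pixels, its normalized height is at most 128 pixels, segments are taken in array order (left to right), and a segment that runs past the end of the line wraps around to its beginning. The names `remove_wide_gaps`, `build_texture` and the `max_gap` threshold are hypothetical.

```python
import numpy as np


def remove_wide_gaps(line, max_gap=8):
    """Shrink every run of blank columns to at most max_gap columns.

    Blank columns are found with a vertical projection of the line.
    """
    ink_cols = (line > 0).sum(axis=0) > 0
    keep = np.zeros(line.shape[1], dtype=bool)
    gap = 0
    for x, has_ink in enumerate(ink_cols):
        gap = 0 if has_ink else gap + 1
        keep[x] = gap <= max_gap
    return line[:, keep]


def build_texture(line, size=128):
    """Stack size-pixel-wide segments of a line into a size x size texture."""
    height, width = line.shape
    if height == 0 or width == 0 or height > size:
        raise ValueError("line must be non-empty and at most `size` rows high")

    texture = np.zeros((size, size), dtype=line.dtype)
    x = 0  # column where the next segment starts
    y = 0  # row where the next segment is placed
    while y < size:
        # Take the next `size` columns, wrapping to the start of the line.
        cols = (x + np.arange(size)) % width
        rows = min(height, size - y)  # the last segment may be cropped
        texture[y:y + rows] = line[:rows, cols]
        x = (x + size) % width
        y += height
    return texture


# Synthetic 32-row line, 300 columns wide; column c holds the value c + 1.
demo_line = np.tile(np.arange(1, 301), (32, 1))
demo_texture = build_texture(demo_line)
print(demo_texture.shape)
```

With a 32-pixel line height, four segments fill the 128 × 128 window; other heights leave the last segment cropped at the bottom edge.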

Dataset description

Any recognition algorithm requires a dataset for its evaluation. Unfortunately, there was no public dataset of Farsi document images, so we decided to prepare one to test our algorithm. To construct a useful dataset of Farsi font samples, we first gathered several Farsi documents. We then reformatted them to obtain an almost uniform distribution of all 10 fonts in both regular and bold styles, with font sizes of 11–16 points. About 500 document pages were prepared. Then we printed them with

Gradient features for font recognition

Building on our experience with gradient features for character recognition (Khosravi and Kabir, 2007), we developed a gradient-based feature for font recognition. This feature can be extracted from both binary and grayscale images. Our experiments showed that better results are achieved with grayscale images, generally because binarization usually introduces noise and broken characters. This feature has low complexity and is therefore very fast. It can
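
The exact definition of SRF is not given in the text available here, so the following is only an illustrative sketch of a gradient-direction feature in the same spirit, not the authors' feature. It computes Sobel and Roberts gradients of a grayscale texture, quantizes the gradient direction into 16 bins, and accumulates gradient magnitude per bin. The function names, the edge padding, the normalization and the resulting 32-element vector are all assumptions.

```python
import numpy as np


def sobel_gradients(img):
    """Horizontal and vertical Sobel responses (3 x 3 masks, edge padding)."""
    p = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
        - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
        - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return gx, gy


def roberts_gradients(img):
    """Diagonal Roberts responses (2 x 2 masks, edge padding)."""
    p = np.pad(np.asarray(img, dtype=float), ((0, 1), (0, 1)), mode="edge")
    d1 = p[:-1, :-1] - p[1:, 1:]   # main-diagonal difference
    d2 = p[:-1, 1:] - p[1:, :-1]   # anti-diagonal difference
    return d1, d2


def direction_histogram(ga, gb, bins=16):
    """Magnitude-weighted histogram of gradient direction in `bins` sectors."""
    magnitude = np.hypot(ga, gb)
    angle = np.arctan2(gb, ga) % (2 * np.pi)
    index = np.floor(angle / (2 * np.pi) * bins).astype(int) % bins
    hist = np.bincount(index.ravel(), weights=magnitude.ravel(),
                       minlength=bins)
    total = hist.sum()
    return hist / total if total > 0 else hist


def srf_like_feature(img, bins=16):
    """Concatenate Sobel and Roberts direction histograms (assumed layout)."""
    gx, gy = sobel_gradients(img)
    d1, d2 = roberts_gradients(img)
    # Roberts directions are measured in the diagonal frame of its masks.
    return np.concatenate([direction_histogram(gx, gy, bins),
                           direction_histogram(d1, d2, bins)])


# Synthetic texture with a single vertical edge.
edge = np.zeros((16, 16))
edge[:, 8:] = 255
feature = srf_like_feature(edge)
print(feature.shape)
```

Both operators use only small fixed masks and a single pass over the image, which is in line with the low complexity claimed for gradient features.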

Conclusion

In this paper we proposed a new feature for the recognition of Farsi fonts. This feature is based on image gradients in 16 directions computed with the Roberts and Sobel operators. As our experiments showed, these operators extract the textural information of Farsi fonts very well. Moreover, since the two operators are different in nature, i.e. one is horizontal/vertical and the other is diagonal, we expected their combination to improve the font recognition rate significantly. Therefore we

References (18)

  • Avilés-Cruz, C., 2005. High-order statistical texture analysis – font recognition applied. Pattern Recognition Lett.
  • Shi, M., 2002. Handwritten numeral recognition using gradient and curvature of grayscale image. Pattern Recognition.
  • Yang, Z., 2006. An EMD-based recognition method for Chinese fonts and styles. Pattern Recognition Lett.
  • Abuhaiba, I.S.I., 2005. Arabic font recognition using decision trees built from common words. J. Comput. Inform. Technol.
  • Allier, B., Emptoz, H., 2003. Font type extraction and character prototyping using Gabor filters. In: Proc. Internat....
  • Ben Moussa, S., Charfi, M., Alimi, A.M., 2008. Neural and fractal-based Arabic fonts recognition system. In: ACS/IEEE...
  • Borji, A., et al., 2004. Support vector machine for Persian font recognition. Internat. J. Intell. Technol.
  • Chaudhuri, B.B., Garain, U., 1998. Automatic detection of italic, bold and all-capital words in document images. In:...
  • Daugman, J.G., 1985. Uncertainty relations for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Am.

This work was supported in part by Iran Telecommunication Research Center, ITRC, under Contract Number T-500-20590.
