Abstract
Automatically extracting text from natural images is useful for many applications, such as augmented reality. Most existing text detection systems require that the text to be detected (and recognized) in an image is viewed from a nearly frontal viewpoint. However, text in images taken casually with a camera or a mobile phone often undergoes significant affine or perspective deformation, making existing text detectors and the subsequent OCR engines prone to failure. In this paper, based on the stroke width transform and the texture-invariant low-rank transform, we propose a framework that can detect and rectify text in arbitrary orientations against complex backgrounds, so that the text can be correctly recognized by common OCR engines. Extensive experiments demonstrate the advantage of our method over state-of-the-art text detection systems.
Notes
Although −40∘, −20∘, 0∘, and +20∘ would be more natural choices of rotation angles, text in natural scenes is rarely exactly horizontal, and in our experience these angles are not as effective for detecting text in all directions.
Notice that this labeling is somewhat different from the ground-truth labeling used in the ICDAR dataset, in which each word is labeled separately. The dataset will be released for public evaluation if our paper is accepted for publication.
References
Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: CVPR, pp. 2963–2970 (2010)
Zhang, Z., Ganesh, A., Liang, X., Ma, Y.: TILT: transform invariant low-rank textures. In: ACCV (2010)
Dinh, V.C., Chun, S.S., Cha, S., Ryu, H., Sull, S.: An efficient method for text detection in video based on stroke width similarity. In: ACCV, vol. 1, pp. 200–209 (2007)
Subramanian, K., Natarajan, P., Decerbo, M., Castañón, D.: Character-stroke detection for text-localization and extraction. In: ICDAR, pp. 33–37 (2007)
Jung, C., Liu, Q., Kim, J.: A stroke filter and its application to text localization. Pattern Recogn. Lett. 30(2), 114–122 (2009)
Jung, K., Kim, K.I., Jain, A.K.: Text information extraction in images and video: a survey. Pattern Recogn. 37(5), 977–997 (2004)
Liang, J., Doermann, D.S., Li, H.: Camera-based analysis of text and documents: a survey. Int. J. Doc. Anal. Recognit. 7(2–3), 84–104 (2005)
Daniilidis, K., Maragos, P., Paragios, N. (eds.): Computer Vision—ECCV 2010, 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5–11, 2010, Proceedings, Part I. Lecture Notes in Computer Science, vol. 6311. Springer, Berlin (2010)
Coates, A., Carpenter, B., Case, C., Satheesh, S., Suresh, B., Wang, T., Wu, D.J., Ng, A.Y.: Text detection and character recognition in scene images with unsupervised feature learning. In: ICDAR, pp. 440–445 (2011)
Neumann, L., Matas, J.: A method for text localization and recognition in real-world images. In: ACCV, vol. 3, pp. 770–783 (2010)
Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: ICCV, pp. 1457–1464 (2011)
Shivakumara, P., Phan, T.Q., Tan, C.L.: New wavelet and color features for text detection in video. In: ICPR, pp. 3996–3999 (2010)
Ye, Q., Huang, Q., Gao, W., Zhao, D.: Fast and robust text detection in images and video frames. Image Vis. Comput. 23(6), 565–576 (2005)
Zhong, Y., Zhang, H., Jain, A.K.: Automatic caption localization in compressed video. IEEE TPAMI 22(4), 385–392 (2000)
Shivakumara, P., Huang, W., Tan, C.L.: Efficient video text detection using edge features. In: ICPR, pp. 1–4 (2008)
Wu, V., Manmatha, R., Riseman, E.M.: Textfinder: an automatic system to detect and recognize text in images. IEEE TPAMI 21(11), 1224–1229 (1999)
Kim, K.I., Jung, K., Kim, J.H.: Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm. IEEE TPAMI 25(12), 1631–1639 (2003)
Li, H., Doermann, D.S., Kia, O.E.: Automatic text detection and tracking in digital video. IEEE TIP 9(1), 147–156 (2000)
Chen, X., Yuille, A.L.: Detecting and reading text in natural scenes. In: CVPR, vol. 2, pp. 366–373 (2004)
Zhang, J., Kasturi, R.: Text detection using edge gradient and graph spectrum. In: ICPR, pp. 3979–3982 (2010)
Shivakumara, P., Phan, T.Q., Tan, C.L.: Video text detection based on filters and edge features. In: ICME, pp. 514–517 (2009)
Yi, J., Peng, Y., Xiao, J.: Color-based clustering for text detection and extraction in image. In: ACM Multimedia, pp. 847–850 (2007)
Liang, J., DeMenthon, D., Doermann, D.: Geometric rectification of camera-captured document images. IEEE TPAMI 30(4), 591–605 (2008)
Pastor, M., Toselli, A., Vidal, E.: Projection profile based algorithm for slant removal. In: Image Analysis and Understanding. Lecture Notes in Computer Science, vol. 3212, pp. 183–190 (2004)
Singh, C., Bhatia, N., Kaur, A.: Hough transform based fast skew detection and accurate skew correction methods. Pattern Recogn. 41(12), 3528–3546 (2008)
Yuan, B., Tan, C.L.: Convex hull based skew estimation. Pattern Recogn. 40(2), 456–475 (2007)
Yan, H.: Skew correction of document images using interline cross-correlation. CVGIP, Graph. Models Image Process., 55(6), 538–543 (1993)
Peng, Y., Ganesh, A., Wright, J., Xu, W., Ma, Y.: Rasl: robust alignment by sparse and low-rank decomposition for linearly correlated images. In: CVPR, pp. 763–770 (2010)
Lin, Z., Chen, M., Wu, L., Ma, Y.: The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. UIUC Technical Report UILU-ENG-09-2215 (2009). arXiv:1009.5055
Lucas, S.M., Panaretos, A., Sosa, L., Tang, A., Wong, S., Young, R.: ICDAR 2003 robust reading competitions. In: ICDAR, pp. 682–687 (2003)
Yi, C., Tian, Y.: Text string detection from natural scenes by structure-based partition and grouping. IEEE Trans. Image Process. 20(9), 2594–2605 (2011)
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J.M., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2006 (VOC2006) results
ABBYY Corp.: ABBYY FineReader
HP Labs: Tesseract-OCR, 1985–1995. Now available at: http://code.google.com/p/tesseract-ocr/
Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. 33(1), 31–88 (2001)
Bertsekas, D.: Nonlinear Programming. Athena Scientific, Belmont (1999)
He, B., Yang, H.: Some convergence properties of a method of multipliers for linearly constrained monotone variational inequalities. Oper. Res. Lett. 23, 151–161 (1998)
Cai, J., Candès, E., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010)
Yin, W., Hale, E., Zhang, Y.: Fixed-point continuation for ℓ1-minimization: methodology and convergence. SIAM J. Optim. 19(3), 1107–1130 (2008)
Acknowledgements
X. Zhang and F. Sun are supported by the National Natural Science Foundation of China (NNSFC) under Grant No. 2013CB329403. Z. Lin is supported by the NNSFC under Grant Nos. 61272341, 61231002, and 61121002. Y. Ma is partially supported by the funding of ONR N00014-09-1-0230, NSF CCF 09-64215, NSF IIS 11-16012.
Author information
Authors and Affiliations
Corresponding author
Appendix: Detailed derivation of the ADM for multi-component TILT
1.1 A.1 ALM and ADM methods
The alternating direction method (ADM; also called the inexact augmented Lagrange multiplier (IALM) method in [29]) is a variant of the augmented Lagrange multiplier (ALM) method. The usual ALM method solves the following model problem:

\[\min_{x} f(x), \quad \text{s.t.}\ h(x)=0,\]

where \(f:\mathbb{R}^{n}\rightarrow \mathbb{R}\) and \(h:\mathbb{R}^{n} \rightarrow \mathbb{R}^{m}\), by minimizing its augmented Lagrangian function:

\[L(x,y,\mu)=f(x)+\langle y,h(x)\rangle+\frac{\mu}{2}\|h(x)\|^{2},\]

where μ is a positive scalar and \(y\in \mathbb{R}^{m}\) is the Lagrange multiplier. The ALM method is outlined as Algorithm 1 (see [36] for more details).
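As a concrete illustration of the ALM iteration in Algorithm 1, the sketch below applies it to a toy equality-constrained problem; the toy objective, the function name, and all parameters are our own illustrative assumptions, not part of the method in this paper.

```python
import numpy as np

def alm_toy(a, b, c, mu=1.0, rho=2.0, iters=30):
    """ALM sketch for: min_x ||x - c||^2  s.t.  a.x = b."""
    x = np.zeros_like(c)
    y = 0.0  # Lagrange multiplier for the single scalar constraint
    for _ in range(iters):
        # The x-subproblem min_x L(x, y, mu) is quadratic here, so it
        # has the closed form (2I + mu*a*a^T) x = 2c - y*a + mu*b*a.
        A = 2.0 * np.eye(len(c)) + mu * np.outer(a, a)
        x = np.linalg.solve(A, 2.0 * c - y * a + mu * b * a)
        y += mu * (a @ x - b)  # multiplier update
        mu *= rho              # increase the penalty parameter
    return x
```

For instance, `alm_toy(np.array([1.0, 1.0]), 1.0, np.array([1.0, 2.0]))` converges to the Euclidean projection of c onto the hyperplane a·x = b, here (0, 1).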
Solving the subproblem

\[x^{(k+1)}=\mathop{\arg\min}_{x} L\bigl(x,y^{(k)},\mu^{(k)}\bigr)\]

is actually not always easy. When f(x) is separable, e.g., \(f(x)=f_{1}(a)+f_{2}(b)\), where \(x=(a^{T},b^{T})^{T}\), we may solve this subproblem by minimizing over a and b alternately until convergence. However, when f(x) is convex and h(x) is a linear function of x, the ADM updates a and b only once per iteration, and the iteration can still be proven to converge to the optimal solution [37]. The ADM for two variables is outlined as Algorithm 2. The generalization to the multi-variable and multi-constraint case is straightforward.
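The single-pass alternation of Algorithm 2 can be sketched on a toy splitting, min ‖a‖₁ + ½‖b − d‖² subject to a = b; this objective and all names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def shrink(x, eps):
    # soft-thresholding (shrinkage) operator
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def adm_toy(d, mu=1.0, iters=200):
    """ADM sketch for: min_{a,b} ||a||_1 + 0.5*||b - d||^2  s.t.  a = b."""
    a = np.zeros_like(d)
    b = d.copy()
    y = np.zeros_like(d)  # Lagrange multiplier
    for _ in range(iters):
        # each block is updated ONCE per iteration, with no inner loop
        a = shrink(b - y / mu, 1.0 / mu)       # a-subproblem (shrinkage)
        b = (d + mu * a + y) / (1.0 + mu)      # b-subproblem (quadratic)
        y += mu * (a - b)                      # multiplier update
    return a
```

At convergence a = b minimizes ‖x‖₁ + ½‖x − d‖², i.e., the soft-thresholding of d with threshold 1; e.g., d = (3, 0.5, −2) yields (2, 0, −1).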
1.2 A.2 Subproblems of multi-component TILT
When applying ADM to the linearized multi-component TILT, whose augmented Lagrangian function is
where \(\mathcal{P}_{i}(A)\) extracts the ith block of A, we have to solve multiple subproblems:
When solving (11), by leaving out the terms in L that are independent of \(A_{i}\), we see
where
Then by Theorem 2.1 of [38], \(A_{i}^{(k+1)}\) can be obtained by thresholding the singular values of \(\hat{A}_{i}^{(k+1)}\). Namely, if the singular value decomposition (SVD) of \(\hat{A}_{i}^{(k+1)}\) is \(U_{ik}\varSigma_{ik}V_{ik}^{T}\), then

\[A_{i}^{(k+1)}=U_{ik}\,\mathcal{S}_{(\mu^{(k)})^{-1}}\bigl[\varSigma_{ik}\bigr]\,V_{ik}^{T},\]

where

\[\mathcal{S}_{\varepsilon}[x]=\operatorname{sgn}(x)\max\bigl(|x|-\varepsilon,0\bigr)\]

is the shrinkage operator.
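In code, this singular value thresholding step looks as follows; a minimal NumPy sketch, where `svt(M, eps)` returns the minimizer of \(\varepsilon\|A\|_{*}+\frac{1}{2}\|A-M\|_{F}^{2}\) (Theorem 2.1 of [38]), and the function names are our own.

```python
import numpy as np

def shrink(x, eps):
    # shrinkage operator S_eps[x] = sgn(x) * max(|x| - eps, 0)
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def svt(M, eps):
    # threshold the singular values of M and rebuild the matrix
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(shrink(s, eps)) @ Vt
```

For example, `svt(np.diag([3.0, 0.5]), 1.0)` shrinks the singular value 3 to 2 and zeroes out the one below the threshold, giving `diag([2.0, 0.0])`.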
Similarly, when solving (12), after some elaboration we have
where
in which \(\tilde{A}^{(k+1)}\) is a matrix whose ith block is \(A_{i}^{(k+1)}+(\mu^{(k)})^{-1}Y_{i}^{(k)}\). Then again we can apply Theorem 2.1 of [38] to solve for \(A^{(k+1)}\) as we did above for \(A_{i}^{(k+1)}\).
When solving (13), we have
where
Then it is well known [39] that the solution to the above problem is

\[E^{(k+1)}=\mathcal{S}_{\lambda/\mu^{(k)}}\bigl[\hat{E}^{(k+1)}\bigr],\]

where \(\mathcal{S}\) is the shrinkage operator defined above.
When solving (14), we have
where
So

\[\Delta\tau^{(k+1)}=J^{\dagger}\,\widehat{\Delta\tau}^{(k+1)},\]

where \(J^{\dagger}\) is the pseudo-inverse of J and \(\widehat{\Delta\tau}^{(k+1)}\) is rearranged into a vector.
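Numerically, applying the pseudo-inverse amounts to a least-squares solve; a minimal sketch, where the function name and the small example system are illustrative assumptions.

```python
import numpy as np

def update_dtau(J, rhs):
    # Delta tau = J^dagger @ rhs; pinv gives the minimum-norm
    # least-squares solution even when J is rank-deficient
    return np.linalg.pinv(J) @ rhs
```

For a well-conditioned J, `np.linalg.lstsq(J, rhs, rcond=None)[0]` gives the same solution and avoids forming the pseudo-inverse explicitly.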
1.3 A.3 The complete algorithm
Now we can present the ADM for multi-component TILT as Algorithm 3.
Note that in Algorithm 3, the transformed image D∘τ has to be normalized to prevent the region of interest from shrinking to a black pixel [2]. For more implementation details, such as maintaining the area and the aspect ratio of the region, please refer to [2]. We will also post our code online if the paper is accepted.
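The normalization can be implemented by rescaling the transformed window to unit Frobenius norm, as in TILT [2]; below is a minimal sketch, in which the function name and the zero-norm guard are our own assumptions.

```python
import numpy as np

def normalize_window(D_tau):
    # scale D∘τ to unit Frobenius norm so that shrinking the region of
    # interest toward a dark point does not trivially lower the objective
    nrm = np.linalg.norm(D_tau, 'fro')
    if nrm < 1e-12:  # guard against an all-zero window (assumption)
        return D_tau
    return D_tau / nrm
```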
Cite this article
Zhang, X., Lin, Z., Sun, F. et al. Transform invariant text extraction. Vis Comput 30, 401–415 (2014). https://doi.org/10.1007/s00371-013-0864-7