Transform invariant text extraction

  • Original Article
The Visual Computer

Abstract

Automatically extracting text from natural images is very useful for many applications such as augmented reality. Most existing text detection systems require that the text to be detected (and recognized) in an image is captured from a nearly frontal viewpoint. However, text in most images taken naturally by a camera or a mobile phone can have a significant affine or perspective deformation, making existing text detectors and the subsequent OCR engines prone to failure. In this paper, based on the stroke width transform and the transform invariant low-rank textures (TILT) method, we propose a framework that can detect and rectify text in arbitrary orientations in the image against complex backgrounds, so that the text can be correctly recognized by common OCR engines. Extensive experiments show the advantage of our method when compared to state-of-the-art text detection systems.

Notes

  1. Although −40, −20, 0, and +20 are more natural choices of rotation angles, in our experience they are less effective for detecting text in all directions, because text in natural scenes is rarely exactly horizontal.

  2. Notice that this labeling is somewhat different from the ground truth labeling used in the ICDAR dataset in which each word is labeled separately. The dataset will be released for public evaluation if our paper is accepted for publication.

References

  1. Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: CVPR, pp. 2963–2970 (2010)

  2. Zhang, Z., Ganesh, A., Liang, X., Ma, Y.: TILT: transform invariant low-rank textures. In: ACCV (2010)

  3. Dinh, V.C., Chun, S.S., Cha, S., Ryu, H., Sull, S.: An efficient method for text detection in video based on stroke width similarity. In: ACCV, vol. 1, pp. 200–209 (2007)

  4. Subramanian, K., Natarajan, P., Decerbo, M., Castañón, D.: Character-stroke detection for text-localization and extraction. In: ICDAR, pp. 33–37 (2007)

  5. Jung, C., Liu, Q., Kim, J.: A stroke filter and its application to text localization. Pattern Recogn. Lett. 30(2), 114–122 (2009)

  6. Jung, K., Kim, K.I., Jain, A.K.: Text information extraction in images and video: a survey. Pattern Recogn. 37(5), 977–997 (2004)

  7. Liang, J., Doermann, D.S., Li, H.: Camera-based analysis of text and documents: a survey. Int. J. Doc. Anal. Recognit. 7(2–3), 84–104 (2005)

  8. Daniilidis, K., Maragos, P., Paragios, N. (eds.): Proceedings of Computer vision—ECCV 2010, 11th European conference on computer vision, Part I, Heraklion, Crete, Greece, September 5–11, 2010. Lecture Notes in Computer Science, vol. 6311. Springer, Berlin (2010)

  9. Coates, A., Carpenter, B., Case, C., Satheesh, S., Suresh, B., Wang, T., Wu, D.J., Ng, A.Y.: Text detection and character recognition in scene images with unsupervised feature learning. In: ICDAR, pp. 440–445 (2011)

  10. Neumann, L., Matas, J.: A method for text localization and recognition in real-world images. In: ACCV, vol. 3, pp. 770–783 (2010)

  11. Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: ICCV, pp. 1457–1464 (2011)

  12. Shivakumara, P., Phan, T.Q., Tan, C.L.: New wavelet and color features for text detection in video. In: ICPR, pp. 3996–3999 (2010)

  13. Ye, Q., Huang, Q., Gao, W., Zhao, D.: Fast and robust text detection in images and video frames. Image Vis. Comput. 23(6), 565–576 (2005)

  14. Zhong, Y., Zhang, H., Jain, A.K.: Automatic caption localization in compressed video. IEEE TPAMI 22(4), 385–392 (2000)

  15. Shivakumara, P., Huang, W., Tan, C.L.: Efficient video text detection using edge features. In: ICPR, pp. 1–4 (2008)

  16. Wu, V., Manmatha, R., Riseman, E.M.: Textfinder: an automatic system to detect and recognize text in images. IEEE TPAMI 21(11), 1224–1229 (1999)

  17. Kim, K.I., Jung, K., Kim, J.H.: Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm. IEEE TPAMI 25(12), 1631–1639 (2003)

  18. Li, H., Doermann, D.S., Kia, O.E.: Automatic text detection and tracking in digital video. IEEE TIP 9(1), 147–156 (2000)

  19. Chen, X., Yuille, A.L.: Detecting and reading text in natural scenes. In: CVPR, vol. 2, pp. 366–373 (2004)

  20. Zhang, J., Kasturi, R.: Text detection using edge gradient and graph spectrum. In: ICPR, pp. 3979–3982 (2010)

  21. Shivakumara, P., Phan, T.Q., Tan, C.L.: Video text detection based on filters and edge features. In: ICME, pp. 514–517 (2009)

  22. Yi, J., Peng, Y., Xiao, J.: Color-based clustering for text detection and extraction in image. In: ACM Multimedia, pp. 847–850 (2007)

  23. Liang, J., DeMenthon, D., Doermann, D.: Geometric rectification of camera-captured document images. IEEE TPAMI 30(4), 591–605 (2008)

  24. Pastor, M., Toselli, A., Vidal, E.: Projection profile based algorithm for slant removal. In: Image Analysis and Understanding. Lecture Notes in Computer Science, vol. 3212, pp. 183–190 (2004)

  25. Singh, C., Bhatia, N., Kaur, A.: Hough transform based fast skew detection and accurate skew correction methods. Pattern Recogn. 41(12), 3528–3546 (2008)

  26. Yuan, B., Tan, C.L.: Convex hull based skew estimation. Pattern Recogn. 40(2), 456–475 (2007)

  27. Yan, H.: Skew correction of document images using interline cross-correlation. CVGIP, Graph. Models Image Process. 55(6), 538–543 (1993)

  28. Peng, Y., Ganesh, A., Wright, J., Xu, W., Ma, Y.: RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images. In: CVPR, pp. 763–770 (2010)

  29. Lin, Z., Chen, M., Wu, L., Ma, Y.: The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. UIUC Technical Report UILU-ENG-09-2215 (2009). arXiv:1009.5055

  30. Lucas, S.M., Panaretos, A., Sosa, L., Tang, A., Wong, S., Young, R.: ICDAR 2003 robust reading competitions. In: ICDAR, pp. 682–687 (2003)

  31. Yi, C., Tian, Y.: Text string detection from natural scenes by structure-based partition and grouping. IEEE Trans. Image Process. 20(9), 2594–2605 (2011)

  32. Everingham, M., Gool, L.J.V., Williams, C.K.I., Winn, J.M., Zisserman, A.: The PASCAL visual object classes (VOC) challenge 2006 (VOC 2006) results

  33. ABBYY Corp.: Abbyy finereader

  34. HP Labs.: “Tesseract-ocr,” 1985–1995. Now available at: http://code.google.com/p/tesseract-ocr/

  35. Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. 33(1), 31–88 (2001)

  36. Bertsekas, D.: Nonlinear Programming. Athena Scientific, Belmont (1999)

  37. He, B., Yang, H.: Some convergence properties of a method of multipliers for linearly constrained monotone variational inequalities. Oper. Res. Lett. 23, 151–161 (1998)

  38. Cai, J., Candès, E., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010)

  39. Yin, W., Hale, E., Zhang, Y.: Fixed-point continuation for \(\ell_1\)-minimization: methodology and convergence. SIAM J. Optim. 19(3), 1107–1130 (2008)

Acknowledgements

X. Zhang and F. Sun are supported by the National Natural Science Foundation of China (NNSFC) under Grant No. 2013CB329403. Z. Lin is supported by the NNSFC under Grant Nos. 61272341, 61231002, and 61121002. Y. Ma is partially supported by the funding of ONR N00014-09-1-0230, NSF CCF 09-64215, NSF IIS 11-16012.

Author information

Corresponding author

Correspondence to Xin Zhang.

Appendix: Detailed derivation of the ADM for multi-component TILT

A.1 ALM and ADM methods

The alternating direction method (ADM, also called the inexact augmented Lagrange multiplier (IALM) method in [29]) is a variant of the augmented Lagrange multiplier (ALM) method. The usual ALM method for solving the following model problem:

$$ \min f(x),\quad\mbox{s.t.}\quad h(x)=0, $$
(9)

where \(f:\mathbb{R}^{n}\rightarrow \mathbb{R}\) and \(h:\mathbb{R}^{n} \rightarrow \mathbb{R}^{m}\), is to minimize its augmented Lagrangian function:

$$ L(x,y,\mu) = f(x) + \bigl\langle y, h(x)\bigr\rangle + \frac{\mu}{2}\bigl\|h(x)\bigr\|_F^2, $$
(10)

where μ is a positive scalar and \(y\in \mathbb{R}^{m}\) is the Lagrange multiplier. The ALM method is outlined as Algorithm 1 (see [36] for more details).

Algorithm 1 (General Augmented Lagrange Multiplier Method)
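As a hedged illustration of the ALM iteration described above, the following minimal Python sketch applies it to a toy problem that is not from the paper: projecting a point \(c\) onto the hyperplane \(a^{T}x=b\), i.e., \(f(x)=\frac{1}{2}\|x-c\|^{2}\) and \(h(x)=a^{T}x-b\). The function name, the geometric penalty schedule `rho`, and the iteration count are assumptions made for illustration only.

```python
import numpy as np

def alm_hyperplane_projection(c, a, b, mu=1.0, rho=1.5, iters=30):
    """ALM sketch for: min 0.5*||x - c||^2  s.t.  h(x) = a @ x - b = 0."""
    c = np.asarray(c, dtype=float)
    a = np.asarray(a, dtype=float)
    x = np.zeros_like(c)
    y = 0.0  # Lagrange multiplier of the single equality constraint
    for _ in range(iters):
        # x-step: minimize L(x, y, mu). Setting the gradient to zero gives
        # (I + mu * a a^T) x = c - y * a + mu * b * a.
        H = np.eye(c.size) + mu * np.outer(a, a)
        x = np.linalg.solve(H, c - y * a + mu * b * a)
        # Multiplier step: y <- y + mu * h(x).
        y += mu * (a @ x - b)
        # Penalty step: grow mu geometrically (rho is an assumed schedule).
        mu *= rho
    return x, y

x, y = alm_hyperplane_projection([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], 0.0)
```

For this toy problem the x-step has a closed form, so each outer iteration is exact; in general the x-step is itself an optimization problem, which is what motivates the ADM.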

Solving the subproblem

$$\min_{x} L(x,y_k,\mu_k) $$

is actually not always easy. When \(f(x)\) is separable, e.g., \(f(x)=f_{1}(a)+f_{2}(b)\), where \(x=(a^{T},b^{T})^{T}\), we may solve this subproblem by minimizing over \(a\) and \(b\) alternately until convergence. However, in the case that \(f(x)\) is convex and \(h(x)\) is a linear function of \(x\), the ADM requires updating \(a\) and \(b\) only once and it can still be proven that the iteration converges to the optimal solution [37]. The ADM for two variables is outlined as Algorithm 2. The generalization to the multi-variable and multi-constraint case is straightforward.

Algorithm 2 (General Alternating Direction Method)
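The two-variable ADM can be sketched on a small separable problem. The example below is an illustration under stated assumptions, not the paper's Algorithm 2: it solves \(\min \|a\|_{1}+\frac{1}{2}\|b-c\|^{2}\) subject to \(a-b=0\), keeps \(\mu\) fixed, and uses a hypothetical function name. Each variable is updated exactly once per iteration, followed by the multiplier update.

```python
import numpy as np

def shrink(x, eps):
    """Soft-thresholding: sgn(x) * max(|x| - eps, 0), applied elementwise."""
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def adm_l1_toy(c, mu=1.0, iters=100):
    """ADM sketch for: min ||a||_1 + 0.5*||b - c||^2  s.t.  a - b = 0."""
    c = np.asarray(c, dtype=float)
    a = np.zeros_like(c)
    b = np.zeros_like(c)
    y = np.zeros_like(c)  # multiplier for the constraint a - b = 0
    for _ in range(iters):
        # a-step: one exact minimization of L over a with b and y fixed.
        a = shrink(b - y / mu, 1.0 / mu)
        # b-step: one exact minimization of L over b using the new a.
        b = (c + y + mu * a) / (1.0 + mu)
        # Multiplier step: y <- y + mu * (a - b).
        y = y + mu * (a - b)
    return a, b

a, b = adm_l1_toy([3.0, -0.5, -2.0])
```

The minimizer of this toy problem is the soft-thresholding of \(c\) at level 1, which gives a simple way to check the iterates.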

A.2 Subproblems of multi-component TILT

When applying ADM to the linearized multi-component TILT, whose augmented Lagrangian function is

$$\begin{aligned} \begin{aligned} &L\bigl(\{A_i\},A,E,\Delta \tau,\{Y_i \},Y,\mu \bigr) \\ &\quad = \sum_{i = 1}^n {{\Vert {{A_i}} \Vert }_*} + \Vert A \Vert _* + \lambda \Vert E \Vert _1 \\ &\qquad {}+ \bigl\langle {Y,D \circ \tau + J\Delta \tau - A - E} \bigr\rangle + \sum _{i = 1}^n \bigl\langle {{Y_i},{A_i}- \mathcal{P}_i(A)} \bigr\rangle \\ &\qquad {}+ \frac{\mu }{2} \Biggl( \Vert {D \circ \tau + J\Delta \tau -A- E} \Vert _F^2 \\ &\qquad {}+ \sum_{i = 1}^n {\bigl \Vert {{A_i} -\mathcal{P}_i(A)} \bigr \Vert _F^2} \Biggr), \end{aligned} \end{aligned}$$

where \(\mathcal{P}_{i}(A)\) extracts the ith block of A, we have to solve multiple subproblems:

$$\begin{aligned} A_i^{(k+1)} =&\arg\min_{A_i} L \bigl(A_1^{(k+1)},\ldots,A_{i-1}^{(k+1)},A_i, \\ &{}A_{i+1}^{(k)},\ldots, A_n^{(k)},A^{(k)},E^{(k)}, \Delta \tau^{(k)}, \\ &{}\bigl\{Y_j^{(k)}\bigr\},Y^{(k)}, \mu^{(k)}\bigr),\quad i=1,\ldots,n, \end{aligned}$$
(11)
$$\begin{aligned} A^{(k+1)} =& \arg\min_{A} L\bigl(\bigl \{A_j^{(k+1)}\bigr\},A,E^{(k)},\Delta \tau^{(k)}, \\ &{}\bigl\{Y_j^{(k)}\bigr\},Y^{(k)}, \mu^{(k)}\bigr), \end{aligned}$$
(12)
$$\begin{aligned} E^{(k+1)} =& \arg\min_{E} L\bigl(\bigl \{A_j^{(k+1)}\bigr\},A^{(k+1)},E,\Delta \tau^{(k)}, \\ &{}\bigl\{Y_j^{(k)}\bigr\},Y^{(k)}, \mu^{(k)}\bigr), \end{aligned}$$
(13)
$$\begin{aligned} \Delta \tau^{(k+1)} =& \arg\min_{\Delta \tau} L\bigl(\bigl \{A_j^{(k+1)}\bigr\},A^{(k+1)},E^{(k+1)}, \\ &{}\Delta \tau,\bigl\{Y_j^{(k)}\bigr\},Y^{(k)}, \mu^{(k)}\bigr). \end{aligned}$$
(14)

When solving (11), by leaving out the terms in \(L\) that are independent of \(A_{i}\), we see

$$\begin{aligned} A_i^{(k+1)} =&\arg\min_{A_i} \|A_i\|_*+ \bigl\langle Y_i^{(k)},{A_i} - \mathcal{P}_i\bigl(A^{(k)}\bigr) \bigr\rangle \\ &{}+\frac{\mu^{(k)}}{2}\bigl \Vert {A_i} -\mathcal{P}_i \bigl(A^{(k)}\bigr) \bigr \Vert _F^2 \\ =&\arg\min_{A_i} \|A_i\|_*+\frac{\mu^{(k)}}{2} \bigl \Vert {A_i} - \hat{A}_i^{(k+1)}\bigr \Vert _F^2, \end{aligned}$$

where

$$\hat{A}_i^{(k+1)} = \mathcal{P}_i \bigl(A^{(k)}\bigr) - \bigl(\mu^{(k)}\bigr)^{-1}Y_i^{(k)}. $$

Then by Theorem 2.1 of [38], \(A_{i}^{(k+1)}\) can be obtained by thresholding the singular values of \(\hat{A}_{i}^{(k+1)}\). Namely, if the singular value decomposition (SVD) of \(\hat{A}_{i}^{(k+1)}\) is \(U_{ik}\varSigma_{ik}V_{ik}^{T}\), then

$$A_i^{(k+1)} = U_{ik}\mathcal{S}_{(\mu^{(k)})^{-1}}( \varSigma_{ik})V_{ik}^T, $$

where

$$\mathcal{S}_{\varepsilon}(x)=\max\bigl(|x|-\varepsilon,0\bigr)\operatorname {sgn}(x) $$

is the shrinkage operator.
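The \(A_{i}\) update above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code; the function names `svt` and `update_A_i` and the argument `P_i_A` (standing for \(\mathcal{P}_{i}(A^{(k)})\)) are hypothetical.

```python
import numpy as np

def shrink(x, eps):
    """Shrinkage operator S_eps(x) = max(|x| - eps, 0) * sgn(x)."""
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def svt(M, eps):
    """Singular value thresholding: U * S_eps(Sigma) * V^T."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Scaling the columns of U by the shrunk singular values
    # is equivalent to U @ diag(S_eps(s)).
    return (U * shrink(s, eps)) @ Vt

def update_A_i(P_i_A, Y_i, mu):
    """A_i-step: threshold the singular values of P_i(A) - Y_i / mu at 1 / mu."""
    A_hat_i = P_i_A - Y_i / mu
    return svt(A_hat_i, 1.0 / mu)
```

Singular values below the threshold are set to zero, so the update tends to lower the rank of each block.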

Similarly, when solving (12) with some elaboration we have

$$\begin{aligned} A^{(k+1)} =&\arg\min_{A} \|A\|_* \\ &{} + \bigl\langle {Y^{(k)},D \circ \tau + J\Delta \tau^{(k)} - A - E^{(k)}} \bigr\rangle \\ & {}+ \sum_{i = 1}^n \bigl\langle {Y_i^{(k)}},{A_i^{(k+1)}} - \mathcal{P}_i(A) \bigr\rangle \\ & {}+ \frac{\mu^{(k)}}{2} \Biggl( \bigl \Vert D \circ \tau + J\Delta \tau^{(k)} - A - E^{(k)} \bigr \Vert _F^2 \\ &{}+ \sum_{i = 1}^n \bigl \Vert {A_i^{(k+1)}} -\mathcal{P}_i(A) \bigr \Vert _F^2 \Biggr) \\ =&\arg\min_{A} \|A\|_*+\mu^{(k)}\bigl \Vert A- \hat{A}^{(k+1)}\bigr \Vert _F^2, \end{aligned}$$

where

$$\begin{aligned} \hat{A}^{(k+1)} = &\frac{1}{2} \bigl(D \circ \tau + J\Delta \tau^{(k)} - E^{(k)}+\bigl(\mu^{(k)}\bigr)^{-1}Y^{(k)} \\ &{}+\tilde{A}^{(k+1)} \bigr), \end{aligned}$$

in which \(\tilde{A}^{(k+1)}\) is a matrix whose ith block is \(A_{i}^{(k+1)}+(\mu^{(k)})^{-1}Y_{i}^{(k)}\). Then again we can apply Theorem 2.1 of [38] to solve for \(A^{(k+1)}\) as we did above for \(A_{i}^{(k+1)}\).
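The elaboration behind the last equality for \(A^{(k+1)}\) can be spelled out as follows. This is a sketch under the assumption that the blocks \(\mathcal{P}_{i}(A)\) partition \(A\) without overlap; the symbols \(M\), \(\mathcal{Y}\), \(\mathcal{A}\), \(P\), and \(Q\) are introduced here for illustration only. Write \(M = D \circ \tau + J\Delta \tau^{(k)} - E^{(k)}\), let \(\mathcal{Y}\) and \(\mathcal{A}\) be the matrices whose ith blocks are \(Y_{i}^{(k)}\) and \(A_{i}^{(k+1)}\), respectively, and drop the iteration index on \(\mu\) and \(Y\). Completing the squares gives

$$\begin{aligned} &\bigl\langle Y, M - A \bigr\rangle + \frac{\mu}{2}\Vert M - A \Vert _F^2 = \frac{\mu}{2}\bigl \Vert A - \bigl(M + \mu^{-1}Y\bigr) \bigr \Vert _F^2 + \mathrm{const}, \\ &\bigl\langle \mathcal{Y}, \mathcal{A} - A \bigr\rangle + \frac{\mu}{2}\Vert \mathcal{A} - A \Vert _F^2 = \frac{\mu}{2}\bigl \Vert A - \tilde{A} \bigr \Vert _F^2 + \mathrm{const},\quad \tilde{A} = \mathcal{A} + \mu^{-1}\mathcal{Y}, \\ &\frac{\mu}{2}\Vert A - P \Vert _F^2 + \frac{\mu}{2}\Vert A - Q \Vert _F^2 = \mu \biggl \Vert A - \frac{P+Q}{2} \biggr \Vert _F^2 + \frac{\mu}{4}\Vert P - Q \Vert _F^2. \end{aligned}$$

Taking \(P = M + \mu^{-1}Y\) and \(Q = \tilde{A}\) in the third identity yields the stated \(\hat{A}^{(k+1)} = \frac{1}{2}(P+Q)\). Because the quadratic term now carries the coefficient \(\mu^{(k)}\) rather than \(\mu^{(k)}/2\), the singular values of \(\hat{A}^{(k+1)}\) are thresholded at \((2\mu^{(k)})^{-1}\) instead of \((\mu^{(k)})^{-1}\).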

When solving (13), we have

$$\begin{aligned} E^{(k+1)} =&\arg\min_{E} \lambda {\Vert E \Vert _1} \\ &{}+ \bigl\langle {Y^{(k)},D \circ \tau + J\Delta \tau^{(k)} - A^{(k+1)} - E} \bigr\rangle \\ &{}+\frac{\mu^{(k)}}{2}\bigl \Vert {D \circ \tau + J\Delta \tau^{(k)} - A^{(k+1)} - E} \bigr \Vert _F^2 \\ =&\arg\min_{E} \lambda \Vert E \Vert _1 +\frac{\mu^{(k)}}{2}\bigl \Vert E - \hat{E}^{(k+1)}\bigr \Vert _F^2, \end{aligned}$$

where

$$\hat{E}^{(k+1)}= D \circ \tau + J\Delta \tau^{(k)} - A^{(k+1)}+\bigl(\mu^{(k)}\bigr)^{-1}Y^{(k)}. $$

Then it is well known [39] that the solution to the above problem is

$$E^{(k+1)} = \mathcal{S}_{\lambda(\mu^{(k)})^{-1}}\bigl(\hat{E}^{(k+1)}\bigr), $$

where \(\mathcal{S}\) is the shrinkage operator defined above.
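A minimal NumPy sketch of this \(E\) update is given below. The function name `update_E` and its argument names are hypothetical; `D_tau` stands for \(D \circ \tau\) and `J_dtau` for \(J\Delta \tau^{(k)}\) reshaped to the image size, both assumed to be precomputed.

```python
import numpy as np

def update_E(D_tau, J_dtau, A, Y, lam, mu):
    """E-step: elementwise shrinkage of E_hat at level lam / mu."""
    # E_hat = D∘tau + J*dtau - A + Y / mu, as in the expression above.
    E_hat = D_tau + J_dtau - A + Y / mu
    return np.sign(E_hat) * np.maximum(np.abs(E_hat) - lam / mu, 0.0)
```

Entries of \(\hat{E}^{(k+1)}\) with magnitude below \(\lambda(\mu^{(k)})^{-1}\) are set to zero, which is what keeps \(E\) sparse.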

When solving (14), we have

$$\begin{aligned} \Delta \tau^{(k+1)}= \arg\min_{\Delta \tau} \bigl\|J\Delta \tau- \widehat{\Delta \tau}^{(k+1)} \bigr\|_F^2, \end{aligned}$$
(15)

where

$$\widehat{\Delta \tau}^{(k+1)} = -D \circ \tau + A^{(k+1)} + E^{(k+1)} - \bigl(\mu^{(k)}\bigr)^{-1}Y^{(k)}. $$

So

$$\begin{aligned} \begin{array}{c} \Delta \tau^{(k+1)}= J^{\dagger}\widehat{\Delta \tau}^{(k+1)}, \end{array} \end{aligned}$$

where \(J^{\dagger}\) is the pseudo-inverse of \(J\) and \(\widehat{\Delta \tau}^{(k+1)}\) is rearranged into a vector.
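The \(\Delta \tau\) update is an ordinary least-squares problem, which the following sketch solves with `numpy.linalg.lstsq` instead of forming \(J^{\dagger}\) explicitly (the two agree when \(J\) has full column rank). The function name is hypothetical, and the sketch assumes that \(J\) was built with the same row-major vectorization that `reshape(-1)` applies to the image-sized matrices.

```python
import numpy as np

def update_dtau(J, D_tau, A, E, Y, mu):
    """dtau-step: least-squares solution of J @ dtau ≈ vec(dtau_hat)."""
    # dtau_hat = -D∘tau + A + E - Y / mu, rearranged into a vector.
    rhs = (-D_tau + A + E - Y / mu).reshape(-1)
    dtau, *_ = np.linalg.lstsq(J, rhs, rcond=None)
    return dtau
```

When many iterations share the same \(J\), precomputing `np.linalg.pinv(J)` once and reusing it is a reasonable alternative.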

A.3 The complete algorithm

Now we can present the ADM for multi-component TILT as Algorithm 3.

Algorithm 3 (Solving Multi-Component TILT by ADM)

Note that in Algorithm 3, the transformed image \(D \circ \tau\) has to be normalized to prevent the region of interest from shrinking into a black pixel [2]. For more details of implementation issues, such as maintaining the area and aspect ratio, please refer to [2]. We will also post our code online if the paper is accepted.
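To show how the subproblem solutions of A.2 fit together, the following Python sketch runs the inner ADM loop for a fixed linearization, i.e., for a given \(D \circ \tau\) and Jacobian \(J\). It is an illustration under several assumptions and not the authors' Algorithm 3: the blocks \(\mathcal{P}_{i}(A)\) are taken to be disjoint column ranges, the multipliers are updated in the standard ADM way, \(\mu\) grows geometrically by an assumed factor `rho`, and the outer loop (re-warping, recomputing \(J\), normalization) is omitted. All function and parameter names are hypothetical.

```python
import numpy as np

def shrink(x, eps):
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def svt(M, eps):
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * shrink(s, eps)) @ Vt

def inner_adm(D_tau, J, blocks, lam, mu=1.0, rho=1.2, iters=100):
    """Inner ADM loop for fixed D∘tau and J; blocks is a list of column slices."""
    m, n = D_tau.shape
    A = np.zeros((m, n))
    E = np.zeros((m, n))
    Y = np.zeros((m, n))
    A_blk = np.zeros((m, n))  # the A_i, stored side by side
    Y_blk = np.zeros((m, n))  # the Y_i, stored side by side
    dtau = np.zeros(J.shape[1])
    J_pinv = np.linalg.pinv(J)
    for _ in range(iters):
        J_dtau = (J @ dtau).reshape(m, n)
        # (11): update each block A_i by singular value thresholding.
        for sl in blocks:
            A_blk[:, sl] = svt(A[:, sl] - Y_blk[:, sl] / mu, 1.0 / mu)
        # (12): update A; the threshold is 1 / (2 * mu).
        A_hat = 0.5 * (D_tau + J_dtau - E + Y / mu + A_blk + Y_blk / mu)
        A = svt(A_hat, 0.5 / mu)
        # (13): update E by elementwise shrinkage.
        E = shrink(D_tau + J_dtau - A + Y / mu, lam / mu)
        # (14): update dtau by least squares.
        dtau = J_pinv @ (-D_tau + A + E - Y / mu).reshape(-1)
        J_dtau = (J @ dtau).reshape(m, n)
        # Multiplier and penalty updates (assumed standard ADM form).
        Y = Y + mu * (D_tau + J_dtau - A - E)
        Y_blk = Y_blk + mu * (A_blk - A)
        mu *= rho
    return A, E, dtau
```

A call such as `inner_adm(D_tau, J, [slice(0, 4), slice(4, 8)], lam=0.4)` returns the low-rank part, the sparse error, and the deformation increment for an 8-column input split into two blocks.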

About this article

Cite this article

Zhang, X., Lin, Z., Sun, F. et al. Transform invariant text extraction. Vis Comput 30, 401–415 (2014). https://doi.org/10.1007/s00371-013-0864-7
