
An AI-based approach to auto-analyzing historical handwritten business documents: as applied to the Kanebo database

  • Research Article
  • Published in the Journal of Computational Social Science

Abstract

Matching salient points is a key step in many visual tasks. However, many widely used feature representation methods, such as the scale-invariant feature transform (SIFT), lack representation invariance. This shortcoming limits image representation stability and salient-point matching performance, particularly when processing images that contain a great deal of noise (e.g., historical documents). We propose a general and effective transformation approach called RIFT (reversal-invariant feature transformation) for robust feature representation. RIFT achieves gradient binning invariance for feature extraction by transforming the conventional gradient into a polar one. Experimental results on the Kanebo database and three fine-grained classification benchmark datasets demonstrate that RIFT robustly improves the performance of local descriptors for image classification without sacrificing computational efficiency.
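The reversal-invariance idea at the heart of RIFT can be illustrated with a toy sketch. This is not the paper's implementation (the `orientation_histogram` function and the `|gx|` folding are illustrative choices, not RIFT's polar transformation): a horizontal flip sends every gradient orientation θ to π − θ, so a plain orientation histogram changes when an image is mirrored; folding each orientation onto a reversal-symmetric representative makes the binning invariant, which is the effect RIFT's polar gradient achieves for descriptor binning.

```python
import numpy as np

def orientation_histogram(img, bins=8, fold=False):
    """Magnitude-weighted gradient-orientation histogram of a grayscale image.

    With fold=True, each orientation theta is identified with its
    horizontal-mirror partner pi - theta by using |gx|, which makes
    the binning invariant to left-right image reversal.
    """
    img = np.asarray(img, dtype=float)
    # Central differences on the interior only, so a flipped image yields
    # exactly mirrored gradients (gx -> -gx, gy unchanged).
    gx = img[1:-1, 2:] - img[1:-1, :-2]
    gy = img[2:, 1:-1] - img[:-2, 1:-1]
    mag = np.hypot(gx, gy)
    if fold:
        # atan2(gy, |gx|) maps theta and pi - theta to the same angle,
        # exactly in floating point, since |-gx| == |gx| bitwise.
        theta = np.arctan2(gy, np.abs(gx))
    else:
        theta = np.arctan2(gy, gx)
    hist, _ = np.histogram(theta, bins=bins, range=(-np.pi, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-12)  # normalize to unit mass

rng = np.random.default_rng(0)
img = rng.random((16, 16))
mirrored = img[:, ::-1]
print(np.allclose(orientation_histogram(img, fold=True),
                  orientation_histogram(mirrored, fold=True)))   # True
print(np.allclose(orientation_histogram(img, fold=False),
                  orientation_histogram(mirrored, fold=False)))  # typically False
```

The folded histogram is identical for an image and its mirror, while the plain histogram generally is not; RIFT applies the same principle at the descriptor level so that local features match across reversed or noisy document images.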



Acknowledgements

This project was supported in part by the project funder of OAIR at Kobe University (No. JINSYA3), by PRESTO, JST (Grant No. JPMJPR15D2), and by JSPS KAKENHI (Grant Nos. JP17H01995 and JP16H02032).

Author information

Corresponding author

Correspondence to Jinhui Chen.


About this article


Cite this article

Chen, J., Takiguchi, T., Takatsuki, Y., et al. An AI-based approach to auto-analyzing historical handwritten business documents: as applied to the Kanebo database. J Comput Soc Sci 1, 167–185 (2018). https://doi.org/10.1007/s42001-017-0009-2

