A comparison of local features for camera-based document image retrieval and spotting

Dang, Quoc Bao; Coustaty, Mickaël; Luqman, Muhammad Muzzamil; Ogier, Jean-Marc

doi:10.1007/s10032-019-00329-w

Special Issue Paper
Published: 12 July 2019

Volume 22, pages 247–263, (2019)
International Journal on Document Analysis and Recognition (IJDAR)

Quoc Bao Dang ORCID: orcid.org/0000-0001-5376-6972¹,
Mickaël Coustaty¹,
Muhammad Muzzamil Luqman¹ &
…
Jean-Marc Ogier¹

Abstract

This paper compares the robustness of local features for camera-based document image retrieval and spotting systems. We present a literature review of the state of the art in local feature extraction, covering keypoint detectors and keypoint descriptors. We also present a dataset and an evaluation protocol for camera-based document image retrieval and spotting systems. The dataset is composed of three subparts: the first contains images with textual content only; the second contains images with mainly graphical content; the third contains text plus graphical elements. Along with the dataset, we present a protocol that describes measurements for evaluating the accuracy and processing time of camera-based document image retrieval and spotting systems. This protocol is then used to carry out a detailed evaluation of local features from the literature.

Author information

Authors and Affiliations

L3i Laboratory, University of La Rochelle, Avenue Michel Crépeau, 17042, La Rochelle Cedex 1, France
Quoc Bao Dang, Mickaël Coustaty, Muhammad Muzzamil Luqman & Jean-Marc Ogier

Corresponding author

Correspondence to Quoc Bao Dang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dang, Q.B., Coustaty, M., Luqman, M.M. et al. A comparison of local features for camera-based document image retrieval and spotting. IJDAR 22, 247–263 (2019). https://doi.org/10.1007/s10032-019-00329-w

Received: 15 November 2018
Revised: 07 June 2019
Accepted: 18 June 2019
Published: 12 July 2019
Issue Date: September 2019
DOI: https://doi.org/10.1007/s10032-019-00329-w
