Abstract

This paper explores the use of attributes for document image querying and retrieval. Existing document image retrieval techniques present several drawbacks: textual searches are limited to text, query-by-example searches require a sample query document on hand, and layout-based searches rigidly assign documents to one of several preset classes. Attributes have yet to be fully exploited in document image analysis. We describe document images based on attributes and utilize those descriptions to form a new querying paradigm for document image retrieval that addresses the above limitations: attribute-based document image retrieval (ABDIR). We create attribute-based descriptions of the documents using an expandable set of individual, independent attribute classifiers built on convolutional neural network architectures. We combine the descriptions to form queries of variable complexity that retrieve a ranked list of document images. ABDIR allows users to search for documents based on memorable visual features of their contents in a flexible way, with queries like “Find documents that have a one-column layout, are table dominant, and are colorful” or “Find historical documents that are illuminated and have see-through artifacts”. Experiments on the recent PubLayNet and HisIR19 datasets demonstrate the system’s ability to extract various document image attributes with high accuracy, with Darknet-53 performing best, and show very promising results for document image retrieval. ABDIR is scalable and versatile: attributes are easy to change, add, and remove, and queries are easy to adapt to new domains. It provides document image retrieval capabilities that are impossible or impractical with other paradigms.
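To make the querying paradigm concrete, the sketch below shows one plausible way to fuse independent per-attribute classifier scores into a ranked result list. It is a minimal illustration in Python/NumPy; the attribute names, the fabricated scores, and the mean-based fusion rule are assumptions for exposition, not the authors' actual implementation.

```python
# Minimal ABDIR-style ranking sketch (illustrative, not the authors' code).
# Assumes each independent CNN attribute classifier has already produced a
# per-document probability; all values below are fabricated for exposition.
import numpy as np

ATTRIBUTES = ["one_column", "table_dominant", "colorful"]  # hypothetical set

# Rows = document images, columns = attribute probabilities.
scores = np.array([
    [0.92, 0.88, 0.75],  # doc 0
    [0.15, 0.90, 0.40],  # doc 1
    [0.85, 0.10, 0.95],  # doc 2
])

def rank(query_attrs, scores, attributes):
    """Rank documents by the mean probability of the queried attributes."""
    cols = [attributes.index(a) for a in query_attrs]
    combined = scores[:, cols].mean(axis=1)  # one plausible fusion rule
    return np.argsort(-combined), combined   # best-scoring documents first

# Query: "one-column layout AND table dominant"
order, combined = rank(["one_column", "table_dominant"], scores, ATTRIBUTES)
for doc_id in order:
    print(f"doc {doc_id}: combined score {combined[doc_id]:.2f}")
```

Because the attribute classifiers are independent, adding or removing an attribute only adds or removes a column of scores; the fusion step is unchanged, which is what makes the paradigm easy to extend.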

Data availability

The authors used the publicly available PubLayNet [42] and HisIR19 [43] datasets.

Notes

  1. Here, we consider mixed content as a mixture of text, tables, and figures.

  2. Can be applied or pendent.

  3. Illuminated manuscripts: Handwritten books with painted flourishes, such as borders and miniature illustrations, that typically include precious metals (gold or silver).

  4. See-through: One of the most common degradations affecting historical documents written or printed on both sides of the page; it appears as an undesired background pattern caused by the text/ink on the reverse side of the page [44].

References

  1. Feris, R.S., et al.: Introduction to visual attributes. In: Feris, R.S., Lampert, C., Parikh, D. (eds.) Visual Attributes. Advances in Computer Vision and Pattern Recognition, pp. 1–7. Springer, Cham (2017)

  2. Hwang, S.J., et al.: Sharing features between objects and their attributes. In: CVPR, IEEE, pp 1761–8 (2011)

  3. Zhang, F., et al.: Grouped attribute strength-based image retrieval. J. Electron. Imaging 28(1), 013048 (2019)

  4. Lampert, C.H., et al.: Attribute-based classification for zero-shot visual object categorization. IEEE Trans. Patt. Anal. Mach. Intell. 36(3), 453–65 (2013)

  5. Liu, J., et al.: Recognizing human actions by attributes. In: CVPR, IEEE, pp 3337–44 (2011)

  6. Yan, X., et al.: Attribute2Image: Conditional image generation from visual attributes. In: ECCV, pp. 776–91. Springer, Cham (2016)

  7. Almazán, J., et al.: Word spotting and recognition with embedded attributes. IEEE Trans. Patt. Anal. Mach. Intell. 36(12), 2552–66 (2014)

  8. Ferrari, V., Zisserman, A.: Learning visual attributes. Adv. Neural Inf. Process. Syst., 433–40 (2007)

  9. Engelkamp, J., Zimmer, H.D.: Human Memory: A Multimodal Approach. Hogrefe & Huber Publishers, Seattle (1994)

  10. Blanc-Brude, T., Scapin, D.L.: What do people recall about their documents? Implications for desktop search tools. In: IUI, ACM, pp 102–11 (2007)

  11. Borkin, M.A., et al.: What makes a visualization memorable? IEEE Trans. Vis. Comput. Gr. 19(12), 2306–15 (2013)

  12. Giotis, A.P., et al.: A survey of document image word spotting techniques. Patt. Recognit. 68, 310–32 (2017)

  13. Duan, L.Y., et al.: Towards mobile document image retrieval for digital library. IEEE Trans. Multimed. 16(2), 346–59 (2013)

  14. Roy, S.D., et al.: Camera-based document image matching using multi-feature probabilistic information fusion. Patt. Recognit. Lett. 58, 42–50 (2015)

  15. Sharma, N., et al.: Signature and logo detection using deep CNN for document image retrieval. In: ICFHR, IEEE, pp 416–22 (2018)

  16. Zhu, G., Doermann, D.: Logo matching for document image retrieval. In: ICDAR’09, IEEE, pp 606–10 (2009)

  17. Ubeda, I., et al.: Improving pattern spotting in historical documents using feature pyramid networks. Patt. Recognit. Lett. 131, 398–404 (2020)

  18. Marinai, S., et al.: Layout based document image retrieval by means of XY tree reduction. In: ICDAR, IEEE, pp 432–6 (2005)

  19. Kumar, J., et al.: Structural similarity for document image classification and retrieval. Patt. Recognit. Lett. 43, 119–26 (2014)

  20. Marinai, S., et al.: Digital libraries and document image retrieval techniques: A survey. In: Biba, M., Xhafa, F. (eds.) Learning Structure and Schemas from Documents, Studies in Computational Intelligence, vol. 375, pp. 181–204. Springer, Berlin (2011)

  21. Siddiquie, B., et al.: Image ranking and retrieval based on multi-attribute queries. In: CVPR, IEEE, pp 801–8 (2011)

  22. Liu, Z., et al.: DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In: CVPR, IEEE, pp 1096–104 (2016)

  23. Zhao, B., et al.: Memory-augmented attribute manipulation networks for interactive fashion search. In: CVPR, IEEE, pp 1520–8 (2017)

  24. Kumar, N., et al.: Describable visual attributes for face verification and image search. IEEE Trans. Patt. Anal. Mach. Intell. 33(10), 1962–77 (2011)

  25. An, L., et al.: Scalable attribute-driven face image retrieval. Neurocomput. 172, 215–24 (2016)

  26. Fang, Y., Yuan, Q.: Attribute-enhanced metric learning for face retrieval. EURASIP J. Image Video Process. 2018, 44 (2018)

  27. Sandeep, R.N., et al.: Relative parts: Distinctive parts for learning relative attributes. In: CVPR, IEEE, pp 3614–21 (2014)

  28. Kovashka, A., et al.: WhittleSearch: Interactive image search with relative attribute feedback. Int. J. Comput. Vis. 115(2), 185–210 (2015)

  29. Yu, Z., Kovashka, A.: Syntharch: Interactive image search with attribute-conditioned synthesis. In: CVPRW, IEEE/CVF, pp 170–1 (2020)

  30. Albu, A.B., Nagy, G.: Imaging reality and abstraction: an exploration of natural and symbolic patterns. In: VISIGRAPP (VISAPP), SCITEPRESS, pp 415–22 (2021)

  31. Rawat, W., Wang, Z.: Deep convolutional neural networks for image classification: A comprehensive review. Neural Comput. 29(9), 2352–449 (2017)

  32. He, K., et al.: Deep residual learning for image recognition. In: CVPR, IEEE, pp 770–8 (2016)

  33. Huang, G., et al.: Densely connected convolutional networks. In: CVPR, IEEE, pp 4700–8 (2017)

  34. Szegedy, C., et al.: Rethinking the inception architecture for computer vision. In: CVPR, IEEE, pp 2818–26 (2016)

  35. Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: CVPR, IEEE, pp 1251–8 (2017)

  36. Szegedy, C., et al.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: AAAI-17, pp 4278–84 (2017)

  37. Redmon, J., Farhadi, A.: YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018)

  38. Zoph, B., et al.: Learning transferable architectures for scalable image recognition. In: CVPR, IEEE, pp 8697–710 (2018)

  39. Tan, M., Le, Q.: EfficientNet: Rethinking model scaling for convolutional neural networks. In: ICML, PMLR, pp 6105–14 (2019)

  40. Zhang, C., et al.: ResNet or DenseNet? Introducing dense shortcuts to ResNet. In: WACV, IEEE/CVF, pp 3550–9 (2021)

  41. Jiao, L., Zhao, J.: A survey on the new generation of deep learning in image processing. IEEE Access 7, 172231–63 (2019)

  42. Zhong, X., et al.: PubLayNet: Largest dataset ever for document layout analysis. In: ICDAR, IEEE, pp 1015–22 (2019)

  43. Christlein, V., et al.: ICDAR 2019 competition on image retrieval for historical handwritten documents. In: ICDAR, IEEE, pp 1505–9 (2019)

  44. Tonazzini, A., Bedini, L.: Restoration of recto-verso colour documents using correlated component analysis. EURASIP J. Adv. Signal Process. 2013, 58 (2013)

  45. Deng, J., et al.: ImageNet: A large-scale hierarchical image database. In: CVPR’09, IEEE, pp 248–55 (2009)

  46. Manning, C.D., et al.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2009)

  47. US National Archives (2022) Project BLUE BOOK: Unidentified Flying Objects. https://www.archives.gov/research/military/air-force/ufos. Accessed 18 Jan 2022

Acknowledgements

The authors would like to thank Mike Mabey at QuirkLogic Inc. for his valuable input on use cases and usability.

Funding

This research was supported by the Natural Sciences and Engineering Research Council of Canada and QuirkLogic Inc. through the CRD Grants program (No. CRDPJ 525586-18).

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by M. Cote. The first draft of the manuscript was written by M. Cote, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Melissa Cote.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cote, M., Branzan Albu, A. Attribute-based document image retrieval. IJDAR 27, 57–71 (2024). https://doi.org/10.1007/s10032-023-00447-6
