An automatic histogram detection and information extraction from document images

Anagha, P. H.; Baskar, A.

doi:10.1007/s10772-020-09756-1

Published: 13 October 2020

Volume 24, pages 77–85 (2021)

International Journal of Speech Technology

Abstract

The histogram is an important type of data chart that commonly appears in scientific documents. In this paper, an automatic histogram detection and information extraction methodology based on the Hough line detector and morphological operators is proposed. The proposed system comprises three steps: pre-processing, axis detection, and chart pattern extraction. In the pre-processing step, the RGB image of a histogram is converted into a binary image. Next, in the axis detection step, the horizontal axis, vertical axis, and title of the histogram are extracted. In this step, the Hough line detector is applied to detect horizontal and vertical lines in the image. From the set of detected vertical lines, the line whose two endpoints share the minimum x coordinate is taken as the vertical axis. Similarly, from the set of detected horizontal lines, the line whose two endpoints share the maximum y coordinate is taken as the horizontal axis (in image coordinates, where y increases downward, this is the lowest horizontal line). Based on the dimensions of the horizontal and vertical axes, rectangular regions containing the horizontal-axis values and label, the vertical-axis values and label, and the title are extracted. In the final chart pattern extraction step, morphological operations are used to identify the frequency of the data represented in the histogram. Verification and validation tests of the proposed system yielded promising results, indicating that it is an efficient approach for extracting histogram information.
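The pre-processing step converts the RGB histogram image into a binary image. The abstract does not say how the threshold is chosen, so the sketch below assumes BT.601 luma weights for the gray conversion and Otsu's global threshold, and uses plain nested lists so it runs without NumPy or OpenCV. The function names are illustrative, not the authors'.

```python
def to_grayscale(rgb):
    """Convert rows of (r, g, b) tuples in 0..255 to 8-bit luminance values."""
    # ITU-R BT.601 luma weights, a common RGB-to-gray choice (an assumption here).
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in rgb]


def otsu_threshold(gray):
    """Return the gray level t that maximises Otsu's between-class variance."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    w0 = sum0 = 0          # pixel count and intensity sum of the class "<= t"
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0    # pixel count of the class "> t"
        if w0 == 0 or w1 == 0:
            continue       # one class is empty, so this split is undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(rgb):
    """Map dark chart ink (axes, bars, text) to 1 and light background to 0."""
    gray = to_grayscale(rgb)
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]


# A red bar pixel next to a white background pixel.
print(binarize([[(255, 255, 255), (200, 30, 30)]]))  # prints [[0, 1]]
```

Otsu is only one reasonable global choice; documents with uneven illumination would likely need a local or adaptive threshold instead.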
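The axis-selection rule described in the abstract can be sketched as follows, assuming the Hough stage has already produced line segments as (x1, y1, x2, y2) tuples in image coordinates, for example by flattening the output of OpenCV's `cv2.HoughLinesP`. The tolerance `tol`, the tie-break toward longer segments, and the crop boxes in the hypothetical `label_regions` helper are illustrative choices, since the abstract only says the regions are derived from the axis dimensions.

```python
from typing import Dict, List, Optional, Tuple

# (x1, y1, x2, y2) in image coordinates: x grows rightward, y grows downward.
Segment = Tuple[int, int, int, int]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def select_axes(segments: List[Segment], tol: int = 2) -> Tuple[Optional[Segment], Optional[Segment]]:
    """Vertical axis = leftmost vertical segment; horizontal axis = lowest horizontal one."""
    vertical = [s for s in segments if abs(s[0] - s[2]) <= tol]
    horizontal = [s for s in segments if abs(s[1] - s[3]) <= tol]
    # Smallest x wins; ties go to the longer segment (hypothetical tie-break).
    v_axis = min(vertical, key=lambda s: (min(s[0], s[2]), -abs(s[3] - s[1])), default=None)
    # Largest y (lowest on the page) wins; ties go to the longer segment.
    h_axis = min(horizontal, key=lambda s: (-max(s[1], s[3]), -abs(s[2] - s[0])), default=None)
    return v_axis, h_axis


def label_regions(v_axis: Segment, h_axis: Segment, width: int, height: int) -> Dict[str, Box]:
    """Hypothetical crop boxes for the text regions around the detected axes."""
    axis_x = min(v_axis[0], v_axis[2])      # column of the vertical axis
    axis_top = min(v_axis[1], v_axis[3])    # top end of the vertical axis
    axis_y = max(h_axis[1], h_axis[3])      # row of the horizontal axis
    return {
        "title": (0, 0, width, axis_top),                      # band above the plot
        "y_values_and_label": (0, axis_top, axis_x, axis_y),   # strip left of the vertical axis
        "x_values_and_label": (0, axis_y + 1, width, height),  # strip below the horizontal axis
    }
```

Bar edges also yield vertical and horizontal segments, which is why the rule keeps only the leftmost vertical and the lowest horizontal candidate; the resulting boxes would then be handed to a text recognizer.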
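For the chart pattern extraction step, the abstract names morphological operations without specifying them. A minimal sketch, assuming filled bars and a binary crop of the plot area that ends just above the horizontal axis: a binary opening with a (2r+1)×(2r+1) square structuring element removes strokes thinner than the element (grid lines, stray text pixels), after which each run of equal-height columns is read as one bar. The pixel-to-value factor `units_per_pixel` is a hypothetical input that would come from calibrating against the vertical-axis values; in practice `cv2.morphologyEx` with `cv2.MORPH_OPEN` would replace the hand-written opening.

```python
from itertools import groupby


def _window_test(img, r, combine):
    """Apply combine (all or any) over each (2r+1)x(2r+1) window; outside pixels count as 0."""
    h, w = len(img), len(img[0])

    def pixel(y, x):
        return img[y][x] if 0 <= y < h and 0 <= x < w else 0

    return [
        [int(combine(pixel(y + dy, x + dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)))
         for x in range(w)]
        for y in range(h)
    ]


def opening(img, r=1):
    """Binary opening: erosion (all) followed by dilation (any) with a square element."""
    return _window_test(_window_test(img, r, all), r, any)


def measure_bars(plot_area, r=1):
    """Return (first_col, last_col, height_px) for each bar in a binary plot-area crop."""
    cleaned = opening(plot_area, r)
    heights = [sum(col) for col in zip(*cleaned)]  # foreground pixels per column
    bars, x = [], 0
    for height, run in groupby(heights):           # runs of equal-height columns
        n = len(list(run))
        if height > 0:
            bars.append((x, x + n - 1, height))
        x += n
    return bars


def to_frequencies(bars, units_per_pixel):
    """Convert pixel heights to data values with a hypothetical linear axis scale."""
    return [round(height * units_per_pixel, 2) for _, _, height in bars]
```

Grouping columns by equal height lets touching histogram bars be separated, but two adjacent bars of identical height merge into one, and bars narrower or shorter than 2r+1 pixels are removed along with the noise.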

Author information

Authors and Affiliations

Dept of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
P. H. Anagha & A. Baskar

Corresponding author

Correspondence to A. Baskar.

About this article

Cite this article

Anagha, P.H., Baskar, A. An automatic histogram detection and information extraction from document images. Int J Speech Technol 24, 77–85 (2021). https://doi.org/10.1007/s10772-020-09756-1

Received: 12 December 2019
Accepted: 25 September 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s10772-020-09756-1