Information Extraction from Arabic and Latin scanned invoices | IEEE Conference Publication | IEEE Xplore

Information Extraction from Arabic and Latin scanned invoices


Abstract:

Extracting relevant entities from scanned document images is a very challenging task because of highly heterogeneous templates and varied structural layouts. These problems lead to inaccuracies when document images are recognized by OCR. In this paper, we propose an effective solution to these problems, in which the relevant entities are extracted from Arabic and Latin scanned invoices. The input of the system is an invoice image, which is submitted to an OCR engine without layout analysis. Next, the text recognized by the OCR is labeled. By combining the logical and physical structures, a local graph model is built for entity extraction. Finally, we implement a correction module that corrects mislabeling by eliminating the superfluous parts detected in the labeling step. We evaluate the results on 1050 real invoices, as reported in the experimental section.
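The stages named in the abstract (labeling of OCR text, a local graph combining logical and physical structure, and a correction step) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Token` type, the keyword list, the regular expressions, the distance thresholds, and the sample coordinates are all hypothetical choices made for illustration, and the "local graph" is approximated here by a simple neighborhood search around each keyword.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str   # word as returned by the OCR
    x: float    # left edge of the word box (assumed unit: pixels)
    y: float    # top edge of the word box


# Hypothetical keywords and value patterns, for illustration only.
KEYWORDS = {"total": "AMOUNT", "date": "DATE"}
VALUE_PATTERNS = {
    "AMOUNT": re.compile(r"\d+(?:,\d{3})*\.\d{2}"),
    "DATE": re.compile(r"\d{2}/\d{2}/\d{4}"),
}


def label(token):
    """Labeling step: tag a token as a keyword, a typed value, or None."""
    key = token.text.lower().strip(":")
    if key in KEYWORDS:
        return ("KEY", KEYWORDS[key])
    for kind, pattern in VALUE_PATTERNS.items():
        if pattern.search(token.text):
            return ("VALUE", kind)
    return None


def local_neighbors(key, tokens, line_tol=5.0, max_dist=300.0):
    """Physical structure: tokens to the right on the same line as the
    keyword, or directly below it, sorted by Manhattan distance."""
    found = []
    for t in tokens:
        if t is key:
            continue
        same_line = abs(t.y - key.y) <= line_tol and t.x > key.x
        below = t.y > key.y and abs(t.x - key.x) <= line_tol
        dist = abs(t.x - key.x) + abs(t.y - key.y)
        if (same_line or below) and dist <= max_dist:
            found.append((dist, t))
    return [t for _, t in sorted(found, key=lambda pair: pair[0])]


def extract(tokens):
    """Link each keyword to the nearest neighbor whose label has the
    expected type, then keep only the matched span of that neighbor."""
    fields = {}
    for tok in tokens:
        tag = label(tok)
        if tag is None or tag[0] != "KEY":
            continue
        kind = tag[1]
        for cand in local_neighbors(tok, tokens):
            if label(cand) == ("VALUE", kind):
                # Correction step: drop superfluous characters around the value.
                fields[kind] = VALUE_PATTERNS[kind].search(cand.text).group(0)
                break
    return fields


# Made-up OCR output for a tiny invoice fragment.
sample = [
    Token("Invoice", 10, 0),
    Token("Date:", 10, 10),
    Token("12/03/2018", 80, 10),
    Token("Total", 10, 50),
    Token("1,250.00$.", 80, 52),
]
print(extract(sample))
```

In this sketch the logical structure is the label attached to each token and the physical structure is the position of its box; a real system for Arabic text would at least need right-to-left neighbor rules and language-specific keywords, which are omitted here.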
Date of Conference: 12-14 March 2018
Date Added to IEEE Xplore: 04 October 2018
Conference Location: London, UK

