Information Extraction from Arabic and Latin scanned invoices

Abstract:

Extracting relevant entities from scanned document images is a very challenging task because templates are highly heterogeneous and structural layouts vary widely. These problems make OCR output on document images inaccurate. In this paper, we propose an effective solution to these problems, in which the relevant entities are extracted from Arabic and Latin scanned invoices. The system takes an invoice image as input and submits it to an OCR engine without layout analysis. The text recognized by the OCR is then labeled. By combining the logical and physical structures, a local graph model is built for entity extraction. Finally, we implement a correction module that corrects mislabeling by eliminating the superfluous parts detected by the labeling step. We evaluate the results on 1050 real invoices, as reported in the experimental section.

Published in: 2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)

Date of Conference: 12-14 March 2018

Date Added to IEEE Xplore: 04 October 2018

DOI: 10.1109/ASAR.2018.8480221

Conference Location: London, UK