DOI: 10.1145/1815330.1815373
research-article

smartFIX statistics: towards systematic document analysis performance evaluation and optimization

Published: 09 June 2010

Abstract

Before buying a document analysis system, companies typically run an intensive evaluation project in which several candidate systems are invited to process a test set of selected representative documents. Moreover, after a system has been purchased, it should be continuously optimized for the customer-specific appearance of the documents.
In this paper we discuss a set of metrics for measuring the performance of common document analysis systems. These metrics serve, on the one hand, as a basis for comparing systems and, on the other hand, as a basis for pairwise comparison of different configurations of one system during system optimization. Finally, we present the smartFIX Statistics tool, which our customers use daily to tune their smartFIX document analysis system towards higher economic efficiency and which thus serves as a proof of concept for the presented approach.
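
The abstract does not spell out the metrics themselves, so the following is only a minimal sketch of what a pairwise comparison of two configurations could look like. It assumes field-level precision and recall over extracted key data; the field names, the sample values, and the helpers `FieldScore` and `score` are hypothetical and are not taken from the paper or from smartFIX.

```python
from dataclasses import dataclass


@dataclass
class FieldScore:
    """Counts of field outcomes over a test set (hypothetical helper)."""
    correct: int = 0
    wrong: int = 0
    missing: int = 0

    @property
    def precision(self) -> float:
        # Share of extracted fields that are correct.
        extracted = self.correct + self.wrong
        return self.correct / extracted if extracted else 0.0

    @property
    def recall(self) -> float:
        # Share of expected fields that were extracted correctly.
        expected = self.correct + self.wrong + self.missing
        return self.correct / expected if expected else 0.0


def score(ground_truth, extracted) -> FieldScore:
    """Compare per-document field dicts against the ground truth.

    Both arguments are lists of dicts (one dict per document) that map
    a field name to its string value. A field absent from the result
    counts as missing; a differing value counts as wrong.
    """
    result = FieldScore()
    for truth, output in zip(ground_truth, extracted):
        for field, expected_value in truth.items():
            value = output.get(field)
            if value is None:
                result.missing += 1
            elif value == expected_value:
                result.correct += 1
            else:
                result.wrong += 1
    return result


# Assumed toy data: two invoices, two key data fields each.
truth = [
    {"invoice_no": "4711", "total": "100.00"},
    {"invoice_no": "4712", "total": "59.90"},
]
# Configuration A leaves one field empty; configuration B fills it wrongly.
config_a = [
    {"invoice_no": "4711", "total": "100.00"},
    {"invoice_no": "4712"},
]
config_b = [
    {"invoice_no": "4711", "total": "100.00"},
    {"invoice_no": "4712", "total": "58.90"},
]

score_a = score(truth, config_a)
score_b = score(truth, config_b)
for name, s in (("A", score_a), ("B", score_b)):
    print(f"config {name}: precision={s.precision:.2f} recall={s.recall:.2f}")
```

In this toy data both configurations reach the same recall, but configuration A has the higher precision because it leaves the uncertain field empty instead of returning a wrong value. Whether an empty field or a wrong field is more costly in practice depends on the downstream correction workflow, which this sketch does not model.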




Published In

DAS '10: Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
June 2010
490 pages
ISBN: 9781605587738
DOI: 10.1145/1815330

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. classification
  2. document analysis
  3. document separation
  4. evaluation
  5. extraction
  6. key data
  7. metrics
  8. suggestions

Qualifiers

  • Research-article

Conference

DAS '10


Cited By

  • (2020) A method for document image enhancement to improve template-based classification. Proceedings of the 2020 4th High Performance Computing and Cluster Technologies Conference & 2020 3rd International Conference on Big Data and Artificial Intelligence, pp. 87-91. DOI: 10.1145/3409501.3409531. Online publication date: 3-Jul-2020
  • (2014) Business Forms Classification Using Earth Mover's Distance. 2014 11th IAPR International Workshop on Document Analysis Systems, pp. 11-15. DOI: 10.1109/DAS.2014.59. Online publication date: Apr-2014
  • (2012) Towards Understandable Explanations for Document Analysis Systems. Proceedings of the 2012 10th IAPR International Workshop on Document Analysis Systems, pp. 6-10. DOI: 10.1109/DAS.2012.92. Online publication date: 27-Mar-2012
  • (2011) Semantic Logging. Proceedings of the 2011 International Conference on Document Analysis and Recognition, pp. 1140-1144. DOI: 10.1109/ICDAR.2011.230. Online publication date: 18-Sep-2011
  • (2011) Table Content Understanding in SmartFIX. Proceedings of the 2011 International Conference on Document Analysis and Recognition, pp. 488-492. DOI: 10.1109/ICDAR.2011.104. Online publication date: 18-Sep-2011
