Continuous User Feedback Learning for Data Capture from Business Documents

Hanke, Marcel; Muthmann, Klemens; Schuster, Daniel; Schill, Alexander; Aliyev, Kamil; Berger, Michael

doi:10.1007/978-3-642-28931-6_51

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7209)

Included in the following conference series:

International Conference on Hybrid Artificial Intelligence Systems

Abstract

Automatically processing production documents requires both document type detection and data capture, which extracts the appropriate index data from a post-OCR representation of the document. Current learning-based methods perform quite well because many similar documents are created from the same template, but their machine learning models require intensive training and are hard to update frequently. We provide a method for continuously incorporating user feedback into a layout-based extraction process that supports immediate learning while limiting the size of the model. The method is evaluated on a tagged corpus of more than 5,000 business documents. It allows continuous re-training of the model, thus adapting it to new document templates, and it also allows starting from scratch with an empty model, which requires less than 10% of the corpus as training documents to reach an accuracy measure of more than 80%.


Author information

Authors and Affiliations

Computer Networks, Dept. of Computer Science, TU Dresden, Dresden, Germany
Marcel Hanke, Klemens Muthmann, Daniel Schuster & Alexander Schill
DocuWare AG, Germering, Germany
Kamil Aliyev & Michael Berger

Editor information

Editors and Affiliations

Universidad de Salamanca, Plaza de la Merced S/N, 37008, Salamanca, Spain
Emilio Corchado
VŠB-TU Ostrava, 17. listopadu 15, 70833, Ostrava, Czech Republic
Václav Snášel
Machine Intelligence Research Labs (MIR Labs), Scientific Network for Innovation and Research Excellence, P.O. Box 2259, 98071, Auburn, Washington, USA
Ajith Abraham
Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland
Michał Woźniak
University of the Basque Country, Pº Manuel Lardizabal 1, 20018, San Sebastian, Spain
Manuel Graña
Yonsei University, 134 Shinchon-dong, 120-749, Sudaemoon-ku, Seoul, Korea
Sung-Bae Cho

About this paper

Cite this paper

Hanke, M., Muthmann, K., Schuster, D., Schill, A., Aliyev, K., Berger, M. (2012). Continuous User Feedback Learning for Data Capture from Business Documents. In: Corchado, E., Snášel, V., Abraham, A., Woźniak, M., Graña, M., Cho, SB. (eds) Hybrid Artificial Intelligent Systems. HAIS 2012. Lecture Notes in Computer Science, vol 7209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28931-6_51

DOI: https://doi.org/10.1007/978-3-642-28931-6_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28930-9
Online ISBN: 978-3-642-28931-6
eBook Packages: Computer Science, Computer Science (R0)