
Quality Assurance Tool Suite for Error Detection in Digital Repositories

  • Conference paper
The Emergence of Digital Libraries – Research and Practices (ICADL 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8839)


Abstract

Digitization workflows for the automatic acquisition of image collections are susceptible to errors and require quality assurance. This paper presents automated quality assurance tools that detect possible quality issues and support decision making for document image collections. The main contribution of this research is the implementation of various image processing tools for different error detection scenarios and their combination into a single tool suite. The tool suite includes: (1) the matchbox tool for accurate near-duplicate detection in document image collections, based on SIFT feature extraction; (2) the finger detection tool, which automatically detects fingers that mistakenly appear in scans of digitized image collections, using edge detection, extraction of local image information, and analysis of that information to reason about scan quality; and (3) the cropping error detection tool, which detects common cropping problems such as text shifted to the edge of the image, unwanted page borders, or unwanted text from a previous page appearing on the image. Another important contribution of this work is the definition of a quality assurance workflow and its automatic execution for error detection in digital document collections. The tool suite detects the described errors and presents them for additional manual analysis and collection cleaning. A statistical overview of the evaluated data and of characteristics such as performance and accuracy is provided. The results of the analysis confirm our hypothesis that an automated approach can detect errors with reliable quality, making quality control for large digitization projects a feasible and affordable process.
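The abstract states that matchbox detects near-duplicates from SIFT features but does not give its decision rule. As a minimal sketch, assuming the SIFT descriptors of each page have already been extracted (for example with OpenCV), two pages can be compared by counting descriptors that pass Lowe's nearest-neighbour ratio test. The function names, the 0.8 ratio, and the 0.5 match-fraction threshold below are illustrative assumptions, not matchbox's actual API or parameters.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def _distance(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_matches(query: Sequence[Descriptor], train: Sequence[Descriptor],
                       ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Return (query_index, train_index) pairs that pass Lowe's ratio test.

    A query descriptor is matched to its nearest train descriptor only if that
    distance is below `ratio` times the distance to the second-nearest one.
    """
    if len(train) < 2:
        return []  # the ratio test needs at least two candidates
    matches = []
    for i, q in enumerate(query):
        # Rank all train descriptors by distance to q (brute force).
        ranked = sorted(range(len(train)), key=lambda j: _distance(q, train[j]))
        best, second = ranked[0], ranked[1]
        if _distance(q, train[best]) < ratio * _distance(q, train[second]):
            matches.append((i, best))
    return matches


def near_duplicate_score(desc_a: Sequence[Descriptor], desc_b: Sequence[Descriptor],
                         ratio: float = 0.8) -> float:
    """Hypothetical score: fraction of the smaller keypoint set with a match."""
    if not desc_a or not desc_b:
        return 0.0
    # Match from the smaller set into the larger one.
    small, large = (desc_a, desc_b) if len(desc_a) <= len(desc_b) else (desc_b, desc_a)
    return len(ratio_test_matches(small, large, ratio)) / len(small)


def is_near_duplicate(desc_a: Sequence[Descriptor], desc_b: Sequence[Descriptor],
                      threshold: float = 0.5, ratio: float = 0.8) -> bool:
    """Flag two pages as near-duplicates when enough keypoints match (assumed rule)."""
    return near_duplicate_score(desc_a, desc_b, ratio) >= threshold
```

Brute-force matching is quadratic in the number of keypoints; a production tool would typically use an approximate nearest-neighbour index and add a geometric check such as RANSAC before declaring two scans duplicates.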
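For the finger detection tool, the abstract names edge detection as one processing step but gives no details. The sketch below computes a thresholded Sobel gradient magnitude on a grayscale image given as nested lists; it is a generic stand-in for that step, not the tool's finger detector. Canny's detector starts from gradients like these and adds smoothing, non-maximum suppression, and hysteresis thresholding, all of which this sketch omits. The function names and the threshold value are assumptions to be tuned per collection.

```python
import math
from typing import List, Sequence

# 3x3 Sobel kernels for horizontal (x) and vertical (y) intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img: Sequence[Sequence[float]]) -> List[List[float]]:
    """Gradient magnitude at each interior pixel; border pixels stay at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            # Convolve the 3x3 neighbourhood with both kernels.
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            out[y][x] = math.hypot(gx, gy)
    return out


def edge_map(img: Sequence[Sequence[float]], threshold: float) -> List[List[bool]]:
    """Mark pixels whose gradient magnitude reaches the (assumed) threshold."""
    return [[m >= threshold for m in row] for row in sobel_magnitude(img)]
```

The reasoning about local image information that the abstract describes would operate on top of an edge map like this one and is not shown here.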
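One of the listed cropping problems, text shifted to the edge of the image, can be approximated with a simple margin check. This hypothetical sketch treats pixels darker than a fixed threshold as ink, takes the bounding box of the ink, and reports every side whose margin is below a given fraction of the page size. The function names, the 128 ink threshold, and the 5% minimum margin are assumptions; the cropping tool described in the paper may work differently.

```python
from typing import List, Optional, Sequence, Tuple

# Grayscale page as rows of pixel values: 0 = black, 255 = white.
Image = Sequence[Sequence[int]]


def ink_bounding_box(img: Image, ink_threshold: int = 128) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, bottom, left, right) of ink pixels, or None for a blank page."""
    rows = [r for r, row in enumerate(img) if any(p < ink_threshold for p in row)]
    if not rows:
        return None
    cols = [c for c in range(len(img[0])) if any(row[c] < ink_threshold for row in img)]
    return rows[0], rows[-1], cols[0], cols[-1]


def margin_warnings(img: Image, min_margin_ratio: float = 0.05,
                    ink_threshold: int = 128) -> List[str]:
    """List the sides where ink comes closer to the edge than min_margin_ratio."""
    box = ink_bounding_box(img, ink_threshold)
    if box is None:
        return []  # blank pages are out of scope for this check
    height, width = len(img), len(img[0])
    top, bottom, left, right = box
    # Empty space between the ink and each page edge, relative to page size.
    margins = {
        "top": top / height,
        "bottom": (height - 1 - bottom) / height,
        "left": left / width,
        "right": (width - 1 - right) / width,
    }
    return [side for side, m in margins.items() if m < min_margin_ratio]
```

A dark scanner border reaching the image edge also triggers warnings on those sides, so the same check surfaces some unwanted page borders; telling these apart from shifted text would need further analysis.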

This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Graf, R., King, R. (2014). Quality Assurance Tool Suite for Error Detection in Digital Repositories. In: Tuamsuk, K., Jatowt, A., Rasmussen, E. (eds) The Emergence of Digital Libraries – Research and Practices. ICADL 2014. Lecture Notes in Computer Science, vol 8839. Springer, Cham. https://doi.org/10.1007/978-3-319-12823-8_6


  • DOI: https://doi.org/10.1007/978-3-319-12823-8_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12822-1

  • Online ISBN: 978-3-319-12823-8

  • eBook Packages: Computer Science, Computer Science (R0)
