
A comparison of automatic and manual zoning

An information retrieval perspective

International Journal on Document Analysis and Recognition

Abstract.

In this paper, we study the effects of automatic zoning on retrieval and on ranking variability. We show that OCR-generated text from automatically zoned pages, after postprocessing, produces retrieval results equivalent to those obtained from OCR-generated text of manually zoned pages. We further show that there is a strong linear association between the ranked query results produced under the two zoning methods.
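The abstract does not state which measures underlie "equivalent retrieval results" or "linear association". As a rough illustration only, the sketch below assumes hypothetical document IDs, binary relevance judgments, precision and recall as the retrieval measures, and a Pearson correlation computed over the rank positions of documents that appear in both result lists. The function names `precision_recall` and `rank_correlation` are illustrative and are not taken from the paper.

```python
from math import sqrt


def precision_recall(ranked, relevant):
    """Precision and recall of one retrieved list against a set of relevant doc IDs.

    Assumption: binary relevance judgments; this is an illustrative measure,
    not necessarily the one used in the paper.
    """
    hits = sum(1 for doc in ranked if doc in relevant)
    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def rank_correlation(ranked_a, ranked_b):
    """Pearson correlation between the rank positions (1-based) of documents
    retrieved in both lists, e.g. a manually zoned run and an automatically
    zoned run for the same query. Documents found in only one list are ignored.
    """
    pos_b = {doc: i + 1 for i, doc in enumerate(ranked_b)}
    pairs = [(i + 1, pos_b[doc]) for i, doc in enumerate(ranked_a) if doc in pos_b]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two documents shared by both lists")
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("rank positions have zero variance")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical ranked results for one query on the two OCR collections.
    manual = ["d1", "d2", "d3", "d4", "d5"]
    automatic = ["d1", "d3", "d2", "d4", "d6"]
    relevant = {"d1", "d2", "d3"}

    print("manual    P/R:", precision_recall(manual, relevant))
    print("automatic P/R:", precision_recall(automatic, relevant))
    print("rank correlation:", round(rank_correlation(manual, automatic), 3))
```

In this toy example both runs reach the same precision and recall, and the shared documents' rank positions correlate at about 0.8. Restricting the correlation to shared documents is a design choice made here for simplicity; a real evaluation would also need to account for documents retrieved by only one of the runs.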

Additional information

Received: 17 July 2003, Accepted: 18 October 2003, Published online: 6 February 2004

Information Science Research Institute: e-mail isri@isri.unlv.edu

About this article

Cite this article

Taghva, K., Borsack, J., Lumos, S. et al. A comparison of automatic and manual zoning. IJDAR 6, 230–235 (2003). https://doi.org/10.1007/s10032-003-0116-x
