article

Information retrieval and OCR: from converting content to grasping meaning

Authors:
Jamie Callan

Carnegie Mellon University, Pittsburgh, PA

Carnegie Mellon University, Pittsburgh, PA
View Profile

,
Paul Kantor

Rutgers University, New Brunswick, NJ

Rutgers University, New Brunswick, NJ
View Profile

,
David Grossman

Illinois Institute of Technology, Chicago, IL

Illinois Institute of Technology, Chicago, IL
View Profile

Authors Info & Claims

ACM SIGIR Forum Volume 36 Issue 2Fall 2002pp 58–61https://doi.org/10.1145/792550.792561

Published:01 September 2002Publication History

ACM SIGIR Forum

Abstract

IR and OCR have largely developed independent standards and metrics, with OCR focused on literal accuracy, and IR focused on essential "content/meaning". With more and more media not only paper, but in multiple image formats, the opportunities and challenges for OCR on new formats -- video and still images -- are enormous. While OCR is assessed in metrics that emphasize words and characters, IR has learned to apply end-to-end metrics that ask whether the needs of the users can be met by existing systems. The same considerations apply also to the problem of providing permanent worldwide access to millions of pages of legacy print documents, representing the shared human record as it existed until just a few years ago.The International Society for Optical Engineering (SPIE) has held a series of Document Recognition and Retrieval (DRR) conferences. The tenth, DRR X will be held in January 2003, in Santa Clara California. In 2001, Dan LoPresti of Bell Labs decided that the area would benefit from more intense collaboration between those who specialize in finding the words on a page image, and those researchers who know how to find the right documents, given the words. He invited Paul Kantor (Rutgers) to join the DRR Chairs, and together they invited Dave Lewis (Consultant) to give a keynote address at DRR VIII. Dan then stepped down. Paul chaired DRR IX (2002) and then handed the reins to Tapas Kanungo (IBM, Almaden) and together they invited Jamie Callan (CMU), David Grossman (IIT) and Alex Hauptmann (CMU) to join the conference committee for DRR X.To improve communication between SIGIR and DRR, this group proposed a SIGIR workshop on this area. The workshop on "Information Retrieval and OCR: From Converting Content to Grasping Meaning" was intended to stimulate cross-fertilization between OCR and IR, in hopes that better use of IR will enable the OCR community to avoid expensive hand processing, and to demonstrate that the combination of present static and dynamic image processing and present state-of-the-art robust information retrieval can generate substantial advances in both extraction of messages from image streams and conversion of existing paper variants. It solicited papers dealing with future applications, such as the indexing and retrieval of text embedded in static or video graphic images, with problems of skew, distortion, and obscuration, as well as state-of-the-art discussions of the storage and retrieval of handwritten or print legacy materials.The workshop was held on August 15, 2002 in Tampere, Finland, immediately following the SIGIR 2002 conference. Although the workshop was intended to appeal to a wide range of IR and OCR researchers (and indeed was proposed at the request of colleagues from the OCR community), it primarily drew people with a background in IR. About a dozen people participated. The small size allowed a very interactive, seminar-style format and very vigorous discussion between and during presentations. Most presentations ran 30% to 50% longer than planned, and our impression is that most of the participants found it very productive.

Recommendations

SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval
Read More
Crowdsourcing for information retrieval

The 2nd SIGIR Workshop on Crowdsourcing for Information Retrieval (CIR 2011) was held on July 28, 2011 in Beijing, China, in conjunction with the 34th Annual ACM SIGIR Conference1. The workshop brought together researchers and practitioners to ...
Read More
SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM SIGIR Forum Volume 36, Issue 2
Fall 2002
99 pages
ISSN:0163-5840
DOI:10.1145/792550
Issue’s Table of Contents

Copyright © 2002 Authors
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 September 2002
Check for updates
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 5
  Total Citations
  View Citations
- 421
  Total Downloads
- Downloads (Last 12 months)4
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Information retrieval and OCR: from converting content to grasping meaning

ACM SIGIR Forum

Abstract

Cited By

Recommendations

SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval

Crowdsourcing for information retrieval

SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Check for updates

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Information retrieval and OCR: from converting content to grasping meaning

ACM SIGIR Forum

Abstract

Cited By

Recommendations

SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval

Crowdsourcing for information retrieval

SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Check for updates

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media