Abstract
Question answering (QA) is the task of retrieving an answer to a question by analyzing documents. Although most efforts in developing QA systems are devoted to electronic text, we consider it also necessary to develop systems for document images. In this paper, we propose a method of document image retrieval for such QA systems. Since the task is not to retrieve all relevant documents but to find the answer somewhere in the documents, retrieval should be precision oriented. The main contribution of this paper is a method of improving the precision of document image retrieval by taking into account the co-occurrence of successive terms in a question. The indexing scheme is based on two-dimensional distributions of terms, and the weight of co-occurrence is measured by calculating the density distributions of terms. The proposed method was tested on 1253 pages of documents about Major League Baseball with 20 questions and was found to be superior to the baseline method proposed by the authors.
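The idea of scoring a page by term density and by the co-occurrence of successive question terms can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `term_density` and `page_score`, the triangular smoothing kernel, the grid representation of a page, and the parameters `radius` and `alpha` are all assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]


def term_density(positions: List[Tuple[int, int]], width: int, height: int,
                 radius: int = 2) -> Grid:
    """Spread each occurrence of a term over nearby cells of the page grid.

    A triangular kernel is assumed here; the abstract does not specify one.
    """
    grid = [[0.0] * width for _ in range(height)]
    for x, y in positions:
        for yy in range(max(0, y - radius), min(height, y + radius + 1)):
            for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                dist = max(abs(xx - x), abs(yy - y))
                grid[yy][xx] += 1.0 - dist / (radius + 1)
    return grid


def page_score(index: Dict[str, List[Tuple[int, int]]], query: List[str],
               width: int, height: int, radius: int = 2,
               alpha: float = 1.0) -> float:
    """Score a page by its densest location for the query terms.

    The co-occurrence bonus (weighted by the hypothetical parameter alpha)
    is the product of the densities of successive query terms at a cell.
    """
    densities = [term_density(index.get(t, []), width, height, radius)
                 for t in query]
    best = 0.0
    for y in range(height):
        for x in range(width):
            single = sum(d[y][x] for d in densities)
            pair = sum(densities[i][y][x] * densities[i + 1][y][x]
                       for i in range(len(densities) - 1))
            best = max(best, single + alpha * pair)
    return best


# Two toy pages on a 10x10 grid: the terms are adjacent on page_a
# and far apart on page_b.
page_a = {"home": [(3, 3)], "run": [(4, 3)]}
page_b = {"home": [(1, 1)], "run": [(8, 8)]}
score_a = page_score(page_a, ["home", "run"], 10, 10)
score_b = page_score(page_b, ["home", "run"], 10, 10)
```

In this toy setting, the page on which the two successive terms appear close together receives a higher score than the page on which they are far apart, which is the precision-oriented behavior the abstract describes.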
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kise, K., Fukushima, S., Matsumoto, K. (2004). Document Image Retrieval in a Question Answering System for Document Images. In: Marinai, S., Dengel, A.R. (eds) Document Analysis Systems VI. DAS 2004. Lecture Notes in Computer Science, vol 3163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28640-0_49
DOI: https://doi.org/10.1007/978-3-540-28640-0_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23060-1
Online ISBN: 978-3-540-28640-0
eBook Packages: Springer Book Archive