Abstract
Automatically recognizing short, self-contained passages relevant to a user query in large electronic texts is necessary for fast and accurate access to information in large text archives. Surprisingly, most search engines offer the user practically no help with this tedious task; they simply present a list of whole documents that supposedly contain the requested information. We show how different sources of evidence can be combined to assess the quality of different passages in a document and present the highest-ranked ones to the user. Specifically, we take into account the relevance of a passage to the user query, the structural integrity of the passage with respect to the paragraphs and sections of the document, and its topic integrity with respect to topic changes and topic threads in the text. Our experiments show that the results are promising.
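To make the idea of combining evidence concrete, the following is a minimal sketch, not the authors' method: the abstract does not state how the three sources are measured or merged. The sketch assumes a weighted sum, and every function name, weight, and scoring rule in it (`relevance`, `structural_integrity`, `topic_integrity`, `rank_passages`, the 0.5/0.25/0.25 weights) is hypothetical. Passages are half-open ranges of sentence indices.

```python
# Illustrative sketch only: all scoring rules and weights are assumptions,
# not taken from the paper.

def relevance(sentences, start, end, query_terms):
    """Fraction of query terms that occur in sentences[start:end]."""
    words = {
        w.lower().strip(".,;:!?")
        for s in sentences[start:end]
        for w in s.split()
    }
    if not query_terms:
        return 0.0
    hits = sum(1 for t in query_terms if t.lower() in words)
    return hits / len(query_terms)

def structural_integrity(start, end, paragraph_starts):
    """0.5 for each passage boundary that falls on a paragraph boundary.

    paragraph_starts holds the sentence indices where paragraphs begin,
    plus len(sentences) as the end-of-document boundary.
    """
    return 0.5 * (start in paragraph_starts) + 0.5 * (end in paragraph_starts)

def topic_integrity(start, end, topic_breaks):
    """Penalize passages that contain a topic change strictly inside them."""
    internal = sum(1 for b in topic_breaks if start < b < end)
    return 1.0 / (1 + internal)

def rank_passages(sentences, candidates, query_terms,
                  paragraph_starts, topic_breaks,
                  weights=(0.5, 0.25, 0.25)):
    """Return (score, (start, end)) pairs, best first."""
    w_rel, w_struct, w_topic = weights
    scored = []
    for start, end in candidates:
        score = (
            w_rel * relevance(sentences, start, end, query_terms)
            + w_struct * structural_integrity(start, end, paragraph_starts)
            + w_topic * topic_integrity(start, end, topic_breaks)
        )
        scored.append((score, (start, end)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored
```

A weighted sum is only one possible way to combine the scores; it is chosen here because each source of evidence stays separately inspectable and the weights can be tuned independently.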
Work done under partial support of the ITRI of Chung-Ang University, Korea, and, for the first author, of the Korean Government (KIPA) and the Mexican Government (SNI, CONACyT). The first author is currently on sabbatical leave at Chung-Ang University.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gelbukh, A., Kang, N., Han, S. (2005). Combining Sources of Evidence for Recognition of Relevant Passages in Texts. In: Ramos, F.F., Larios Rosillo, V., Unger, H. (eds) Advanced Distributed Systems. ISSADS 2005. Lecture Notes in Computer Science, vol 3563. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11533962_25
DOI: https://doi.org/10.1007/11533962_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28063-7
Online ISBN: 978-3-540-31674-9
eBook Packages: Computer Science, Computer Science (R0)