
Generating and Retrieving Text Segments for Focused Access to Scientific Documents

  • Conference paper
Advances in Information Retrieval (ECIR 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3936)


Abstract

When presented with a retrieved document, users of a search engine are usually left with the task of pinning down the relevant information inside the document. Often this is done by a time-consuming combination of skimming, scrolling and Ctrl+F. In the setting of a digital library for scientific literature the issue is especially urgent when dealing with reference works, such as surveys and handbooks, as these typically contain long documents. Our aim is to develop methods for providing a “go-read-here” type of retrieval functionality, which points the user to a segment where she can best start reading to find out about her topic of interest. We examine multiple query-independent ways of segmenting texts into coherent chunks that can be returned in response to a query. Most (experienced) authors use paragraph breaks to indicate topic shifts, thus providing us with one way of segmenting documents. We compare this structural method with semantic text segmentation methods, both with respect to topical focus and relevancy. Our experimental evidence is based on manually segmented scientific documents and a set of queries against this corpus. Structural segmentation based on contiguous blocks of relevant paragraphs is shown to be a viable solution for our intended application of providing “go-read-here” functionality.
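The abstract does not say how individual paragraphs are judged relevant to a query. The sketch below is a minimal illustration of "go-read-here" retrieval over structural segments, not the authors' implementation. It rests on several assumptions. Paragraphs are split on blank lines. Relevance is scored by a hypothetical query-term overlap measure (`paragraph_score`) and compared against a fixed, hypothetical `threshold`. Maximal runs of contiguous relevant paragraphs are grouped into one segment, and the segment with the highest summed score is used as the reading entry point.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    start: int    # index of the first paragraph in the block
    end: int      # index of the last paragraph in the block (inclusive)
    score: float  # summed relevance of the paragraphs in the block


def split_paragraphs(text: str) -> list[str]:
    """Query-independent structural segmentation: split on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def paragraph_score(paragraph: str, query_terms: set[str]) -> float:
    """Hypothetical relevance score: fraction of paragraph tokens that are query terms."""
    tokens = tokenize(paragraph)
    if not tokens:
        return 0.0
    return sum(t in query_terms for t in tokens) / len(tokens)


def relevant_blocks(paragraphs: list[str], query: str,
                    threshold: float = 0.05) -> list[Segment]:
    """Group maximal runs of contiguous relevant paragraphs; best block first."""
    query_terms = set(tokenize(query))
    segments: list[Segment] = []
    current = None
    for i, p in enumerate(paragraphs):
        s = paragraph_score(p, query_terms)
        if s >= threshold:
            if current is None:
                current = Segment(i, i, s)   # open a new block
            else:
                current.end = i              # extend the open block
                current.score += s
        elif current is not None:
            segments.append(current)         # a non-relevant paragraph closes the block
            current = None
    if current is not None:
        segments.append(current)
    # Stable sort: ties keep document order, so earlier blocks win.
    return sorted(segments, key=lambda seg: seg.score, reverse=True)


def go_read_here(text: str, query: str, threshold: float = 0.05):
    """Index of the paragraph where the best relevant block starts, or None."""
    blocks = relevant_blocks(split_paragraphs(text), query, threshold)
    return blocks[0].start if blocks else None


doc = ("Intro to logic.\n\n"
       "Dynamic semantics studies meaning change.\n\n"
       "Dynamic semantics and discourse.\n\n"
       "Unrelated closing remarks.")
print(go_read_here(doc, "dynamic semantics"))  # prints 1
```

Here the second and third paragraphs form one contiguous relevant block, so the reader is pointed at paragraph 1. Summing scores favours longer coherent blocks over a single isolated hit. A real system would more likely use a weighted retrieval score (for example tf.idf) in place of the raw overlap ratio.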




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Caracciolo, C., de Rijke, M. (2006). Generating and Retrieving Text Segments for Focused Access to Scientific Documents. In: Lalmas, M., MacFarlane, A., Rüger, S., Tombros, A., Tsikrika, T., Yavlinsky, A. (eds) Advances in Information Retrieval. ECIR 2006. Lecture Notes in Computer Science, vol 3936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11735106_31


  • DOI: https://doi.org/10.1007/11735106_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33347-0

  • Online ISBN: 978-3-540-33348-7

  • eBook Packages: Computer Science, Computer Science (R0)
