Dynamic Element Retrieval in a Semi-structured Collection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4518)

Abstract

This paper describes our methodology for the dynamic retrieval of XML elements, gives an overview of its implementation in a structured environment, and discusses the challenges introduced by applying it to the INEX Wikipedia [4] collection, which can more aptly be described as semi-structured. Our system is based on the vector space model [9], and its basic functions are performed using the Smart experimental retrieval system [8]. A major change in the system this year is the incorporation of a method for the dynamic computation of query term weights [6], which are then correlated with the dynamically generated and weighted element vectors. Dynamic element retrieval requires only a single indexing of the document collection at the level of the basic indexing node (in this case, the paragraph). It returns a rank-ordered list of elements equivalent to that produced by the same query against an all-element index of the collection. (A detailed description of this method appears in [1].) As we move from a well-structured collection, such as the INEX IEEE documents, to Wikipedia, changes in the structure of the articles must be accommodated.
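The idea of indexing only paragraphs and building element vectors at query time can be illustrated with a minimal sketch. It is not the authors' implementation and does not use Smart. The document tree, the paragraph index, the function names, raw term-frequency weights, and plain cosine similarity are all assumptions made for illustration; the weighting scheme used in the paper is not reproduced here.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy document: each element lists its children.
# Leaves (p1, p2, p3) are paragraphs, the basic indexing node.
TREE = {
    "article": ["sec1", "sec2"],
    "sec1": ["p1", "p2"],
    "sec2": ["p3"],
}

# The only index that is built: term frequencies per paragraph (made-up data).
PARAGRAPH_INDEX = {
    "p1": Counter({"xml": 2, "retrieval": 1}),
    "p2": Counter({"vector": 1, "space": 1}),
    "p3": Counter({"wikipedia": 3}),
}


def element_vector(node):
    """Build an element's term-frequency vector bottom-up from its paragraphs."""
    if node in PARAGRAPH_INDEX:
        return Counter(PARAGRAPH_INDEX[node])
    total = Counter()
    for child in TREE[node]:
        total.update(element_vector(child))  # adds the child's counts
    return total


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm = sqrt(sum(w * w for w in a.values())) * sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def rank_elements(query):
    """Score every element (internal nodes and paragraphs) against the query."""
    nodes = list(TREE) + list(PARAGRAPH_INDEX)
    scored = [(cosine(query, element_vector(n)), n) for n in nodes]
    # Highest score first; ties broken by element name for a stable order.
    return sorted(scored, key=lambda s: (-s[0], s[1]))


ranking = rank_elements(Counter({"xml": 1, "retrieval": 1}))
for score, name in ranking:
    print(f"{name}\t{score:.4f}")
```

With raw term frequencies, the vector assembled for `sec1` or `article` is the same one that direct indexing of that element would produce, which is the sense in which the two rankings coincide in this toy setting. Length-dependent weights would need the element's own statistics at generation time.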

References

  1. Crouch, C.: Dynamic element retrieval in a structured environment. ACM Transactions on Information Systems 24(4), 437–454 (2006)

  2. Crouch, C., Mahajan, A., Bellamkonda, A.: Flexible retrieval based on the vector space model. In: Fuhr, N., Lalmas, M., Malik, S., Szlávik, Z. (eds.) INEX 2004. LNCS, vol. 3493, pp. 292–302. Springer, Heidelberg (2005)

  3. Crouch, C., Khanna, S., Potnis, P., Daddapaneni, N.: The dynamic retrieval of XML elements. In: Fuhr, N., Lalmas, M., Malik, S., Kazai, G. (eds.) INEX 2005. LNCS, vol. 3977, pp. 268–281. Springer, Heidelberg (2006)

  4. Denoyer, L., Gallinari, P.: The Wikipedia XML corpus. In: INEX Workshop Pre-Proceedings, pp. 367–372 (2006), http://inex.is.informatik.uni-duisberg.de/2006

  5. Fox, E.A.: Extending the Boolean and vector space models of information retrieval with p-norm queries and multiple concept types. Ph.D. Dissertation, Department of Computer Science, Cornell University (1983)

  6. Ganapathibhotla, M.: Query processing in a flexible retrieval environment. M.S. Thesis, Department of Computer Science, University of Minnesota Duluth, Duluth, MN (2006), http://www.d.umn.edu/cs/thesis/Ganapathibhotla.pdf

  7. Khanna, S.: Design and implementation of a flexible retrieval system. M.S. Thesis, Department of Computer Science, University of Minnesota Duluth, Duluth, MN (2005), http://www.d.umn.edu/cs/thesis/khanna.pdf

  8. Salton, G. (ed.): The Smart Retrieval System: Experiments in Automatic Document Processing. Prentice-Hall, Englewood Cliffs (1971)

  9. Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Comm. ACM 18(11), 613–620 (1975)

  10. Singhal, A.: AT&T at TREC-6. In: The Sixth Text REtrieval Conf (TREC-6), NIST SP 500-240, pp. 215–225 (1998)

  11. Singhal, A., Buckley, C., Mitra, M.: Pivoted document length normalization. In: Proc. of the 19th Annual International ACM SIGIR Conference, pp. 21–29 (1996)

Editor information

Norbert Fuhr, Mounia Lalmas, Andrew Trotman

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Crouch, C.J., Crouch, D.B., Ganapathibhotla, M., Bakshi, V. (2007). Dynamic Element Retrieval in a Semi-structured Collection. In: Fuhr, N., Lalmas, M., Trotman, A. (eds) Comparative Evaluation of XML Information Retrieval Systems. INEX 2006. Lecture Notes in Computer Science, vol 4518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73888-6_9

  • DOI: https://doi.org/10.1007/978-3-540-73888-6_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73887-9

  • Online ISBN: 978-3-540-73888-6

  • eBook Packages: Computer Science, Computer Science (R0)
