Logistic Regression and EVIs for XML Books and the Heterogeneous Track

  • Conference paper
Focused Access to XML Documents (INEX 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4862)

Abstract

For this year’s INEX, UC Berkeley focused on the Book track and the Heterogeneous track. For these runs we used the TREC2 logistic regression probabilistic model with blind feedback, as well as Entry Vocabulary Indexes (EVIs) for the Books Collection MARC data. In setting up the database for the full-text records of the Book track we encountered a number of interesting problems, and we ended up using page-level indexing of the full collection.

As (once again) the only group to actually submit runs for the Het track, we are guaranteed both the highest and the lowest effectiveness scores for each task. However, because it was again deemed pointless to conduct relevance assessments on the submissions of a single system, we do not know the exact values of these results.

Editor information

Norbert Fuhr, Jaap Kamps, Mounia Lalmas, Andrew Trotman

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Larson, R.R. (2008). Logistic Regression and EVIs for XML Books and the Heterogeneous Track. In: Fuhr, N., Kamps, J., Lalmas, M., Trotman, A. (eds) Focused Access to XML Documents. INEX 2007. Lecture Notes in Computer Science, vol 4862. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85902-4_15

  • DOI: https://doi.org/10.1007/978-3-540-85902-4_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85901-7

  • Online ISBN: 978-3-540-85902-4

  • eBook Packages: Computer Science (R0)
