Improving Temporal Language Models for Determining Time of Non-timestamped Documents

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5173)

Abstract

Taking the temporal dimension into account in searching, i.e., using the time of content creation as part of the search condition, is gaining increasing interest. However, in web search and web warehousing, the timestamps (the time of creation of a document or of its contents) of web pages and documents found on the web are in general unknown or cannot be trusted, and must be determined by other means. In this paper, we describe approaches that improve the quality of existing techniques for determining timestamps based on a temporal language model. Through a number of experiments on temporal document collections, we show how our new methods improve the accuracy of timestamping compared to previous models.
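As a rough illustration of the general idea behind a temporal language model, the sketch below builds one unigram word-count model per time partition from a timestamped corpus and assigns a non-timestamped document to the partition whose model gives it the highest likelihood. This is not the method of the paper: the function names, the add-alpha smoothing, and the toy corpus are all assumptions made for illustration only.

```python
import math
from collections import Counter


def build_partition_models(corpus):
    """Build one unigram count model per time partition.

    corpus: dict mapping a partition label (e.g. a year) to a list of
    tokenized documents. Hypothetical input format, for illustration.
    """
    return {
        label: Counter(tok for doc in docs for tok in doc)
        for label, docs in corpus.items()
    }


def log_likelihood(doc_tokens, model, vocab_size, alpha=1.0):
    """Log-likelihood of a document under an add-alpha smoothed unigram model.

    Add-alpha smoothing is an assumption of this sketch; it keeps unseen
    words from producing a zero probability.
    """
    total = sum(model.values())
    return sum(
        math.log((model[tok] + alpha) / (total + alpha * vocab_size))
        for tok in doc_tokens
    )


def estimate_partition(doc_tokens, models):
    """Return the label of the partition whose model best explains the document."""
    vocab = set().union(*models.values()) | set(doc_tokens)
    return max(
        models,
        key=lambda label: log_likelihood(doc_tokens, models[label], len(vocab)),
    )


# Toy timestamped corpus (invented data, two partitions).
corpus = {
    "1999": [["millennium", "bug", "y2k"], ["euro", "launch"]],
    "2008": [["financial", "crisis", "bank"], ["olympics", "beijing"]],
}
models = build_partition_models(corpus)
estimated = estimate_partition(["bank", "crisis"], models)
```

With this toy corpus, a document containing "bank" and "crisis" is assigned to the "2008" partition, because only that partition's model has seen those words. A real system would need far larger partitions and a more careful scoring function.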



Editor information

Birte Christensen-Dalsgaard, Donatella Castelli, Bolette Ammitzbøll Jurik, Joan Lippincott

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kanhabua, N., Nørvåg, K. (2008). Improving Temporal Language Models for Determining Time of Non-timestamped Documents. In: Christensen-Dalsgaard, B., Castelli, D., Ammitzbøll Jurik, B., Lippincott, J. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2008. Lecture Notes in Computer Science, vol 5173. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87599-4_37

  • DOI: https://doi.org/10.1007/978-3-540-87599-4_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87598-7

  • Online ISBN: 978-3-540-87599-4

  • eBook Packages: Computer Science, Computer Science (R0)
