Abstract
Taking the temporal dimension into account in searching, i.e., using the time of content creation as part of the search condition, is gaining increasing interest. However, in web search and web warehousing, the timestamps (the time the page or its contents were created) of web pages and other documents found on the web are in general unknown or cannot be trusted, and must be determined by other means. In this paper, we describe approaches that improve the quality of existing techniques for determining timestamps based on a temporal language model. Through a number of experiments on temporal document collections, we show how our new methods improve the accuracy of timestamping compared to the previous models.
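To make the idea of "determining timestamps based on a temporal language model" concrete, the following is a minimal sketch of the baseline technique that this line of work builds on, not the authors' improved method. It assumes a reference corpus whose documents are already grouped into labeled time partitions, builds one unigram language model per partition, and scores a non-timestamped document against each partition with a normalized log-likelihood ratio, NLLR(d, p) = Σ_w P(w|d) · log(P(w|p) / P(w|C)), where C is the whole reference corpus. The whitespace tokenizer, the linear-interpolation (Jelinek-Mercer) smoothing, the value `lam=0.1`, and the function names `build_models`, `nllr`, and `estimate_partition` are illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization (no stemming or stopwords).
    return text.lower().split()


def build_models(partitions):
    """partitions: dict mapping a time-partition label to a list of document strings."""
    part_counts = {
        label: Counter(w for doc in docs for w in tokenize(doc))
        for label, docs in partitions.items()
    }
    # The background model C is the union of all partitions.
    corpus_counts = Counter()
    for counts in part_counts.values():
        corpus_counts.update(counts)
    return part_counts, corpus_counts


def nllr(doc, part_count, corpus_counts, lam=0.1):
    """Normalized log-likelihood ratio of a document against one time partition."""
    doc_counts = Counter(tokenize(doc))
    doc_len = sum(doc_counts.values())
    corpus_len = sum(corpus_counts.values())
    part_len = sum(part_count.values())
    score = 0.0
    for word, n in doc_counts.items():
        p_c = corpus_counts[word] / corpus_len
        if p_c == 0:
            continue  # word never seen in the reference corpus: it carries no evidence
        p_p_ml = part_count[word] / part_len if part_len else 0.0
        # Assumption: linear interpolation of the partition model with the corpus model.
        p_p = (1 - lam) * p_p_ml + lam * p_c
        score += (n / doc_len) * math.log(p_p / p_c)
    return score


def estimate_partition(doc, partitions, lam=0.1):
    """Return the label of the time partition whose model best explains the document."""
    part_counts, corpus_counts = build_models(partitions)
    return max(
        part_counts,
        key=lambda label: nllr(doc, part_counts[label], corpus_counts, lam),
    )


if __name__ == "__main__":
    # Hypothetical toy corpus; real systems use large collections with many partitions.
    partitions = {
        "1999": ["y2k bug millennium computers", "millennium celebrations planned"],
        "2004": ["tsunami indian ocean relief", "olympics athens games"],
    }
    print(estimate_partition("relief effort after tsunami", partitions))  # → 2004
```

The sketch scores every partition and takes the maximum; words absent from the reference corpus are skipped, so an unseen vocabulary contributes nothing rather than causing a division by zero.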
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kanhabua, N., Nørvåg, K. (2008). Improving Temporal Language Models for Determining Time of Non-timestamped Documents. In: Christensen-Dalsgaard, B., Castelli, D., Ammitzbøll Jurik, B., Lippincott, J. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2008. Lecture Notes in Computer Science, vol 5173. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87599-4_37
DOI: https://doi.org/10.1007/978-3-540-87599-4_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87598-7
Online ISBN: 978-3-540-87599-4
eBook Packages: Computer Science, Computer Science (R0)