NTLM: A Time-Enhanced Language Model Based Ranking Approach for Web Search

  • Conference paper
Web Information Systems Engineering – WISE 2010 Workshops (WISE 2010)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6724))

Abstract

Time plays an important role in Web search: most Web pages contain time information, and many Web queries are time-related. However, traditional search engines give little consideration to the time information in Web pages; in particular, they do not take it into account when ranking search results. In this paper, we present NTLM, a new time-enhanced, language-model-based ranking algorithm for Web search. First, we present an effective algorithm to extract <keyword, content time> pairs from Web pages, associating each keyword in a Web page with an appropriate content time. Then we introduce the new concept of temporal tf, the time-constrained term frequency, for each keyword. After that, we propose a time-enhanced language model that measures the similarity between temporal-textual queries and Web pages by combining textual relevance and temporal relevance. We compare NTLM with five competitor algorithms on two datasets, using different types of queries and two metrics, MRR and NDCG, to evaluate performance. The experimental results show that, in the step of extracting <keyword, content time> pairs, NTLM reaches a high precision of 93.2%, and in the ranking step, NTLM performs best with respect to both MRR and NDCG.
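
To make two ideas from the abstract concrete, the following is a minimal sketch and not the paper's implementation. It assumes a page is represented as a list of (keyword, content year) pairs and reads "temporal tf" as the number of occurrences of a keyword whose content time falls inside the query's time interval; the paper's exact definition is not given in the abstract and may differ. The evaluation helpers use the common textbook forms of MRR and NDCG (gain divided by log2 of rank + 1), which may also differ from the variants used in the experiments. All function names are hypothetical.

```python
import math
from typing import Iterable, List, Optional, Sequence, Tuple

# Assumed representation: one (keyword, content_year) pair per keyword occurrence.
Pair = Tuple[str, int]


def temporal_tf(pairs: Iterable[Pair], keyword: str, start: int, end: int) -> int:
    """Count occurrences of `keyword` whose content time lies in [start, end].

    This is one plausible reading of "time-constrained term frequency",
    not the definition from the paper.
    """
    return sum(1 for k, t in pairs if k == keyword and start <= t <= end)


def mrr(first_relevant_ranks: Iterable[Optional[int]]) -> float:
    """Mean reciprocal rank over queries.

    Each entry is the 1-based rank of the first relevant result for a query,
    or None when no relevant result was returned (contributes 0).
    """
    ranks: List[Optional[int]] = list(first_relevant_ranks)
    if not ranks:
        return 0.0
    return sum(1.0 / r for r in ranks if r) / len(ranks)


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(gains: Sequence[float], k: int) -> float:
    """NDCG@k: DCG of the ranking divided by DCG of the ideal reordering."""
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    page = [("earthquake", 2008), ("earthquake", 2010),
            ("olympics", 2008), ("earthquake", 2008)]
    # Occurrences of "earthquake" whose content time is exactly 2008.
    print(temporal_tf(page, "earthquake", 2008, 2008))
    # Three queries: first relevant hit at rank 1, at rank 2, and never.
    print(mrr([1, 2, None]))
    # Graded relevance of the top results, in ranked order.
    print(ndcg([1, 3, 0], k=3))
```

In this reading, a temporal-textual query such as "earthquake, 2008" would count only the two occurrences tied to 2008, whereas a plain term frequency would count all three.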

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, X., Jin, P., Zhao, X., Chen, H., Yue, L. (2011). NTLM: A Time-Enhanced Language Model Based Ranking Approach for Web Search. In: Chiu, D.K.W., et al. Web Information Systems Engineering – WISE 2010 Workshops. WISE 2010. Lecture Notes in Computer Science, vol 6724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24396-7_13

  • DOI: https://doi.org/10.1007/978-3-642-24396-7_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24395-0

  • Online ISBN: 978-3-642-24396-7

  • eBook Packages: Computer Science, Computer Science (R0)