
Time segment language model for microblog retrieval

  • S.I.: Higher Level Artificial Neural Network Based Intelligent Systems
Neural Computing and Applications

Abstract

Related studies have shown that the time characteristics of microblogs can improve retrieval performance. However, these studies mainly focus on the time distribution of tweets related to a given query, and this single characteristic may not be sufficient to reflect the time characteristics of microblogs. Inspired by the recent success of time-based language models for microblog retrieval, this paper proposes a time segment language model (TSLM) to model these characteristics. Briefly, TSLM builds a language model for each time segment, that is, a probability distribution over word sequences for that segment. Based on TSLM, the time distribution of terms (tDT), the time distribution of queries (tDQ) and the time distribution of documents (tDD) are proposed. Furthermore, TSLM is exploited to estimate the query model and the document model and to compute the similarity between query and document. Experimental results on the Tweets2011 corpus show that the proposed approaches outperform several state-of-the-art baselines.
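
The idea of one language model per time segment can be illustrated with a minimal sketch. The abstract does not give the paper's estimation formulas, so everything here is an assumption made for illustration: fixed-width segments, unsmoothed maximum-likelihood unigram estimates, and a term's time distribution taken as the share of its occurrences in each segment. The function names are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict


def segment_index(timestamp, start, segment_length):
    """Map a timestamp to the index of the fixed-width segment containing it."""
    return int((timestamp - start) // segment_length)


def build_segment_counts(docs, start, segment_length):
    """Collect term counts per time segment from (timestamp, tokens) pairs."""
    counts = defaultdict(Counter)
    for timestamp, tokens in docs:
        counts[segment_index(timestamp, start, segment_length)].update(tokens)
    return dict(counts)


def term_prob(counts, segment, term):
    """Unsmoothed maximum-likelihood estimate of P(term | segment).

    Assumption: a unigram model; the paper may use smoothing or a richer model.
    """
    segment_counts = counts.get(segment, Counter())
    total = sum(segment_counts.values())
    return segment_counts[term] / total if total else 0.0


def term_time_distribution(counts, term):
    """Share of a term's occurrences that fall in each segment.

    Assumption: a stand-in for a time distribution of terms, not the
    paper's tDT definition.
    """
    per_segment = {s: c[term] for s, c in counts.items() if c[term] > 0}
    total = sum(per_segment.values())
    return {s: n / total for s, n in per_segment.items()}


# Toy collection: timestamps are arbitrary units, tokens are pre-tokenized.
docs = [
    (0, ["quake", "japan"]),
    (5, ["quake"]),
    (12, ["japan", "news"]),
]
counts = build_segment_counts(docs, start=0, segment_length=10)
print(term_prob(counts, 0, "quake"))
print(term_time_distribution(counts, "japan"))
```

In this toy setup, a term concentrated in one segment gets a peaked distribution, while a term spread evenly across segments gets a flat one. That contrast is the kind of temporal signal the abstract says TSLM is meant to capture.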

Notes

  1. http://twittertools.cc/.

  2. The corpus contains about 16 million tweets, but some microblogs could not be downloaded for reasons such as having been deleted or hidden.

  3. http://nutch.apache.org/.

  4. http://www.lemurproject.org/indri/.


Author information

Corresponding author

Correspondence to Lei-lei Kong.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is supported by the National Social Science Fund of China (No. 18BYY125).

About this article

Cite this article

Han, Zy., Kong, Ll. & Qi, Hl. Time segment language model for microblog retrieval. Neural Comput & Applic 33, 4763–4777 (2021). https://doi.org/10.1007/s00521-020-05534-x

  • DOI: https://doi.org/10.1007/s00521-020-05534-x
