Abstract
Related studies have shown that the time characteristics of microblogs can improve retrieval performance. However, these studies focus mainly on the time distribution of tweets related to a given query, and this single time characteristic may not be sufficient to reflect the time characteristics of microblogs. Inspired by the recent success of time-based language models for microblog retrieval, this paper proposes a time segment language model (TSLM) to model the time characteristics of microblogs. Briefly, TSLM constructs a language model for each time segment, that is, a probability distribution over sequences of words for that segment. Based on TSLM, the time distribution of terms (tDT), the time distribution of queries (tDQ) and the time distribution of documents (tDD) are proposed. Furthermore, TSLM is exploited to estimate the query model and the document model, and to compute the similarity between query and document. Experimental results on the Tweets2011 corpus show that the proposed approaches outperform several state-of-the-art baselines.
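The idea of one language model per time segment can be illustrated with a minimal sketch. This is not the paper's estimator: the fixed-width segmentation, the additive smoothing, the uniform prior over segments, and all function names (`build_segment_models`, `term_probability`, `term_time_distribution`) are assumptions made here for illustration only. The sketch builds a unigram model for each segment and then normalizes a term's probability across segments to obtain a simple time distribution for that term.

```python
from collections import Counter, defaultdict


def build_segment_models(tweets, segment_seconds):
    """Group (timestamp, text) pairs into fixed-width time segments
    and count whitespace-separated terms per segment (hypothetical helper)."""
    counts = defaultdict(Counter)
    for timestamp, text in tweets:
        segment = int(timestamp // segment_seconds)
        counts[segment].update(text.lower().split())
    return dict(counts)


def term_probability(term, segment_counts, vocab_size, alpha=1.0):
    """Additively smoothed unigram estimate of P(term | segment).
    The smoothing scheme is an assumption, not taken from the paper."""
    total = sum(segment_counts.values())
    return (segment_counts[term] + alpha) / (total + alpha * vocab_size)


def term_time_distribution(term, models, alpha=1.0):
    """Normalize P(term | segment) over all segments.
    With a uniform prior over segments this equals P(segment | term)."""
    vocab = set().union(*models.values())
    scores = {
        segment: term_probability(term, counts, len(vocab), alpha)
        for segment, counts in models.items()
    }
    norm = sum(scores.values())
    return {segment: score / norm for segment, score in scores.items()}


# Toy data: timestamps in seconds, one-hour segments.
tweets = [
    (0, "earthquake japan"),
    (10, "earthquake tsunami"),
    (3700, "football match"),
]
models = build_segment_models(tweets, segment_seconds=3600)
dist = term_time_distribution("earthquake", models)
print(dist)
```

In this toy example the term "earthquake" receives most of its probability mass in the first segment, which is the kind of signal a time distribution of terms is meant to capture. A time distribution for a whole query or document could be derived by combining the distributions of its terms, though how the paper combines them is not stated in the abstract.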
Notes
The corpus contains about 16 million tweets, but some could not be downloaded because they had been deleted, hidden, and so on.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
This work is supported by the National Social Science Fund of China (No. 18BYY125).
Cite this article
Han, Zy., Kong, Ll. & Qi, Hl. Time segment language model for microblog retrieval. Neural Comput & Applic 33, 4763–4777 (2021). https://doi.org/10.1007/s00521-020-05534-x
DOI: https://doi.org/10.1007/s00521-020-05534-x