DOI: 10.1145/2808194.2809466
research-article

On Microblog Dimensionality and Informativeness: Exploiting Microblogs' Structure and Dimensions for Ad-Hoc Retrieval

Published: 27 September 2015

Abstract

In recent years, microblog services such as Twitter have gained increasing popularity, leading to active research on how to effectively exploit their content. Microblog documents such as tweets differ in morphology from more traditional documents such as web pages. In particular, tweets are considerably shorter than web documents (at most 140 characters) and contain contextual tags indicating the topic (hashtags) and the intended audience (mentions) of the document, as well as links to external content (URLs).
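As a rough illustration of treating a tweet as a multi-dimensional document, the sketch below maps each token to a coarse dimension label (plain text, hashtag, mention, URL) and computes the relative presence of each dimension. The abstract does not say how the paper tokenises tweets or whether it uses further dimensions, so the label set, the naive whitespace tokenisation and the names `dimension_of`, `dimension_sequence` and `relative_presence` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical label set: the abstract names hashtags, mentions and URLs;
# every other token is treated as plain text.
DIMENSIONS = ("TEXT", "HASHTAG", "MENTION", "URL")
URL_RE = re.compile(r"^https?://\S+$")


def dimension_of(token: str) -> str:
    """Map one whitespace-delimited token to a coarse dimension label."""
    if URL_RE.match(token):
        return "URL"
    if token.startswith("#") and len(token) > 1:
        return "HASHTAG"
    if token.startswith("@") and len(token) > 1:
        return "MENTION"
    return "TEXT"


def dimension_sequence(tweet: str) -> list:
    """Ordered dimension labels, one per token (naive whitespace split)."""
    return [dimension_of(tok) for tok in tweet.split()]


def relative_presence(tweet: str) -> dict:
    """Fraction of the tweet's tokens that fall into each dimension."""
    seq = dimension_sequence(tweet)
    counts = Counter(seq)
    total = len(seq) or 1  # avoid dividing by zero on an empty tweet
    return {dim: counts[dim] / total for dim in DIMENSIONS}


tweet = "Flooding on Main St #storm @cityalerts http://t.co/abc"
print(dimension_sequence(tweet))
print(relative_presence(tweet))
```

The ordered sequence keeps the position information that a state-machine model could consume, while the relative-presence vector discards order and keeps only proportions.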
Traditional and state-of-the-art retrieval models perform rather poorly at capturing the relevance of tweets, since they were designed under very different conditions. In this work, we define a microblog document as a high-dimensional entity and study the structural differences between documents deemed relevant and those deemed non-relevant. Second, we experiment with enhancing the best-performing retrieval model observed by means of a re-ranking approach that accounts for the relative differences in these dimensions amongst tweets. Additionally, we study the interactions between the different dimensions in terms of their order within the documents by modelling relevant and non-relevant tweets as state machines. These state machines are then used to produce scores, which in turn are used for re-ranking.
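One possible reading of the state-machine approach is sketched below. It assumes a first-order Markov chain over dimension labels with explicit START and END states, additively smoothed transition probabilities estimated separately from relevant and non-relevant tweets, and a log-likelihood ratio that is linearly interpolated with the baseline retrieval score. The abstract does not specify the model form, the smoothing or how scores are combined, so `DimensionChain`, `rerank` and the weight `alpha` are illustrative assumptions rather than the authors' method.

```python
import math
from collections import defaultdict

# Hypothetical label set; input sequences must use only these labels.
STATES = ("TEXT", "HASHTAG", "MENTION", "URL")


class DimensionChain:
    """First-order Markov chain over dimension labels (illustrative only).

    Transition probabilities use additive smoothing, which must be > 0 so
    that unseen transitions keep a non-zero probability.
    """

    def __init__(self, sequences, smoothing=1.0):
        if smoothing <= 0:
            raise ValueError("smoothing must be positive")
        counts = defaultdict(lambda: defaultdict(float))
        for seq in sequences:
            path = ["START", *seq, "END"]
            for src, dst in zip(path, path[1:]):
                counts[src][dst] += 1.0
        targets = (*STATES, "END")
        self.logp = {}
        for src in ("START", *STATES):
            row_total = sum(counts[src].values()) + smoothing * len(targets)
            self.logp[src] = {
                dst: math.log((counts[src][dst] + smoothing) / row_total)
                for dst in targets
            }

    def log_likelihood(self, seq):
        """Log-probability of a label sequence, including START and END."""
        path = ["START", *seq, "END"]
        return sum(self.logp[src][dst] for src, dst in zip(path, path[1:]))


def rerank(results, relevant_chain, nonrelevant_chain, alpha=0.5):
    """Re-rank (doc_id, baseline_score, label_sequence) triples.

    The new score interpolates the baseline score with the log-likelihood
    ratio log P(seq | relevant) - log P(seq | non-relevant).
    """
    def combined(item):
        _, baseline, seq = item
        ratio = (relevant_chain.log_likelihood(seq)
                 - nonrelevant_chain.log_likelihood(seq))
        return (1 - alpha) * baseline + alpha * ratio

    return [doc_id for doc_id, _, _ in sorted(results, key=combined, reverse=True)]


# Toy training data: label sequences of judged tweets (invented for illustration).
relevant = DimensionChain([["TEXT", "TEXT", "URL"], ["TEXT", "URL"],
                           ["TEXT", "TEXT", "TEXT", "URL"]])
nonrelevant = DimensionChain([["MENTION", "TEXT"], ["MENTION", "MENTION", "TEXT"],
                              ["TEXT", "HASHTAG"]])
candidates = [("b", 1.0, ["MENTION", "TEXT"]), ("a", 1.0, ["TEXT", "URL"])]
print(rerank(candidates, relevant, nonrelevant))
```

Because baseline scores and log-likelihood ratios live on different scales, a practical version would likely normalise both before interpolating and tune `alpha` on held-out queries.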
For both approaches, our evaluation shows statistically significant improvements in precision over the baseline at several cut-off points. These results confirm that the relative presence of the different dimensions within a document, and their ordering, are connected with the relevance of microblog posts.
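Precision at a cut-off k (P@k) is the fraction of the top k retrieved tweets that are judged relevant. A minimal sketch follows; the toy ranking, the judgements and the cut-off values are invented for illustration and are not necessarily those used in the paper.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant.

    The denominator is always k, so a run returning fewer than k
    documents is penalised for the missing positions.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids) / k


# One query: a toy ranking and its (hypothetical) relevance judgements.
ranking = ["t7", "t2", "t9", "t4", "t1"]
qrels = {"t2", "t4", "t8"}
for k in (1, 3, 5, 10):
    print(f"P@{k} = {precision_at_k(ranking, qrels, k):.2f}")
```

Comparing a baseline against a re-ranked run would compute this per query for both runs and then apply a paired significance test over queries; the abstract does not name the test used.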


Cited By

  • (2019) On fine-grained geolocalisation of tweets and real-time traffic incident detection. Information Processing and Management: an International Journal, 56(3):1119-1132. DOI: 10.1016/j.ipm.2018.03.011. Online publication date: 1 May 2019.



      Published In

      ICTIR '15: Proceedings of the 2015 International Conference on The Theory of Information Retrieval
      September 2015
      402 pages
      ISBN:9781450338332
      DOI:10.1145/2808194
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Author Tags

      1. ad-hoc retrieval
      2. dimensions
      3. microblog
      4. modelling
      5. ranking
      6. state machine


      Conference

      ICTIR '15

      Acceptance Rates

      ICTIR '15 Paper Acceptance Rate 29 of 57 submissions, 51%;
      Overall Acceptance Rate 235 of 527 submissions, 45%


