ABSTRACT
Geo-temporal visualization of Twitter search results is challenging because displaying all matching tweets simultaneously results in a saturated, unreadable display. In such settings, clustering search results can help users scan a few coherent groups of related tweets rather than many individual tweets. In practice, however, unsupervised clustering methods such as K-Means do not guarantee that the clusters themselves are relevant. We therefore develop a novel method of relevance-driven clustering for visual information retrieval that supplies users with highly relevant clusters representing different information perspectives of their queries. Specifically, we propose a Visual Twitter Information Retrieval (Viz-TIR) tool for relevance-driven clustering and ranking of Twitter search results. At the heart of Viz-TIR is a fast greedy algorithm that generates these clusters by optimizing an approximation of an expected F1-Score metric. We demonstrate its effectiveness relative to K-Means and to a baseline method that shows all top matching results, on a scenario related to searching for natural disasters in US-based Twitter data spanning 2013 and 2014. Our demo shows that Viz-TIR is easy to use and more precise than K-Means in extracting geo-temporally coherent clusters for a given search query, thus aiding the user in visually searching and browsing social network content. Overall, we believe this work opens new opportunities for combining information retrieval with relevance- and display-aware optimization techniques to support query-adaptive visual information exploration interfaces.
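To make the idea of greedily optimizing an approximate expected F1-Score concrete, the following is a minimal sketch and not the Viz-TIR algorithm itself. It assumes each tweet already has an estimated relevance probability (the `probs` input is hypothetical), ignores the geo-temporal cluster shapes entirely, and uses the common first-order approximation E[F1] ≈ 2·E[TP] / (|S| + E[#relevant]), which replaces the expectation of a ratio with a ratio of expectations. The function names are illustrative only.

```python
from typing import List, Sequence


def approx_expected_f1(selected_probs: Sequence[float], total_relevance: float) -> float:
    """Approximate E[F1] for a selection, given relevance probabilities.

    Assumption (not from the paper): E[F1] ~ 2 * E[TP] / (|S| + E[#relevant]),
    where E[TP] is the sum of probabilities of the selected items and
    E[#relevant] is the sum of probabilities over all items.
    """
    size = len(selected_probs)
    if size == 0:
        return 0.0
    return 2.0 * sum(selected_probs) / (size + total_relevance)


def greedy_select(probs: Sequence[float]) -> List[int]:
    """Greedily add items, most probable first, while the approximate
    expected F1 keeps improving. Returns indices of the selected items."""
    total = sum(probs)
    # Visit candidates in order of decreasing relevance probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen: List[int] = []
    chosen_mass = 0.0  # running sum of probabilities of chosen items
    best = 0.0
    for i in order:
        candidate = 2.0 * (chosen_mass + probs[i]) / (len(chosen) + 1 + total)
        if candidate <= best:
            break  # adding this item no longer improves the objective
        chosen.append(i)
        chosen_mass += probs[i]
        best = candidate
    return chosen


if __name__ == "__main__":
    # Hypothetical relevance estimates for four tweets.
    example = [0.9, 0.8, 0.1, 0.05]
    picked = greedy_select(example)
    print(picked, approx_expected_f1([example[i] for i in picked], sum(example)))
```

In this toy setting the selection stops once a low-probability tweet would dilute precision more than it adds to expected recall. A relevance-driven clustering method would apply a similar trade-off to spatial and temporal regions rather than to a flat ranked list.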