Abstract
Identifying similarities in microblog posts for event detection is challenging because the texts are short and contain idiosyncratic spellings, irregular writing styles, abbreviations and synonyms. To overcome these challenges, we present an enhancement to incremental clustering techniques that detects similar terms in microblog posts in a temporal context. We devise an unsupervised method that measures these similarities online using co-occurrence-based techniques and uses them in a vector expansion process. The results of our evaluation on a tweet set indicate that the proposed vector expansion method helps identify similarities between tweets despite differences in their content. This facilitates the clustering of tweets and the detection of events with higher accuracy without incurring a high execution cost.
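As a rough illustration of the idea of co-occurrence-based vector expansion, the following minimal Python sketch may help. It is not the method evaluated in the article: the function names, the cosine comparison of co-occurrence profiles, the `threshold` value and the toy posts are all assumptions made for this example, and the sketch works on a fixed batch rather than online or within a temporal window.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def cooccurrence_counts(posts):
    """Count how often each pair of distinct terms appears in the same post."""
    counts = defaultdict(Counter)
    for terms in posts:
        for a, b in combinations(sorted(set(terms)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dict-like objects."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand(terms, counts, threshold=0.5):
    """Add terms whose co-occurrence profile resembles that of a term in the post.

    The threshold is a hypothetical parameter chosen for this toy example.
    """
    vector = Counter(terms)
    expanded = dict(vector)
    for term, weight in vector.items():
        profile = counts.get(term, {})
        for other, other_profile in counts.items():
            if other in vector:
                continue  # already part of the original post
            sim = cosine(profile, other_profile)
            if sim >= threshold:
                # Keep the strongest evidence if several terms suggest `other`.
                expanded[other] = max(expanded.get(other, 0.0), sim * weight)
    return expanded


# Toy data (invented): "earthquake" and "quake" never co-occur,
# but they appear in the same contexts.
posts = [
    ["earthquake", "izmir", "shaking"],
    ["quake", "izmir", "shaking"],
    ["earthquake", "izmir", "damage"],
    ["quake", "izmir", "damage"],
    ["concert", "ankara", "tonight"],
]
counts = cooccurrence_counts(posts)

raw_a = Counter(["earthquake", "izmir"])
raw_b = Counter(["quake", "izmir"])
exp_a = expand(["earthquake", "izmir"], counts)
exp_b = expand(["quake", "izmir"], counts)

print("raw similarity:     ", cosine(raw_a, raw_b))
print("expanded similarity:", cosine(exp_a, exp_b))
```

In this toy setting the two posts share only one term before expansion, while after expansion each vector also carries the other post's spelling variant, so their cosine similarity increases. An incremental clustering step could then compare such expanded vectors against cluster centroids instead of the raw term vectors.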
Acknowledgements
This work was financially supported by TUBITAK under grant number 112E275 and by ICT COST Action IC1203.
Cite this article
Ozdikis, O., Karagoz, P. & Oğuztüzün, H. Incremental clustering with vector expansion for online event detection in microblogs. Soc. Netw. Anal. Min. 7, 56 (2017). https://doi.org/10.1007/s13278-017-0476-8
DOI: https://doi.org/10.1007/s13278-017-0476-8