Skip to main content
Log in

Incremental clustering with vector expansion for online event detection in microblogs

  • Original Article
  • Published:
Social Network Analysis and Mining Aims and scope Submit manuscript

Abstract

Identifying similarities in microblog posts for event detection poses challenges due to short texts with idiosyncratic spellings, irregular writing styles, abbreviations and synonyms. In order to overcome these challenges, we present an enhancement to the incremental clustering techniques by detecting similar terms in microblog posts in a temporal context. We devise an unsupervised method to measure the similarities online using co-occurrence-based techniques and use them in a vector expansion process. The results of our evaluation performed on a tweet set indicate that the proposed vector expansion method helps identify similarities in tweets despite differences in their content. This facilitates the clustering of tweets and detection of events with higher accuracy without incurring a high execution cost.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

Notes

  1. https://dev.twitter.com/overview/api.

  2. http://twitter4j.org/en/index.html.

  3. https://github.com/ahmetax/trstop/.

  4. http://coltekin.net/cagri/trmorph.

  5. http://math.nist.gov/javanumerics/jama.

  6. http://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_rand_score.html.

  7. https://github.com/amueller/word_cloud.

References

  • Aggarwal C, Zhai C (2012) A survey of text clustering algorithms. In: Aggarwal CC, Zhai C (eds) Mining text data. Springer, New York, pp 77–128

    Chapter  Google Scholar 

  • Aggarwal CC, Subbian K (2012) Event detection in social streams. In: SDM. SIAM/Omnipress, pp 624–635

  • Aggarwal CC, Yu PS (2006) A framework for clustering massive text and categorical data streams. In: Ghosh J, Lambert D, Skillicorn DB, Srivastava J (eds) SDM. SIAM, Philadelphia, pp 479–483

    Google Scholar 

  • Aggarwal CC, Han J, Wang J, Yu PS (2003) A framework for clustering evolving data streams. In: Proceedings of the 29th international conference on very large data bases—volume 29, VLDB Endowment, VLDB ’03, pp 81–92

  • Agirre E, Alfonseca E, Hall K, Kravalova J, Paşca M, Soroa A (2009) A study on similarity and relatedness using distributional and wordnet-based approaches. In: Proceedings of human language technologies: the 2009 annual conference of the North American chapter of the association for computational linguistics, Association for Computational Linguistics, Stroudsburg, NAACL’09, pp 19–27

  • Allan J (ed) (2002) Topic detection and tracking: event-based information organization. Kluwer Academic Publishers

  • Atefeh F, Khreich W (2015) A survey of techniques for event detection in Twitter. Comput Intell 31(1):132–164

    Article  MathSciNet  Google Scholar 

  • Bansal N, Koudas N (2007) Blogscope: a system for online analysis of high volume text streams. In: Proceedings of the 33rd international conference on very large data bases, VLDB Endowment, VLDB’07, pp 1410–1413

  • Berry MW, Dumais ST, O’Brien GW (1995) Using linear algebra for intelligent information retrieval. SIAM Rev 37(4):573–595. doi:10.1137/1037127

    Article  MathSciNet  MATH  Google Scholar 

  • Cao G, Nie JY, Gao J, Robertson S (2008) Selecting good expansion terms for pseudo-relevance feedback. In: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval, SIGIR’08, pp 243–250

  • Chen L, Chun L, Ziyu L, Quan Z (2013) Hybrid pseudo-relevance feedback for microblog retrieval. J Inf Sci 39(6):773–788

    Article  Google Scholar 

  • Cheong M, Lee VCS (2011) A microblogging-based approach to terrorism informatics: Exploration and chronicling civilian sentiment and response to terrorism events via Twitter. Inf Syst Front 13(1):45–59

    Article  Google Scholar 

  • Cordeiro M, Gama J (2016) Online social networks event detection: A survey. In: Michaelis S, Piatkowski N, Stolpe M (eds) Solving Large Scale Learning Tasks. Challenges and Algorithms. Lecture Notes in Computer Science, vol 9580. Springer, Cham, pp 1–41

    Google Scholar 

  • Cotelo JM, Cruz FL, Troyano JA, Ortega FJ (2015) A modular approach for lexical normalization applied to spanish tweets. Expert Syst Appl 42(10):4743–4754

    Article  Google Scholar 

  • Cotelo JM, Cruz FL, Troyano JA (2014) Dynamic topic-related tweet retrieval. J Assoc Inf Sci Technol 65(3):513–523

    Article  Google Scholar 

  • Crooks A, Croitoru A, Stefanidis A, Radzikowski J (2013) #Earthquake: Twitter as a distributed sensor system. Trans GIS 17(1):124–147

    Article  Google Scholar 

  • De Choudhury M, Sundaram H, John A, Seligmann DD (2008) Can blog communication dynamics be correlated with stock market activity? In: Proceedings of the nineteenth ACM conference on hypertext and hypermedia, HT’08, pp 55–60

  • Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41(6):391–407

    Article  Google Scholar 

  • Fang Y, Zhang H, Ye Y, Li X (2014) Detecting hot topics from Twitter: A multiview approach. J Inf Sci 40(5):578–593

    Article  Google Scholar 

  • Fung GPC, Yu JX, Yu PS, Lu H (2005) Parameter free bursty events detection in text streams. In: Proceedings of the 31st international conference on very large data bases, VLDB Endowment, VLDB’05, pp 181–192

  • Imran M, Castillo C, Diaz F, Vieweg S (2015) Processing social media messages in mass emergency: a survey. ACM Comput Surv 47(4):67:1–67:38

    Article  Google Scholar 

  • Jun S, Park SS, Jang DS (2014) Document clustering method using dimension reduction and support vector clustering to overcome sparseness. Expert Syst Appl 41(7):3204–3212

    Article  Google Scholar 

  • Kaufmann M, Kalita J (2010) Syntactic normalization of Twitter messages. In: International conference on natural language processing, Kharagpur

  • Kim D, Kim D, Rho S, Hwang E (2013) Detecting trend and bursty keywords using characteristics of Twitter stream data. Int J Smart Home 7(1):209–220

    Google Scholar 

  • Kleinberg J (2002) Bursty and hierarchical structure in streams. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining, KDD’02, pp 91–101

  • Li C, Sun A, Datta A (2012) Twevent: segment-based event detection from tweets. In: Proceedings of the 21st ACM international conference on information and knowledge management, CIKM’12, pp 155–164

  • Lin D, Zhao S, Qin L, Zhou M (2003) Identifying synonyms among distributionally similar words. In: Proceedings of the 18th international joint conference on artificial intelligence, IJCAI’03, pp 1492–1493

  • Magdy W, Elsayed T (2016) Unsupervised adaptive microblog filtering for broad dynamic topics. Inf Process Manage 52(4):513–528

    Article  Google Scholar 

  • Marcus A, Bernstein MS, Badar O, Karger DR, Madden S, Miller RC (2011) Twitinfo: aggregating and visualizing microblogs for event exploration. In: Proceedings of the SIGCHI conference on human factors in computing systems, CHI’11, pp 227–236

  • Nguyen D, Jung J (2015) Real-time event detection on social data stream. Mob Netw Appl 20(4):475–486

    Article  Google Scholar 

  • Okazaki M, Matsuo Y (2010) Semantic Twitter: analyzing tweets for real-time event notification. In: Breslin J, Burg T, Kim HG, Raftery T, Schmidt JH (eds) Recent trends and developments in social software, lecture notes in computer science, vol 6045. Springer, Berlin, pp 63–74

  • Ozdikis O, Senkul P, Oguztuzun H (2012a) Semantic expansion of hashtags for enhanced event detection in Twitter. In: Proceedings of VLDB 2012 Workshop on Online Social Systems (WOSS)

  • Ozdikis O, Senkul P, Oguztuzun H (2012b) Semantic expansion of tweet contents for enhanced event detection in Twitter. In: IEEE/ACM international conference on Advances in Social Networks Analysis and Mining (ASONAM), pp 20–24

  • Ozdikis O, Senkul P, Oguztuzun H (2014) Context based semantic relations in tweets. In: Can F, Özyer T, Polat F (eds) State of the art applications of social network analysis, lecture notes in social networks. Springer International Publishing, pp 35–52

  • Phuvipadawat S, Murata T (2010) Breaking news detection and tracking in Twitter. In: IEEE/WIC/ACM international conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), vol 3. pp 120–123

  • Qiu Y, Frei HP (1993) Concept based query expansion. In: Proceedings of the 16th annual international ACM SIGIR conference on research and development in information retrieval. SIGIR’93, pp 160–169

  • Rapp R (2002) The computation of word associations: comparing syntagmatic and paradigmatic approaches. In: Proceedings of the 19th international conference on computational linguistics—volume 1, Association for Computational Linguistics, Stroudsburg, COLING’02, pp 1–7

  • Sakaki T, Okazaki M, Matsuo Y (2013) Tweet analysis for real-time event detection and earthquake reporting system development. IEEE Trans Knowl Data Eng 25(4):919–931

    Article  Google Scholar 

  • Sankaranarayanan J, Samet H, Teitler BE, Lieberman MD, Sperling J (2009) TwitterStand: News in tweets. In: Proceedings of the 17th ACM SIGSPATIAL international conference on advances in geographic information systems. GIS’09, pp 42–51

  • Shou L, Wang Z, Chen K, Chen G (2013) Sumblr: Continuous summarization of evolving tweet streams. In: Proceedings of the 36th international ACM SIGIR conference on research and development in information retrieval, SIGIR’13, pp 533–542

  • Silva JA, Faria ER, Barros RC, Hruschka ER, de Carvalho ACPLF, Gama J (2013) Data stream clustering: a survey. ACM Comput Surv 46(1):13:1–13:31

    Article  MATH  Google Scholar 

  • Song W, Park SC (2007) A novel document clustering model based on latent semantic analysis. In: Proceedings of the third international conference on Semantics, knowledge and grid, pp 539–542

  • Thomas A, Sindhu L (2015) A survey on content based semantic relations in tweets. Int J Comput Appl 132(11):14–18

    Google Scholar 

  • Varga A, Basave AEC, Rowe M, Ciravegna F, He Y (2014) Linked knowledge sources for topic classification of microposts: a semantic graph-based approach. J Web Semant Sci Serv Agents World Wide Web 26:36–57

    Article  Google Scholar 

  • Voorhees EM (1994) Query expansion using lexical-semantic relations. In: Proceedings of the 17th annual international ACM SIGIR conference on research and development in information retrieval, SIGIR’94, pp 61–69

  • Weng J, Lee B (2011) Event detection in Twitter. In: Proceedings of the fifth international conference on weblogs and social media, ICWSM’11, pp 401-408

  • Xie W, Zhu F, Jiang J, Lim EP, Wang K (2013) TopicSketch: Real-time bursty topic detection from Twitter. In: IEEE 13th international conference on Data mining (ICDM), pp 837–846

  • Yang Y, Pierce T, Carbonell J (1998) A study of retrospective and on-line event detection. In: Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval, SIGIR’98, pp 28–36

  • Yin J, Lampert A, Cameron M, Robinson B, Power R (2012) Using social media to enhance emergency situation awareness. IEEE Intell Syst 27(6):52–59

    Article  Google Scholar 

  • Zhou Y, Kanhabua N, Cristea AI (2016) Real-time timeline summarisation for high-impact events in Twitter. In: 22nd European conference on artificial intelligence, ECAI’16, pp 1158–1166

Download references

Acknowledgements

This work was financially supported by TUBITAK with the Grant number 112E275 and ICT COST Action IC1203.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ozer Ozdikis.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Ozdikis, O., Karagoz, P. & Oğuztüzün, H. Incremental clustering with vector expansion for online event detection in microblogs. Soc. Netw. Anal. Min. 7, 56 (2017). https://doi.org/10.1007/s13278-017-0476-8

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s13278-017-0476-8

Keywords

Navigation