Keyword extraction from micro-blogs using collective weight

  • Original Article
  • Social Network Analysis and Mining

Abstract

The growth of social networking has widened the scope for public expression. Twitter alone, one of the most popular social networking sites, generates a huge volume of text every minute. Analyzing and summarizing Twitter content benefits many applications, including information retrieval and automatic indexing, classification, clustering, and filtering. One of the most important tasks in analyzing tweets is automatic keyword extraction. Many existing graph-based keyword extraction approaches determine keywords purely from a centrality measure. However, other features, such as the frequency, centrality, and position of a candidate keyword and the strength of its neighbours, also affect its importance in tweets. This paper therefore proposes a novel unsupervised graph-based keyword extraction method, keywords from collective weights (KCW), which determines the importance of a keyword by considering these influencing features collectively. KCW is based on node-edge rank centrality, with node weights that depend on these features. The model is validated on five data sets: Uri Attack, Harry Potter, IPL, Donald Trump and IPhone5. KCW is compared with three existing models, and the experimental results show that it performs substantially better than all three, as measured by precision, recall, and F-measure.
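To make the idea concrete, the following is a minimal sketch of a weighted, PageRank-style ranking over a word co-occurrence graph, in the spirit of node-edge rank, together with the precision, recall, and F-measure used for evaluation. It is not the authors' KCW formulation: the node weight below (an equal mix of normalized frequency and earliest relative position), the damping factor, the function names, and the toy tweets are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_graph(tweets):
    """Build an undirected co-occurrence graph from tokenized tweets."""
    freq = Counter()
    first_pos = {}
    edges = defaultdict(float)
    for tokens in tweets:
        for i, tok in enumerate(tokens):
            freq[tok] += 1
            # Earliest relative position of the token (0.0 = first word).
            rel = i / max(len(tokens) - 1, 1)
            first_pos[tok] = min(first_pos.get(tok, 1.0), rel)
        # Adjacent tokens share an edge; repeated pairs raise the edge weight.
        for a, b in zip(tokens, tokens[1:]):
            if a != b:
                edges[(a, b)] += 1.0
                edges[(b, a)] += 1.0
    return freq, first_pos, edges


def node_weights(freq, first_pos):
    """Hypothetical node weight: frequency and position, equally mixed."""
    max_f = max(freq.values())
    return {t: 0.5 * freq[t] / max_f + 0.5 * (1.0 - first_pos[t]) for t in freq}


def rank_keywords(tweets, d=0.85, iters=50):
    """Rank tokens with a node- and edge-weighted PageRank-style update."""
    freq, first_pos, edges = build_graph(tweets)
    w = node_weights(freq, first_pos)
    out_sum = defaultdict(float)
    incoming = defaultdict(list)
    for (a, b), wt in edges.items():
        out_sum[a] += wt
        incoming[b].append((a, wt))
    score = {t: 1.0 for t in freq}
    for _ in range(iters):
        # Each neighbour passes on its score in proportion to the edge weight,
        # and the receiving node scales the result by its own node weight.
        score = {
            t: (1 - d) * w[t]
            + d * w[t] * sum(wt / out_sum[a] * score[a] for a, wt in incoming[t])
            for t in freq
        }
    return sorted(score, key=score.get, reverse=True)


def precision_recall_f(predicted, gold):
    """Set-based precision, recall, and F-measure for extracted keywords."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    # Toy, invented tweets (already tokenized and stop-word filtered).
    toy = [
        ["potter", "movie", "release"],
        ["potter", "book", "release"],
        ["new", "potter", "movie"],
    ]
    ranked = rank_keywords(toy)
    print(ranked[:3])
    print(precision_recall_f(ranked[:3], ["potter", "movie", "book"]))
```

Scaling both the restart term and the propagated term by the node weight lets frequent or early-occurring words hold on to more of the score they receive from neighbours. Other ways of combining the features are equally plausible under the abstract's description.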

Author information

Corresponding author

Correspondence to Saroj Kr. Biswas.

About this article

Cite this article

Bordoloi, M., Biswas, S.K. Keyword extraction from micro-blogs using collective weight. Soc. Netw. Anal. Min. 8, 58 (2018). https://doi.org/10.1007/s13278-018-0536-8

  • DOI: https://doi.org/10.1007/s13278-018-0536-8
