Abstract
The growth of social networking has widened the scope for expression on public platforms. Twitter alone, one of the most popular social networking sites, generates a huge amount of text every minute. Analysis and summarization of Twitter content benefit many applications, such as information retrieval and automatic indexing, classification, clustering, and filtering. One of the most important tasks in analyzing tweets is automatic keyword extraction. Many existing graph-based keyword extraction approaches determine keywords purely on the basis of a centrality measure. However, several other features, such as the frequency, centrality, and position of a keyword and the strength of its neighbours, also affect its importance in tweets. This paper therefore proposes a novel unsupervised graph-based keyword extraction method called keywords from collective weights (KCW), which determines the importance of a keyword by considering these influencing features collectively. KCW is based on node-edge rank centrality, with the node weight depending on the various features. The model is validated on five data sets: Uri Attack, Harry Potter, IPL, Donald Trump and IPhone5. The results of KCW are compared with those of three existing models in terms of precision, recall, and F-measure, and the experiments show that the proposed method performs considerably better than the other three.
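The general idea of ranking words on a co-occurrence graph with feature-dependent node weights can be illustrated with a minimal sketch. It is not the KCW method itself: the abstract does not give the weighting formula, so the node weight here (relative frequency averaged with the share of occurrences that open a tweet) is a hypothetical stand-in, the update rule is an assumed node-edge rank style recurrence, and all function names are illustrative. The last function computes precision, recall, and F-measure against a hand-labelled keyword set.

```python
from collections import Counter, defaultdict


def build_graph(tweets):
    """Build an undirected co-occurrence graph from tokenized tweets.

    The edge weight counts how often two different words are adjacent.
    """
    edges = defaultdict(Counter)
    freq = Counter()    # how often each word occurs
    first = Counter()   # how often each word opens a tweet
    for tokens in tweets:
        freq.update(tokens)
        if tokens:
            first[tokens[0]] += 1
        for a, b in zip(tokens, tokens[1:]):
            if a != b:
                edges[a][b] += 1
                edges[b][a] += 1
    return edges, freq, first


def node_weights(freq, first):
    """Hypothetical node weight in [0, 1]: mean of relative frequency
    and the fraction of occurrences in first position (an assumption)."""
    max_f = max(freq.values())
    return {w: (freq[w] / max_f + first[w] / freq[w]) / 2 for w in freq}


def rank(edges, weight, d=0.85, iters=50):
    """Assumed node-edge rank style update:
    R(v) = (1 - d) * W(v) + d * W(v) * sum_u (w_uv / strength(u)) * R(u)
    """
    score = {w: 1.0 for w in weight}
    strength = {w: sum(edges[w].values()) for w in weight}
    for _ in range(iters):
        new = {}
        for v in weight:
            incoming = sum(
                edges[u][v] / strength[u] * score[u] for u in edges[v]
            )
            new[v] = (1 - d) * weight[v] + d * weight[v] * incoming
        score = new
    return score


def top_keywords(tweets, k=5):
    """Return the k highest-scoring words."""
    edges, freq, first = build_graph(tweets)
    scores = rank(edges, node_weights(freq, first))
    return sorted(scores, key=scores.get, reverse=True)[:k]


def precision_recall_f(predicted, gold):
    """Precision, recall, and F-measure of predicted vs. gold keywords."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    # Toy, made-up tweets (already tokenized and lower-cased).
    toy = [
        ["ipl", "final", "tonight"],
        ["ipl", "final", "tickets"],
        ["watch", "ipl", "final"],
    ]
    keywords = top_keywords(toy, k=2)
    print(keywords, precision_recall_f(keywords, ["ipl", "final", "tickets"]))
```

Because the hypothetical node weights are bounded by 1 and each word distributes its score in proportion to its edge weights, the iteration is a contraction for d < 1 and settles after a few dozen passes. A real pipeline would also need tokenization, stop-word removal, and the actual feature weighting defined in the paper.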
Cite this article
Bordoloi, M., Biswas, S.K. Keyword extraction from micro-blogs using collective weight. Soc. Netw. Anal. Min. 8, 58 (2018). https://doi.org/10.1007/s13278-018-0536-8
DOI: https://doi.org/10.1007/s13278-018-0536-8