Abstract
The citation count of a published research paper is valuable to a researcher's career. Millions of research papers are available, and the number keeps growing fast. Published articles are cited by later research, but not all papers have the same impact: because the repository of research papers is so large, only a few publications reach researchers. It is therefore interesting to study which articles receive more citations, and in this paper citation count is used as the quantification parameter. To surface quality literature, it is important to analyze the text of the research paper. With this motivation, this paper focuses on different techniques for predicting citation count using text and structure features from research articles. For implementation, the title, abstract and conclusion fields are extracted from research papers as the main content to analyze. Linguistic analysis of a corpus of research papers with high and low citation counts is carried out using Natural Language Processing and Machine Learning. A system is implemented using a supervised classification model, which takes a few features of a particular publication as input and outputs whether it belongs to the high or low citation category 9 to 10 years after its publication. Several classification models are considered for the learning process, and their performance is appraised using several performance measures. Experimental results on the dataset show an accuracy of 60.67% for the Random Forest model. Comprehensive experiments on the dataset show that the proposed models achieve convincing results.
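The pipeline described in the abstract (extract text and structure features from title, abstract and conclusion, then classify a paper as high or low citation and measure accuracy) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Paper` record, the five features, and the helper names are hypothetical, and a one-feature threshold rule (a decision stump) stands in for the classifiers used in the paper, whose feature set and model settings are not given in the abstract.

```python
import re
from dataclasses import dataclass


@dataclass
class Paper:
    """Hypothetical record holding the three fields named in the abstract."""
    title: str
    abstract: str
    conclusion: str
    high_cited: bool  # label: True = high-citation class, False = low


def extract_features(paper):
    """Compute a few simple text/structure features (illustrative only)."""
    text = " ".join([paper.title, paper.abstract, paper.conclusion])
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "title_words": float(len(paper.title.split())),
        "total_words": float(len(words)),
        "avg_word_len": sum(map(len, words)) / len(words) if words else 0.0,
        "avg_sentence_len": len(words) / len(sentences) if sentences else 0.0,
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
    }


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def fit_stump(rows, labels):
    """Search every feature, threshold and polarity for the best training accuracy.

    Returns (training_accuracy, feature_name, threshold, above_is_high).
    """
    best = None
    for name in rows[0]:
        for thr in sorted({r[name] for r in rows}):
            for above_is_high in (True, False):
                preds = [(r[name] >= thr) == above_is_high for r in rows]
                acc = accuracy(labels, preds)
                if best is None or acc > best[0]:
                    best = (acc, name, thr, above_is_high)
    return best


def predict(stump, row):
    """Apply a fitted stump to one feature row; True means high-citation class."""
    _, name, thr, above_is_high = stump
    return (row[name] >= thr) == above_is_high


if __name__ == "__main__":
    # Invented toy records, not data from the paper.
    papers = [
        Paper("Short title", "One sentence here.", "Done.", False),
        Paper("A much longer and more descriptive title",
              "First sentence. Second sentence with more words.",
              "We conclude with several remarks.", True),
    ]
    rows = [extract_features(p) for p in papers]
    labels = [p.high_cited for p in papers]
    stump = fit_stump(rows, labels)
    print("chosen feature:", stump[1], "training accuracy:", stump[0])
```

The stump is scored on its own training data here, which overstates performance; a held-out test split would be needed to obtain a figure comparable to the accuracy reported in the abstract.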
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Porwal, P., Devare, M.H. (2021). Citation Classification Prediction Implying Text Features Using Natural Language Processing and Supervised Machine Learning Algorithms. In: Santosh, K.C., Gawali, B. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2020. Communications in Computer and Information Science, vol 1380. Springer, Singapore. https://doi.org/10.1007/978-981-16-0507-9_46
DOI: https://doi.org/10.1007/978-981-16-0507-9_46
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-0506-2
Online ISBN: 978-981-16-0507-9
eBook Packages: Computer Science (R0)