Citation Classification Prediction Implying Text Features Using Natural Language Processing and Supervised Machine Learning Algorithms

Porwal, Priya; Devare, Manoj H.

doi:10.1007/978-981-16-0507-9_46

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1380)

Included in the following conference series:

International Conference on Recent Trends in Image Processing and Pattern Recognition

Abstract

The citation count of a published research paper is valuable to its author's career. Millions of research papers are available, and their number keeps growing fast. Published articles are referenced by later research, but not all papers have the same impact: because the repository of research papers is so large, only a few publications reach researchers. It is therefore interesting to study which articles receive more citations, so this paper uses citation count as the quantification parameter. To identify quality literature, it is important to analyze the text of research papers. With this motivation, this paper examines different techniques for predicting citation count from the text and structure features of research articles. For implementation, the title, abstract and conclusion fields are extracted from each research paper as the main content to analyze. A linguistic analysis of a corpus of research papers with high and low citation counts is carried out using Natural Language Processing and Machine Learning. A system is implemented using a supervised classification model that takes a few features of a publication as input and predicts whether it will belong to the high or low citation category 9 to 10 years after publication. Several classification models are considered for the learning process, and their performance is evaluated using a few performance measures. Experimental results on the dataset show an accuracy of 60.67% for the Random Forest model. The comprehensive experiments on the dataset show that the proposed models achieve convincing results.
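The abstract does not specify the exact features or the labelling rule, so the following is only a minimal Python sketch of the kind of pipeline it describes, under stated assumptions: each paper is a dict with hypothetical `title`, `abstract` and `conclusion` keys; the features (word count, sentence count and mean word length per field) are illustrative choices, not the authors' feature set; and papers are labelled "high" or "low" by comparing their citation count with the median, which is an assumed threshold. The classifier itself (Random Forest in the paper) is left out; any supervised model could be trained on the resulting feature dicts and scored with `accuracy`.

```python
import re
from statistics import median

FIELDS = ("title", "abstract", "conclusion")  # fields named in the abstract


def text_features(paper):
    """Illustrative text/structure features for one paper.

    `paper` is a hypothetical dict with one string per field in FIELDS.
    The chosen features are assumptions, not the authors' feature set.
    """
    feats = {}
    for field in FIELDS:
        text = paper[field]
        words = re.findall(r"[A-Za-z']+", text.lower())
        # Crude sentence split on terminal punctuation; empty pieces dropped.
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        feats[f"{field}_words"] = len(words)
        feats[f"{field}_sentences"] = len(sentences)
        feats[f"{field}_avg_word_len"] = (
            sum(len(w) for w in words) / len(words) if words else 0.0
        )
    return feats


def label_by_median(citation_counts):
    """Assumed labelling rule: 'high' if strictly above the median, else 'low'."""
    threshold = median(citation_counts)
    return ["high" if c > threshold else "low" for c in citation_counts]


def accuracy(y_true, y_pred):
    """Fraction of papers whose predicted category matches the true one."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    paper = {
        "title": "Deep learning for citation prediction",
        "abstract": "We predict citations. It works well!",
        "conclusion": "Text features help.",
    }
    print(text_features(paper))
    print(label_by_median([1, 5, 10, 40]))  # ['low', 'low', 'high', 'high']
    print(accuracy(["high", "low", "low"], ["high", "high", "low"]))
```

Using the median as the cut-off keeps the two classes roughly balanced (unless many papers tie at the median), which helps keep plain accuracy, the measure quoted above, meaningful. If the citation counts passed to `label_by_median` are those observed 9 to 10 years after publication, the labels match the prediction horizon described in the abstract.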

Author information

Authors and Affiliations

Amity University, Mumbai, 410206, India
Priya Porwal & Manoj H. Devare

Corresponding author

Correspondence to Priya Porwal.

Editor information

Editors and Affiliations

University of South Dakota, Vermillion, SD, USA
K. C. Santosh
Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, India
Bharti Gawali

About this paper

Cite this paper

Porwal, P., Devare, M.H. (2021). Citation Classification Prediction Implying Text Features Using Natural Language Processing and Supervised Machine Learning Algorithms. In: Santosh, K.C., Gawali, B. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2020. Communications in Computer and Information Science, vol 1380. Springer, Singapore. https://doi.org/10.1007/978-981-16-0507-9_46

DOI: https://doi.org/10.1007/978-981-16-0507-9_46
Published: 26 February 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-0506-2
Online ISBN: 978-981-16-0507-9
eBook Packages: Computer Science, Computer Science (R0)
