Citation Classification Prediction Implying Text Features Using Natural Language Processing and Supervised Machine Learning Algorithms

  • Conference paper
Recent Trends in Image Processing and Pattern Recognition (RTIP2R 2020)

Abstract

The citation count of a published research paper is valuable to a researcher's career. Millions of research papers are available, and the number keeps growing fast. Published articles are cited by later research, but not all papers have the same impact: because the repository of research papers is so large, only a few publications reach researchers. It is therefore interesting to study which articles receive more citations, and this paper uses citation count as the quantification parameter. To surface quality literature, it is important to analyze the text of research papers. With this motivation, this paper focuses on different techniques for predicting citation count from the text and structure features of research articles. For implementation, the title, abstract, and conclusion fields are extracted from research papers as the main content to analyze. The linguistic analysis of a corpus of research papers with high and low citation counts is done using Natural Language Processing and Machine Learning. A system is implemented using a supervised classification model, which takes a few features of a particular publication as input and outputs whether it belongs to the high or low citation category 9 to 10 years after its publication. Several classification models are trained, and their performance is appraised using a few performance measures. Experimental results on the dataset show an accuracy of 60.67% for the Random Forest model. The comprehensive experiments on the dataset indicate that the proposed models achieve convincing results.
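
The pipeline outlined in the abstract (text fields in, a high or low citation label out, scored by accuracy) might be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the `Paper` record, the three per-field features, the citation `threshold`, and all function names are hypothetical, since the abstract does not specify them. The resulting feature dictionaries could then be passed to a supervised classifier such as a Random Forest.

```python
import re
from dataclasses import dataclass


@dataclass
class Paper:
    """Hypothetical record holding the three text fields named in the abstract."""
    title: str
    abstract: str
    conclusion: str
    citations: int  # citation count observed some years after publication


def text_features(paper):
    """Compute simple, assumed text/structure features for each field."""
    fields = {
        "title": paper.title,
        "abstract": paper.abstract,
        "conclusion": paper.conclusion,
    }
    features = {}
    for name, text in fields.items():
        words = re.findall(r"[A-Za-z']+", text.lower())
        count = len(words)
        features[f"{name}_word_count"] = count
        # Guard against empty fields to avoid division by zero.
        features[f"{name}_avg_word_len"] = (
            sum(len(w) for w in words) / count if count else 0.0
        )
        features[f"{name}_type_token_ratio"] = (
            len(set(words)) / count if count else 0.0
        )
    return features


def citation_label(paper, threshold):
    """Binary target: 'high' if citations reach the (assumed) threshold."""
    return "high" if paper.citations >= threshold else "low"


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

Accuracy is used here because it is the measure the abstract reports; the other performance measures it mentions are not named, so none are sketched.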

Author information

Corresponding author

Correspondence to Priya Porwal.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Porwal, P., Devare, M.H. (2021). Citation Classification Prediction Implying Text Features Using Natural Language Processing and Supervised Machine Learning Algorithms. In: Santosh, K.C., Gawali, B. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2020. Communications in Computer and Information Science, vol 1380. Springer, Singapore. https://doi.org/10.1007/978-981-16-0507-9_46

  • DOI: https://doi.org/10.1007/978-981-16-0507-9_46

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-0506-2

  • Online ISBN: 978-981-16-0507-9

  • eBook Packages: Computer Science, Computer Science (R0)
