Identification of important citations by exploiting research articles’ metadata and cue-terms from content

Qayyum, Faiza; Afzal, Muhammad Tanvir

doi:10.1007/s11192-018-2961-x

Published: 22 November 2018

Scientometrics, volume 118, pages 21–43 (2019)

Abstract

Citations play a pivotal role in indicating various aspects of scientific literature. Quantitative citation analysis approaches have been used for decades to measure the impact factor of journals, to rank researchers or institutions, and to discover evolving research topics. Researchers have questioned purely quantitative citation analysis, arguing that not all citations are equally important and that the reasons for citing must be considered while counting. In the recent past, researchers have focused on identifying important citations by classifying citation reasons into important and non-important classes rather than classifying each reason individually. Most contemporary citation classification techniques either rely on the full content of articles or are dominated by content-based features. However, content is often not freely available, because many journal publishers do not provide open access to articles. This paper presents a binary citation classification scheme that is dominated by metadata-based parameters. The study demonstrates the significance of metadata and content-based parameters in varying scenarios. The experiments are performed on two annotated data sets, which are evaluated using SVM, KLR, and Random Forest machine learning classifiers. The results are compared with a contemporary study that performed a similar classification using a rich list of content-based features. The comparison revealed that the proposed model attained an improved precision (i.e., 0.68) by relying only on freely available metadata. We claim that the proposed approach can serve as the best alternative in scenarios where content is unavailable.
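To make the idea of metadata-driven binary classification and its precision score concrete, the following is a minimal sketch only. It is not the authors' model: the abstract does not list the actual features or classifier settings, so the `title_overlap` feature, the fixed `THRESHOLD`, and the toy citation pairs are all assumptions made up for illustration. A simple threshold rule stands in for the SVM, KLR, and Random Forest classifiers named in the abstract.

```python
from typing import Sequence


def title_overlap(citing_title: str, cited_title: str) -> float:
    """Jaccard overlap of lowercase title words (a hypothetical metadata feature)."""
    a = set(citing_title.lower().split())
    b = set(cited_title.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Precision = TP / (TP + FP) for the positive ("important") class, label 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy, made-up citation pairs: (citing title, cited title, annotated label).
# Label 1 means "important citation", label 0 means "non-important".
pairs = [
    ("Citation classification using metadata",
     "Metadata features for citation classification", 1),
    ("Deep learning for image segmentation",
     "A survey of citation analysis", 0),
    ("Ranking researchers with citation counts",
     "Citation counts and researchers ranking", 1),
    ("Graph databases in practice",
     "Practice of graph theory", 0),
]

THRESHOLD = 0.3  # arbitrary cut-off for this sketch, not a tuned value

y_true = [label for _, _, label in pairs]
y_pred = [1 if title_overlap(citing, cited) >= THRESHOLD else 0
          for citing, cited, _ in pairs]

print(f"precision = {precision(y_true, y_pred):.2f}")
```

Precision is the share of citations predicted as important that annotators also marked as important, which is the metric the abstract reports. In a fuller experiment, the threshold rule would be replaced by a trained classifier over several metadata features.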

Author information

Authors and Affiliations

Department of Computer Science, Capital University of Science and Technology, Islamabad, Pakistan
Faiza Qayyum & Muhammad Tanvir Afzal

Corresponding author

Correspondence to Faiza Qayyum.

About this article

Cite this article

Qayyum, F., Afzal, M.T. Identification of important citations by exploiting research articles’ metadata and cue-terms from content. Scientometrics 118, 21–43 (2019). https://doi.org/10.1007/s11192-018-2961-x

Received: 23 October 2017
Published: 22 November 2018
Issue Date: 15 January 2019
DOI: https://doi.org/10.1007/s11192-018-2961-x
