Abstract
Citations play a pivotal role in indicating various aspects of scientific literature. Quantitative citation analysis approaches have been used for decades to measure the impact factor of journals, to rank researchers and institutions, and to discover evolving research topics. Researchers have questioned purely quantitative citation analysis, arguing that not all citations are equally important and that the reasons for citing must be considered when counting. More recently, researchers have focused on classifying citations into two classes, important and non-important, rather than classifying each citation reason individually. Most contemporary citation classification techniques either rely on the full content of articles or are dominated by content-based features. However, content is often not freely available, because many journal publishers do not provide open access to articles. This paper presents a binary citation classification scheme that is dominated by metadata-based parameters. The study demonstrates the significance of metadata-based and content-based parameters in varying scenarios. The experiments are performed on two annotated data sets, which are evaluated using SVM, KLR, and Random Forest machine learning classifiers. The results are compared with a contemporary study that performed a similar classification using a rich list of content-based features. The comparison revealed that the proposed model attained an improved precision (0.68) while relying only on freely available metadata. We claim that the proposed approach can serve as the best alternative in scenarios where content is unavailable.
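As a reading aid, the precision figure reported above can be understood through the following minimal sketch. It is not the authors' implementation: the function name `precision`, the label strings, and the toy labels are assumptions made purely for illustration, and the numbers are not taken from the paper's data sets.

```python
def precision(y_true, y_pred, positive="important"):
    """Share of citations predicted as `positive` that are truly `positive`."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted important and annotated important.
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted important but annotated otherwise.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    if tp + fp == 0:
        return 0.0  # no positive predictions; return 0.0 by convention
    return tp / (tp + fp)


# Hypothetical annotations and classifier outputs (illustrative only).
y_true = ["important", "non-important", "important", "non-important", "important"]
y_pred = ["important", "important", "important", "non-important", "non-important"]

score = precision(y_true, y_pred)
print(f"precision = {score:.2f}")
```

Here precision counts only how many of the citations flagged as important were annotated as important; citations that the classifier missed do not affect it.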
About this article
Cite this article
Qayyum, F., Afzal, M.T. Identification of important citations by exploiting research articles’ metadata and cue-terms from content. Scientometrics 118, 21–43 (2019). https://doi.org/10.1007/s11192-018-2961-x
DOI: https://doi.org/10.1007/s11192-018-2961-x