
Identification of important citations by exploiting research articles’ metadata and cue-terms from content

Scientometrics

Abstract

Citations play a pivotal role in indicating various aspects of scientific literature. Quantitative citation analysis has been used for decades to measure the impact factor of journals, to rank researchers and institutions, and to discover evolving research topics. Researchers have questioned purely quantitative citation analysis, arguing that not all citations are equally important and that the reason for a citation must be considered when counting. In the recent past, researchers have focused on grouping citation reasons into important and non-important classes rather than classifying each reason individually. Most contemporary citation classification techniques either rely on the full content of articles or are dominated by content-based features. However, content is often not freely available, because many journal publishers do not provide open access to articles. This paper presents a binary citation classification scheme that is dominated by metadata-based parameters. The study demonstrates the significance of metadata-based and content-based parameters in varying scenarios. The experiments are performed on two annotated data sets, which are evaluated using SVM, KLR, and Random Forest machine learning classifiers. The results are compared with a contemporary study that performed a similar classification using a rich list of content-based features. The comparison shows that the proposed model attains an improved precision (0.68) by relying only on freely available metadata. We claim that the proposed approach can serve as the best alternative in scenarios where content is unavailable.
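To make the idea of metadata-only binary classification concrete, the following is a minimal Python sketch. It is not the authors' implementation: the abstract does not list the metadata parameters, so the three overlap features, the `Paper` record, and the threshold rule (a stand-in for a trained SVM, KLR, or Random Forest model) are all assumptions chosen for illustration. Only the precision formula, TP / (TP + FP), is standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """Hypothetical metadata record; field names are assumptions."""
    title: str
    authors: frozenset
    keywords: frozenset


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def metadata_features(citing, cited):
    """Illustrative metadata-only features for one citing/cited pair."""
    title_a = frozenset(citing.title.lower().split())
    title_b = frozenset(cited.title.lower().split())
    return {
        "title_overlap": jaccard(title_a, title_b),
        "author_overlap": jaccard(citing.authors, cited.authors),
        "keyword_overlap": jaccard(citing.keywords, cited.keywords),
    }


def is_important(features, threshold=0.3):
    """Placeholder decision rule standing in for a trained classifier.

    The threshold is arbitrary and not taken from the paper.
    """
    return sum(features.values()) / len(features) >= threshold


def precision(y_true, y_pred):
    """Precision of the positive ("important") class: TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tp / (tp + fp) if (tp + fp) else 0.0


if __name__ == "__main__":
    citing = Paper(
        title="citation classification with metadata",
        authors=frozenset({"A. Author", "B. Author"}),
        keywords=frozenset({"citation", "classification"}),
    )
    cited = Paper(
        title="metadata for citation analysis",
        authors=frozenset({"B. Author"}),
        keywords=frozenset({"citation", "metadata"}),
    )
    feats = metadata_features(citing, cited)
    print(feats, is_important(feats))
    # Toy labels only; not data from the paper.
    print(precision([1, 0, 1, 0], [1, 1, 1, 0]))
```

In practice the feature dictionary would be turned into a vector and passed to a trained classifier, and precision would be computed on held-out annotated citation pairs.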

Notes

  1. https://wordnet.princeton.edu/.

  2. http://www.lextek.com/onix/.

  3. https://lucene.apache.org/core/.

  4. http://weka.sourceforge.net/doc.packages/SMOTE/weka/filters/supervised/instance/SMOTE.html.

  5. http://allenai.org/data.htm.

Author information

Correspondence to Faiza Qayyum.

About this article

Cite this article

Qayyum, F., Afzal, M.T. Identification of important citations by exploiting research articles’ metadata and cue-terms from content. Scientometrics 118, 21–43 (2019). https://doi.org/10.1007/s11192-018-2961-x

  • DOI: https://doi.org/10.1007/s11192-018-2961-x
