DOI: 10.1145/3416028.3416043
research-article

Identifying Important Citations by Incorporating Generative Model into Discriminative Classifiers

Published: 21 September 2020

Abstract

Since the Budapest Open Access Initiative was launched, a large number of full-text articles have become available in XML format, which in turn supports technology management applications built on citation context analysis, such as emerging technology forecasting, technology opportunity detection, and innovation measurement. Inspired by the success of kernel functions in improving the performance of SVM (Support Vector Machine) models, we explore the potential of combining generative and discriminative models for classifying citation function and importance. More specifically, generative features are derived from a topic model, the Citation Influence Model (CIM), and fed, together with 13 other features derived directly from citation contexts, into two state-of-the-art discriminative models, SVM and RF (Random Forest), to identify important citations from a new perspective. Extensive experiments on a dataset from the Association for Computational Linguistics (ACL) Anthology indicate that our approach outperforms its counterparts.
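The pipeline summarized above (a generative feature from a topic model concatenated with context features, then a discriminative classifier) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The CIM citation-influence weight is assumed to be precomputed and is passed in as a single number. Only three context features are used instead of the paper's 13, and the toy data are invented. A hand-written linear SVM trained with the Pegasos subgradient method stands in for the SVM and RF models, so the example runs on the Python standard library alone. The function names (`build_features`, `train_linear_svm`, and so on) are hypothetical.

```python
import random
from typing import List, Sequence


def build_features(influence: float, context: Sequence[float]) -> List[float]:
    """Concatenate one generative feature with directly derived context features.

    `influence` is a placeholder for a precomputed CIM citation-influence
    weight (assumed, not computed here); `context` holds the context features
    (13 in the paper, 3 in this toy example).
    """
    return [influence, *context]


def column_means(rows: Sequence[Sequence[float]]) -> List[float]:
    """Per-feature mean over the training rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def center(x: Sequence[float], means: Sequence[float]) -> List[float]:
    """Subtract the training means so the SVM below can omit a bias term."""
    return [xi - mi for xi, mi in zip(x, means)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 100, seed: int = 0) -> List[float]:
    """Linear SVM trained with Pegasos (stochastic subgradient descent on
    lam/2 * ||w||^2 + mean hinge loss; the optional projection step is omitted).
    Labels must be -1 or +1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                      # Pegasos step size
            violated = y[i] * dot(w, X[i]) < 1.0       # is the hinge loss active?
            w = [(1.0 - eta * lam) * wj for wj in w]   # shrink: regularizer gradient
            if violated:                               # add the hinge subgradient
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """+1 = important citation, -1 = incidental citation."""
    return 1 if dot(w, x) > 0 else -1


# Invented toy data: (CIM influence, context features, label).
raw = [
    (0.9, [0.8, 0.7, 0.6], 1),
    (0.8, [0.6, 0.9, 0.7], 1),
    (0.7, [0.9, 0.6, 0.8], 1),
    (0.1, [0.2, 0.3, 0.4], -1),
    (0.2, [0.4, 0.1, 0.3], -1),
    (0.3, [0.1, 0.4, 0.2], -1),
]
X_raw = [build_features(g, ctx) for g, ctx, _ in raw]
y = [label for _, _, label in raw]
means = column_means(X_raw)
X = [center(x, means) for x in X_raw]

w = train_linear_svm(X, y)
train_acc = sum(predict(w, x) == label for x, label in zip(X, y)) / len(y)
print(train_acc)  # prints 1.0
```

In practice a library model would more likely be fit on the same concatenated feature matrix, for example scikit-learn's `SVC` (optionally with a kernel) or `RandomForestClassifier`. A random forest would not need the centering step, since tree splits are unaffected by shifting a feature.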

Information

Published In

ACM Other conferences
IMMS '20: Proceedings of the 3rd International Conference on Information Management and Management Science
August 2020
120 pages
ISBN:9781450375467
DOI:10.1145/3416028
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

In-Cooperation

  • Southwest Jiaotong University

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Citation context analysis
  2. Discriminative model
  3. Generative model
  4. Important citations

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Social Science Foundation of Beijing Municipality
  • Fundamental Research Funds for the Central Universities
  • Natural Science Foundation of Guangdong Province

Conference

IMMS 2020

Article Metrics

  • Total Citations: 0
  • Total Downloads: 52
  • Downloads (Last 12 months): 5
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 28 Feb 2025
