Identifying Important Citations by Incorporating Generative Model into Discriminative Classifiers

Authors:

Jinghong Li

IMMS '20: Proceedings of the 3rd International Conference on Information Management and Management Science

Pages 72 - 76

https://doi.org/10.1145/3416028.3416043

Published: 21 September 2020

Abstract

Since the Budapest Open Access Initiative was launched, a large number of full-text articles in XML format have become available, which further promotes technology management based on citation context analysis, such as emerging technology forecasting, technology opportunity detection, and innovation measurement. Inspired by the success of kernel functions in improving the performance of the SVM (Support Vector Machine) model, we explore the potential of combining generative and discriminative models for the task of citation function and importance classification. In more detail, generative features are derived from a topic model, the Citation Influence Model (CIM), and then fed, together with 13 other features derived directly from citation contexts, to two state-of-the-art discriminative models, SVM and RF (Random Forest), to identify important citations from a new perspective. Extensive experimental results on a dataset from the Association for Computational Linguistics Anthology indicate that our approach outperforms its counterparts.
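The feature-combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `CitationInstance` type, the `build_feature_matrix` helper, the number of generative features (two), and all numeric values are hypothetical and chosen only for illustration. The sketch shows how per-citation outputs of a generative topic model might be appended to the 13 context-derived features before a discriminative classifier is trained.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CitationInstance:
    """One citing-paper/cited-paper pair (hypothetical container)."""
    context_features: Sequence[float]     # the 13 features from citation contexts
    generative_features: Sequence[float]  # outputs of a topic model (assumed shape)
    label: int                            # 1 = important citation, 0 = incidental


def build_feature_matrix(instances: Sequence[CitationInstance],
                         use_generative: bool = True) -> List[List[float]]:
    """Concatenate context features with (optionally) generative features."""
    rows = []
    for inst in instances:
        row = [float(v) for v in inst.context_features]
        if use_generative:
            row.extend(float(v) for v in inst.generative_features)
        rows.append(row)
    # Every row must have the same width before it reaches a classifier.
    if len({len(r) for r in rows}) > 1:
        raise ValueError("inconsistent feature vector lengths")
    return rows


# Toy data: all values below are invented for illustration only.
instances = [
    CitationInstance(
        context_features=[3, 1, 0, 0.25, 1, 0, 0, 2, 0.5, 1, 0, 0, 1],
        generative_features=[0.62, 0.80],
        label=1,
    ),
    CitationInstance(
        context_features=[1, 0, 0, 0.05, 0, 0, 1, 1, 0.1, 0, 0, 1, 0],
        generative_features=[0.07, 0.15],
        label=0,
    ),
]

X_full = build_feature_matrix(instances)                        # 13 + 2 columns
X_base = build_feature_matrix(instances, use_generative=False)  # 13 columns only
y = [inst.label for inst in instances]
```

A matrix such as `X_full` could then be passed, with `y`, to the `fit` method of an SVM or random forest implementation (for example scikit-learn's `SVC` or `RandomForestClassifier`), and the same classifier trained on `X_base` would serve as a baseline without generative features.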

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Information extraction
  2. Information systems applications
    1. Data mining

Published In

IMMS '20: Proceedings of the 3rd International Conference on Information Management and Management Science

August 2020

120 pages

ISBN:9781450375467

DOI:10.1145/3416028

Copyright © 2020 ACM.

In-Cooperation

Southwest Jiaotong University

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Social Science Foundation of Beijing Municipality
Fundamental Research Funds for the Central Universities
Natural Science Foundation of Guangdong Province

Conference

IMMS 2020

IMMS 2020: 2020 3rd International Conference on Information Management and Management Science

August 7 - 9, 2020

London, United Kingdom
