Unsupervised Citation Sentence Identification Based on Similarity Measurement

Ou, Shiyan; Kim, Hyonil

doi:10.1007/978-3-319-78105-1_42

Conference paper
First Online: 15 March 2018

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10766)

Abstract

Citation context analysis has attracted the interest of many researchers in bibliometrics. Its first step is to extract the context of each citation from a citing paper. In this paper, we propose a novel unsupervised approach for identifying implicit citation sentences, that is, sentences that discuss a cited work without carrying a citation tag. Our approach selects the sentences neighboring an explicit citation sentence as candidates, calculates the similarity between each candidate and both the cited paper and the citing paper, and deems the candidates that are more similar to the cited paper to be implicit citation sentences. To calculate text similarity, we propose four methods based on the Doc2vec model, the Vector Space Model (VSM), and the LDA model. The experimental results show that the hybrid method combining the probabilistic TF-IDF weighted VSM with the TF-IDF weighted Doc2vec achieves the best performance. Unlike supervised methods, our approach does not need an annotated training corpus and should therefore, in theory, be easy to apply to other domains.
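
The selection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it substitutes a plain TF-IDF weighted VSM with cosine similarity for the probabilistic TF-IDF, Doc2vec, and LDA variants the paper evaluates, and the function names, the crude tokenizer, and the `window` parameter are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (term -> weight) per text, using smoothed IDF."""
    token_lists = [tokenize(t) for t in texts]
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def implicit_citation_sentences(sentences, explicit_idx, cited_text,
                                citing_text, window=2):
    """Return indices of sentences near the explicit citation sentence that
    are more similar to the cited paper than to the citing paper.

    `window` (hypothetical) is how many neighbors on each side are candidates.
    """
    lo = max(0, explicit_idx - window)
    hi = min(len(sentences), explicit_idx + window + 1)
    candidates = [i for i in range(lo, hi) if i != explicit_idx]
    # Fit TF-IDF jointly over both papers and the candidate sentences.
    vectors = tfidf_vectors(
        [cited_text, citing_text] + [sentences[i] for i in candidates]
    )
    cited_vec, citing_vec = vectors[0], vectors[1]
    return [
        i for i, vec in zip(candidates, vectors[2:])
        if cosine(vec, cited_vec) > cosine(vec, citing_vec)
    ]
```

Because the rule only compares two similarity scores per candidate, it needs no labeled examples, which is the property the abstract emphasizes. In practice the citing paper's text contains the candidate sentences themselves, so how that text is prepared before comparison is a design decision this sketch leaves open.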

Notes

1. SENSEVAL is an English corpus used in a word sense disambiguation evaluation exercise; see https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/senseval.zip.
2. Apache PDFBox is an open-source Java PDF library; see https://pdfbox.apache.org/.

Acknowledgement

This paper is one of the research outputs of the project supported by the State Key Program of National Social Science Foundation of China (Grant No. 17ATQ001).

Author information

Authors and Affiliations

School of Information Management, Nanjing University, Nanjing, China
Shiyan Ou & Hyonil Kim

Corresponding author

Correspondence to Shiyan Ou.

Editor information

Editors and Affiliations

Northumbria University, Newcastle upon Tyne, United Kingdom
Gobinda Chowdhury
Northumbria University, Newcastle upon Tyne, United Kingdom
Julie McLeod
University of Sheffield, Sheffield, United Kingdom
Val Gillet
University of Sheffield, Sheffield, United Kingdom
Peter Willett

About this paper

Cite this paper

Ou, S., Kim, H. (2018). Unsupervised Citation Sentence Identification Based on Similarity Measurement. In: Chowdhury, G., McLeod, J., Gillet, V., Willett, P. (eds) Transforming Digital Worlds. iConference 2018. Lecture Notes in Computer Science, vol 10766. Springer, Cham. https://doi.org/10.1007/978-3-319-78105-1_42

DOI: https://doi.org/10.1007/978-3-319-78105-1_42
Published: 15 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78104-4
Online ISBN: 978-3-319-78105-1
eBook Packages: Computer Science, Computer Science (R0)