Computational linguistics literature and citations oriented citation linkage, classification and summarization

International Journal on Digital Libraries

Abstract

Scientific literature is currently the most important resource for scholars, and its citations give researchers a powerful latent means of analyzing scientific trends, influences and the relationships among works and authors. This paper focuses on automatic citation analysis and summarization for the scientific literature of computational linguistics, which are also the shared tasks of the 2nd Computational Linguistics Scientific Document Summarization (CL-SciSumm 2016) at BIRNDL 2016 (the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries). First, each citation linkage between a citation and the spans of text it refers to in the reference paper is recognized according to their content similarity, computed with various methods. Then each cited text span is classified into five pre-defined facets, i.e., Hypothesis, Implication, Aim, Results and Method, based on various lexicon and rule features, using a Support Vector Machine and a voting method. Finally, a summary of the reference paper of at most 250 words is generated from the cited text spans. The hLDA (hierarchical Latent Dirichlet Allocation) topic model is adopted for content modeling; it provides knowledge about sentence clustering (subtopics) and word distributions (abstractiveness) for summarization. We combine the hLDA knowledge with several other classical features, using different weights and proportions, to score the sentences in the reference paper. Our systems ranked first and second in the evaluation results published by BIRNDL 2016, which verifies the effectiveness of our methods.
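The citation-linkage step described above can be illustrated with a minimal sketch. It assumes TF-IDF cosine similarity as the content similarity measure; the abstract states only that "various computational methods" are used, so the measure, the function names (`tokenize`, `tfidf_vectors`, `cosine`, `link_citation`) and the `top_k` parameter are assumptions made for illustration and not the authors' system.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(sentences):
    """Build one sparse TF-IDF vector (a dict) per sentence."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    # Document frequency: in how many sentences each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, so terms found in every sentence keep a small weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def link_citation(citance, reference_sentences, top_k=1):
    """Return indices of the top_k reference sentences most similar to the citance."""
    vectors = tfidf_vectors(list(reference_sentences) + [citance])
    citance_vec = vectors[-1]
    scores = [cosine(citance_vec, v) for v in vectors[:-1]]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical reference-paper sentences and citing sentence.
    reference = [
        "We train a support vector machine on lexical features.",
        "The corpus contains ten annotated papers.",
        "Topic models capture the hierarchy of subtopics.",
    ]
    citance = "They used a support vector machine with lexical features."
    print(link_citation(citance, reference))
```

In a fuller pipeline, the linked spans would then be passed to the facet classifier and used as candidate sentences for the summary; those stages are not sketched here.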

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 91546121, 61202247, 71231002 and 61472046; the EU FP7 IRSES MobileCloud Project (Grant No. 612212); the 111 Project of China under Grant B08004; the Engineering Research Center of Information Networks, Ministry of Education (MOE); the MOE Liberal Arts and Social Sciences Foundation under Grant 16YJA630011; the Beijing Institute of Science and Technology Information; and CapInfo Company Limited.

Author information

Corresponding author

Correspondence to Lei Li.

About this article

Cite this article

Li, L., Mao, L., Zhang, Y. et al. Computational linguistics literature and citations oriented citation linkage, classification and summarization. Int J Digit Libr 19, 173–190 (2018). https://doi.org/10.1007/s00799-017-0219-5

  • DOI: https://doi.org/10.1007/s00799-017-0219-5
