Predicting the Age of Scientific Papers

Savov, Pavel; Jatowt, Adam; Nielek, Radoslaw

doi:10.1007/978-3-030-77961-0_58

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12742)

Included in the following conference series:

International Conference on Computational Science

Abstract

In this paper we show how the age of scientific papers can be predicted given a diachronic corpus of papers from a particular domain published over a certain time period. We first train ordinal regression models to predict the age of individual sentences by fine-tuning a series of BERT models for binary classification. We then aggregate the sentence-level predictions into a final result for the entire paper. Using two corpora of publications, from the International World Wide Web Conference and the Journal of Artificial Societies and Social Simulation, we compare various result aggregation methods and show that the sentence-based approach produces better results than the direct document-level method.
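
The two-stage idea in the abstract can be illustrated with a minimal sketch. It assumes that each sentence has already been scored by a series of binary classifiers, where the k-th score approximates the probability that the sentence's age rank exceeds k. The decoding rule (counting scores above 0.5), the function names `sentence_age` and `paper_age`, the sample scores, and the choice of mean and median as aggregators are all illustrative assumptions; the abstract does not specify which aggregation methods the authors compare.

```python
from statistics import mean, median


def sentence_age(probs, threshold=0.5):
    """Decode an ordinal rank from K-1 binary scores.

    probs[k] is assumed to approximate P(age rank > k); the rank is the
    number of binary classifiers that answer "yes". (Hypothetical rule.)
    """
    return sum(1 for p in probs if p > threshold)


def paper_age(sentence_probs, aggregate=mean):
    """Aggregate per-sentence ranks into one paper-level estimate."""
    ages = [sentence_age(p) for p in sentence_probs]
    return aggregate(ages)


# Made-up scores for three sentences and four binary classifiers.
paper = [
    [0.90, 0.80, 0.30, 0.10],
    [0.95, 0.70, 0.60, 0.20],
    [0.80, 0.40, 0.20, 0.10],
]

print(paper_age(paper))                    # mean of sentence ranks
print(paper_age(paper, aggregate=median))  # median of sentence ranks
```

Swapping the `aggregate` argument is one simple way to compare aggregation strategies on the same sentence-level predictions.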

Author information

Authors and Affiliations

Polish-Japanese Academy of Information Technology, ul. Koszykowa 86, 02-008, Warszawa, Poland
Pavel Savov & Radoslaw Nielek
University of Innsbruck, Innrain 52, 6020, Innsbruck, Austria
Adam Jatowt

Corresponding author

Correspondence to Pavel Savov.

Editor information

Editors and Affiliations

AGH University of Science and Technology, Krakow, Poland
Maciej Paszynski
Ludwig-Maximilians-Universität München, Munich, Germany
Dieter Kranzlmüller
University of Amsterdam, Amsterdam, The Netherlands
Valeria V. Krzhizhanovskaya
University of Tennessee at Knoxville, Knoxville, TN, USA
Jack J. Dongarra
University of Amsterdam, Amsterdam, The Netherlands
Peter M. A. Sloot

About this paper

Cite this paper

Savov, P., Jatowt, A., Nielek, R. (2021). Predicting the Age of Scientific Papers. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12742. Springer, Cham. https://doi.org/10.1007/978-3-030-77961-0_58

DOI: https://doi.org/10.1007/978-3-030-77961-0_58
Published: 09 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77960-3
Online ISBN: 978-3-030-77961-0
eBook Packages: Computer Science, Computer Science (R0)
