Domain Identification of Scientific Articles Using Transfer Learning and Ensembles

Hande, Adeep; Puranik, Karthik; Priyadharshini, Ruba; Chakravarthi, Bharathi Raja

doi:10.1007/978-3-030-75015-2_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12705)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

This paper describes our transfer learning-based approach to domain identification of scientific articles, developed as part of the SDPRA-2021 Shared Task. We experiment with transfer learning using pre-trained language models (BERT, RoBERTa, and SciBERT), which we then fine-tune for this task. The results show that the ensemble approach performs best, as it takes the weights of the individual models into consideration. We also propose improvements for future work. The code for the best system is available at https://github.com/SDPRA-2021/shared-task/tree/main/IIITT.
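
The abstract does not specify how the models' weights enter the ensemble. One common way to combine fine-tuned classifiers is weighted soft voting: average each model's per-class probabilities using a per-model weight (for instance, its validation score) and predict the class with the highest combined probability. The following Python sketch illustrates only that technique, under that assumption; it is not the authors' implementation, and the function name `weighted_soft_vote`, the model keys, the probabilities, and the weights are all hypothetical.

```python
from typing import Dict, List, Sequence


def weighted_soft_vote(
    probs_by_model: Dict[str, Sequence[Sequence[float]]],
    weights: Dict[str, float],
) -> List[int]:
    """Hypothetical helper: weighted average of per-model class probabilities,
    returning the arg-max class index for each example."""
    names = list(probs_by_model)
    total = sum(weights[n] for n in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_examples = len(probs_by_model[names[0]])
    if any(len(probs_by_model[n]) != n_examples for n in names):
        raise ValueError("all models must score the same examples")

    preds: List[int] = []
    for i in range(n_examples):
        n_classes = len(probs_by_model[names[0]][i])
        combined = [0.0] * n_classes
        for n in names:
            w = weights[n] / total  # normalise so the weights sum to 1
            for c, p in enumerate(probs_by_model[n][i]):
                combined[c] += w * p
        # Predicted class = index of the largest combined probability.
        preds.append(max(range(n_classes), key=combined.__getitem__))
    return preds


# Hypothetical softmax outputs of three fine-tuned models on two abstracts,
# each over three hypothetical domain labels.
probs = {
    "bert":    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    "roberta": [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]],
    "scibert": [[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]],
}
# Hypothetical per-model weights (e.g. proportional to validation scores).
weights = {"bert": 1.0, "roberta": 1.0, "scibert": 2.0}

print(weighted_soft_vote(probs, weights))  # → [1, 2]
```

Changing the weights shifts the combined prediction toward the more heavily weighted model: with hypothetical weights of 3, 1, and 1 for `bert`, `roberta`, and `scibert`, the first example would instead be assigned class 0.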

Author information

Authors and Affiliations

Indian Institute of Information Technology Tiruchirappalli, Tiruchirappalli, Tamil Nadu, India
Adeep Hande & Karthik Puranik
ULTRA Arts and Science College, Madurai, Tamil Nadu, India
Ruba Priyadharshini
Insight SFI Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland
Bharathi Raja Chakravarthi

Corresponding author

Correspondence to Adeep Hande.

Editor information

Editors and Affiliations

Microsoft, Hyderabad, India
Manish Gupta
Indian Institute of Technology Bombay, Mumbai, India
Ganesh Ramakrishnan

About this paper

Cite this paper

Hande, A., Puranik, K., Priyadharshini, R., Chakravarthi, B.R. (2021). Domain Identification of Scientific Articles Using Transfer Learning and Ensembles. In: Gupta, M., Ramakrishnan, G. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12705. Springer, Cham. https://doi.org/10.1007/978-3-030-75015-2_9

DOI: https://doi.org/10.1007/978-3-030-75015-2_9
Published: 03 May 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75014-5
Online ISBN: 978-3-030-75015-2
eBook Packages: Computer Science (R0)