Abstract
This paper describes our transfer learning-based approach to domain identification of scientific articles, developed as part of the SDPRA-2021 Shared Task. We experiment with transfer learning using pre-trained language models (BERT, RoBERTa, SciBERT), which are fine-tuned for this task. The results show that the ensemble approach performs best, as it takes the weights into consideration. We also propose improvements for future work. The code for the best system is published here: https://github.com/SDPRA-2021/shared-task/tree/main/IIITT.
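The abstract does not say how the ensemble weights are chosen or applied. As a minimal sketch of one common form, weighted soft voting, the snippet below averages per-model class probabilities using per-model weights and picks the highest-scoring class. The function name `weighted_soft_vote`, the three-class probabilities, and the weight values are all hypothetical and are not taken from the paper or its repository.

```python
from typing import Dict, List, Sequence


def weighted_soft_vote(
    model_probs: Dict[str, Sequence[Sequence[float]]],
    weights: Dict[str, float],
) -> List[int]:
    """Combine per-model class probabilities with a weighted average.

    model_probs maps a model name to one probability row per example.
    weights maps the same model names to non-negative weights.
    Returns the predicted class index for each example.
    """
    names = list(model_probs)
    if not names:
        raise ValueError("at least one model is required")
    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_examples = len(model_probs[names[0]])
    predictions = []
    for i in range(n_examples):
        n_classes = len(model_probs[names[0]][i])
        combined = [0.0] * n_classes
        for name in names:
            row = model_probs[name][i]
            for c in range(n_classes):
                # Each model contributes in proportion to its normalised weight.
                combined[c] += weights[name] / total * row[c]
        # Pick the class with the highest weighted probability.
        predictions.append(max(range(n_classes), key=combined.__getitem__))
    return predictions


# Hypothetical probabilities for two abstracts over three made-up classes.
probs = {
    "bert": [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    "roberta": [[0.3, 0.5, 0.2], [0.1, 0.3, 0.6]],
    "scibert": [[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]],
}
# Hypothetical weights, for example proportional to validation scores.
weights = {"bert": 1.0, "roberta": 1.0, "scibert": 2.0}
predictions = weighted_soft_vote(probs, weights)
```

In practice the probability rows would come from the softmax outputs of the fine-tuned models, and the weights could be tuned on a held-out validation set; both choices are assumptions here.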
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Hande, A., Puranik, K., Priyadharshini, R., Chakravarthi, B.R. (2021). Domain Identification of Scientific Articles Using Transfer Learning and Ensembles. In: Gupta, M., Ramakrishnan, G. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12705. Springer, Cham. https://doi.org/10.1007/978-3-030-75015-2_9
DOI: https://doi.org/10.1007/978-3-030-75015-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75014-5
Online ISBN: 978-3-030-75015-2
eBook Packages: Computer Science (R0)