Automated SLR with a Few Labeled Papers and a Fair Workload Metric

Faria, Allan Victor Almeida; de Melo, Maísa Kely; de Oliveira, Flávio Augusto R.; Weigang, Li; Celestino, Victor Rafael Rezende

doi:10.1007/978-3-031-43088-6_1

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 494)

Included in the following conference series:

International Conference on Web Information Systems and Technologies

Abstract

Citation screening is a crucial stage in conducting a Systematic Literature Review (SLR), in which reviewers must read hundreds, if not thousands, of papers. Natural Language Processing models based on Transformers have been successfully employed to automate this process and minimize the chances of missing relevant papers. In our research, we propose three variations of these Transformer models, each with a different pre-training technique. With our models, reviewers need to read only 16 papers to train the model, saving as much as 80% of the workload. In addition, we revisit the AWSS@R metric, which normalizes the WSS@R index and provides a fair way to estimate the workload saved across different datasets.
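For readers unfamiliar with the WSS@R index mentioned in the abstract, the following is a minimal sketch of the definition commonly used in the citation-screening literature: the fraction of papers a reviewer can skip once a ranked list reaches recall R, minus the fraction that random sampling would already let them skip. The function name `wss_at_recall` and the example labels and scores are illustrative assumptions, not taken from the chapter. The normalization that turns WSS@R into AWSS@R is not given on this page, so it is not implemented here.

```python
import math


def wss_at_recall(labels, scores, recall=0.95):
    """Work Saved over Sampling at a target recall (common definition).

    labels: 1 for a relevant paper, 0 for an irrelevant one.
    scores: model relevance scores, higher means more likely relevant.
    Hypothetical helper for illustration; not the chapter's code.
    """
    n = len(labels)
    total_pos = sum(labels)
    if n == 0 or n != len(scores) or total_pos == 0:
        raise ValueError("need equal-length inputs with at least one relevant paper")
    if not 0.0 < recall <= 1.0:
        raise ValueError("recall must be in (0, 1]")

    # Number of relevant papers that must be found to reach the target recall.
    needed = math.ceil(recall * total_pos)

    # Screen papers from the highest score to the lowest.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    found = 0
    screened = 0
    for i in order:
        screened += 1
        found += labels[i]
        if found >= needed:
            break

    # Papers not screened (TN + FN) over N, minus the share skipped by chance.
    return (n - screened) / n - (1.0 - recall)


# Toy example: 10 papers, 2 relevant, ranked 1st and 3rd by the model.
example_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
wss_full = wss_at_recall(example_labels, example_scores, recall=1.0)
wss_half = wss_at_recall(example_labels, example_scores, recall=0.5)
```

In the toy example, reaching full recall requires screening three of the ten papers. Because the attainable maximum of this index depends on the share of relevant papers in a dataset, raw WSS@R values are hard to compare across datasets, which is the motivation for a normalized variant.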

Acknowledgement

We sincerely thank the Brazilian Ministry of Science, Technology, and Innovation, which partially supported this project.

Author information

Authors and Affiliations

LAMFO - Laboratory of ML in Finance and Organizations, University of Brasilia Campus Darcy Ribeiro, Brasilia, Brazil
Allan Victor Almeida Faria, Maísa Kely de Melo, Flávio Augusto R. de Oliveira, Li Weigang & Victor Rafael Rezende Celestino
University of Brasilia Campus Darcy Ribeiro, Brasília, Brazil
Allan Victor Almeida Faria, Li Weigang & Victor Rafael Rezende Celestino
Federal Institute of Minas Gerais Campus Formiga, Formiga, Brazil
Maísa Kely de Melo

Corresponding author

Correspondence to Victor Rafael Rezende Celestino.

Editor information

Editors and Affiliations

University of Padua (UNIPD), Padua, Italy
Massimo Marchiori
University of Seville, Seville, Spain
Francisco José Domínguez Mayo
Polytechnic Institute of Setúbal/INSTICC, Setubal, Portugal
Joaquim Filipe

Appendix

Table 2. Summary of the mean and std. deviation of five validations, considering 16 examples (eight positive and eight negative) in the domain learner phase, after training the respective ML-SLRC in the meta learner phase (50–50 split).

Table 3. Summary of the mean and std. deviation of five validations, considering 16 examples (eight positive and eight negative) in the domain learner phase, after training the respective ML-SLRC in the meta learner phase (benchmarking).

Cite this paper

Faria, A.V.A., de Melo, M.K., de Oliveira, F.A.R., Weigang, L., Celestino, V.R.R. (2023). Automated SLR with a Few Labeled Papers and a Fair Workload Metric. In: Marchiori, M., Domínguez Mayo, F.J., Filipe, J. (eds) Web Information Systems and Technologies. WEBIST 2022. Lecture Notes in Business Information Processing, vol 494. Springer, Cham. https://doi.org/10.1007/978-3-031-43088-6_1

DOI: https://doi.org/10.1007/978-3-031-43088-6_1
Published: 29 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43087-9
Online ISBN: 978-3-031-43088-6
eBook Packages: Computer Science (R0)
