DOI: 10.1145/3538641.3561508
poster

Deep learning models for multiple answers extraction and classification of scientific publications

Published: 20 October 2022

ABSTRACT

This paper presents an overview of a data-augmented classification and multi-span (multiple-answer) extraction system for extracting key information from academic publications. The study has two parts: (a) implementing a new fine-tuned model to solve the multiple-answer extraction problem, and (b) reporting the results of sub-classification across various RF-EMF topics. Our previous study found that the main cause of the extractive question answering (EQA) system's low performance on certain types of questions was the multiple-answer issue. To address this problem, the present study applies the TASE (TAg-based Span Extraction) technique and reports the results. By building on the pre-trained TASE model, our approach can retrieve multiple answers spread across a given text with good accuracy. In addition, this work adopts 'PEO (Population, Exposure, Outcome)', taken from the 'PECO' of the WHO-funded study on RF-EMF safety, as its overall research framework. From the PEO perspective, the results of three sub-topic classifications (RF, SAR, Causal Relationship) are presented. Data augmentation plays an important role in both the multi-span answer model and the classification model. In particular, the proposed system outperforms the pre-trained BERT model on multi-span answer tasks with our RF-EMF dataset.
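As a rough illustration of why tag-based span extraction handles multiple answers, the sketch below decodes per-token tags into answer spans. It is not the authors' implementation: it assumes a BIO tagging scheme (B = begins an answer, I = inside an answer, O = outside), and the function name `decode_spans` and the example sentence and tags are hypothetical. In a real system the tags would be predicted by a fine-tuned model rather than written by hand.

```python
from typing import List


def decode_spans(tokens: List[str], tags: List[str]) -> List[str]:
    """Turn per-token BIO tags into a list of answer strings.

    Each maximal run of the form B I I ... becomes one answer, so a single
    passage can yield any number of answers (including none).
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[str] = []
    current: List[str] = []  # tokens of the span being built, if any

    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new answer starts here; close the previous one first.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I":
            # Continue the open span; a stray I with no open span
            # is treated as the start of a new span.
            current.append(token)
        else:
            # Any other tag (normally O) closes the open span.
            if current:
                spans.append(" ".join(current))
                current = []

    if current:
        spans.append(" ".join(current))
    return spans


# Hypothetical example: a question about exposure frequency has two answers.
example_tokens = "Rats were exposed to 900 MHz and 1800 MHz fields".split()
example_tags = ["O", "O", "O", "O", "B", "I", "O", "B", "I", "O"]
print(decode_spans(example_tokens, example_tags))
```

A single start/end span predictor, as used in standard extractive question answering, could return at most one of these two frequencies, which is the limitation the abstract refers to as the multiple answer issue.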


    Published in

      RACS '22: Proceedings of the Conference on Research in Adaptive and Convergent Systems
      October 2022
      208 pages
      ISBN: 9781450393980
      DOI: 10.1145/3538641

      Copyright © 2022 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 393 of 1,581 submissions, 25%
