ABSTRACT
This paper presents an overview of a data-augmented system for classification and multi-span (multiple) answer extraction, designed to extract key information from academic publications. The study has two parts: (a) implementing a new fine-tuned model to solve the multiple-answer extraction problem, and (b) reporting the results of sub-classification across several radiofrequency electromagnetic field (RF-EMF) topics. Our previous study found that the main cause of the low performance of the extractive question answering (EQA) system on certain types of questions was the multiple-answer issue. To address this problem, this study applies the TASE (TAg-based Span Extraction) technique and reports the results. By building on a pre-trained TASE model, our approach can retrieve multiple answers spread across a given text with good accuracy. In addition, this work adopts 'PEO (Population, Exposure, Outcome)', taken from the 'PECO' framework of the WHO-funded study on RF-EMF safety, as its overall research framework. From the PEO perspective, we present classification results for three sub-topics (RF, SAR, Causal Relationship). Data augmentation plays an important role in both the multi-span answer model and the classification model. In particular, our proposed system outperforms the pre-trained BERT model on multi-span answer tasks with our RF-EMF dataset.
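To illustrate the tag-based idea behind multi-span extraction, the following is a minimal sketch, not the authors' implementation. It assumes a BIO tagging scheme (B = first token of an answer, I = continuation, O = outside any answer) and assumes that a tagger has already produced one tag per token. The function name `decode_bio_spans`, the example sentence, and the tags are all hypothetical.

```python
from typing import List


def decode_bio_spans(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect every answer span marked by B/I tags (assumed BIO scheme)."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[str] = []
    current: List[str] = []  # tokens of the span currently being built

    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new span starts; close any span that is still open.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            # Continue the open span.
            current.append(token)
        else:
            # "O", or a stray "I" with no open span: close what is open.
            if current:
                spans.append(" ".join(current))
            current = []

    if current:
        spans.append(" ".join(current))
    return spans


# Hypothetical example: one question ("Which frequencies were used?")
# with two answers located in different parts of the sentence.
tokens = "Rats were exposed at 900 MHz and 1800 MHz for two hours".split()
tags = ["O", "O", "O", "O", "B", "I", "O", "B", "I", "O", "O", "O"]
answers = decode_bio_spans(tokens, tags)
print(answers)
```

Because each token is tagged independently of how many answers exist, a single pass can return zero, one, or several spans, which is what a single start/end span predictor cannot do.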