Abstract
Meteorological reports are one of the most important means of recording the weather conditions of a place over a period of time, and the existence of a large number of meteorological reports creates a huge demand for text processing and information extraction. However, valuable data and information are still buried deep in the mountain of meteorological reports, and there is an urgent need for an automated information extraction technique to help people integrate data from multiple meteorological reports and perform data analysis for a more comprehensive understanding of a specific meteorological topic or domain. Named entity recognition (NER) technique can extract useful entity information from meteorological reports. By analyzing the characteristics of nested entities in meteorological reports, this paper further proposes to introduce Multi-Conditional Random Fields (Multi-CRF), which uses each layer of CRF to output the recognition results of each type of entities, which helps to solve the problem of identifying nested entities in meteorological reports. The experimental results show that our model achieves state-of-the-art results. The final recognition results provide effective data support for automatic text verification recognition in the meteorological domain and provide important practical value for the construction of knowledge graphs of related meteorological reports.
Similar content being viewed by others
References
Li J, Sun A, Han J, Li C (2020) A survey on deep learning for named entity recognition. IEEE Trans Knowl Data Eng 34(1):50–70
Lafferty J, McCallum A, Pereira F.C (2001) Conditional random fields: Probabilistic models for segmenting and labeling sequence data
Haojun F, Duan L, Zhang B, Jiangzhou L (2020) A collective entity linking method based on graph embedding algorithm. In: 2020 5th International Conference on Mechanical, Control and Computer Engineering (ICMCCE), pp. 1479–1482
Niklaus C, Cetto M, Freitas A, Handschuh S (2018) A survey on open information extraction. arXiv preprint arXiv:1806.05599
Lu Y, Liu Q, Dai D, Xiao X, Lin H, Han X, Sun L, Wu H (2022) Unified structure generation for universal information extraction. arXiv preprint arXiv:2203.12277
Li Q, Li J, Sheng J, Cui S, Wu J, Hei Y, Peng H, Guo S, Wang L, Beheshti A, et al. (2022) A survey on deep learning event extraction: Approaches and applications. IEEE Transactions on Neural Networks and Learning Systems
Li Q, Peng H, Li J, Wu J, Ning Y, Wang L, Philip SY, Wang Z (2021) Reinforcement learning-based dialogue guided event extraction to exploit argument relations. IEEE/ACM Transactions on Audio, Speech, and Language Processing 30:520–533
de Castro Júnior S.L, da Silva I.J.O, Alves-Souza S.N, de Souza L.S (2020) Quality of meteorological data used in the context of agriculture: An issue. In: 2020 15th Iberian Conference on Information Systems and Technologies (CISTI), pp. 1–6 . IEEE
Zheng L, Li X, Shi L, Qi S, Hu D, Chen Z (2019) Study on automatic and manual observation of precipitation weather phenomenon. In: 2019 International Conference on Meteorology Observations (ICMO), pp. 1–3 . IEEE
Chenglin Q, Qing S, Pengzhou Z, Hui Y (2018) Cn-makg: China meteorology and agriculture knowledge graph construction based on semi-structured data. In: 2018 IEEE/ACIS 17th International Conference on Computer and Information Science (ICIS), pp. 692–696 . IEEE
Sharnagat R (2014) Named entity recognition: A literature survey. Center For Indian Language Technology, 1–27
Rabiner LR (1989) A tutorial on hidden markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286
Yadav V, Bethard S (2019) A survey on recent advances in named entity recognition from deep learning models. arXiv preprint arXiv:1910.11470
Akhundova N (2021) Named entity recognition for the azerbaijani language. In: 2021 IEEE 15th International Conference on Application of Information and Communication Technologies (AICT), pp. 1–7
Feilmayr C (2011) Text mining-supported information extraction: An extended methodology for developing information extraction systems. In: 2011 22nd International Workshop on Database and Expert Systems Applications, pp. 217–221
Liu C, Fan C, Wang Z, Sun Y (2020) An instance transfer-based approach using enhanced recurrent neural network for domain named entity recognition. IEEE Access 8:45263–45270
Qiu J, Zhou Y, Wang Q, Ruan T, Gao J (2019) Chinese clinical named entity recognition using residual dilated convolutional neural network with conditional random field. IEEE Trans Nanobiosci 18(3):306–315
Wang J, Shou L, Chen K, Chen G (2020) Pyramid: A layered model for nested named entity recognition. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 5918–5928
Cao Y, Peng H, Yu P.S (2020) Multi-information source hin for medical concept embedding. In: Advances in Knowledge Discovery and Data Mining: 24th Pacific-Asia Conference, PAKDD 2020, Singapore, May 11–14, 2020, Proceedings, Part II 24, pp. 396–408 . Springer
Yang Y, Yin X, Yang H, Fei X, Peng H, Zhou K, Lai K, Shen J (2021) Kgsynnet: A novel entity synonyms discovery framework with knowledge graph. In: Database Systems for Advanced Applications: 26th International Conference, DASFAA 2021, Taipei, Taiwan, April 11–14, 2021, Proceedings, Part I 26, pp. 174–190 . Springer
Devlin J, Chang M.-W, Lee K, Toutanova K (2018) Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
Zhou C, Li Q, Li C, Yu J, Liu Y, Wang G, Zhang K, Ji C, Yan Q, He L, et al (2023) A comprehensive survey on pretrained foundation models: A history from bert to chatgpt. arXiv preprint arXiv:2302.09419
Tian M.-J, Cui R.-Y, Huang Z.-H (2018) Automatic extraction method for specific domain terms based on structural features and mutual information. In: 2018 5th International Conference on Information Science and Control Engineering (ICISCE), pp. 147–150
Nakayama H, Kubo T, Kamura J, Taniguchi Y, Liang X (2018) doccano: Text annotation tool for human. Software available from https://github. com/doccano/doccano
GAN T, GAN Y, HE Y (2019) Subsequence-level entity attention lstm for relation extraction. In: 2019 16th International Computer Conference on Wavelet Active Media Technology and Information Processing, pp. 262–265
Caruana R, Lawrence S, Giles L (2000) Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping. In: Proceedings of the 13th International Conference on Neural Information Processing Systems. NIPS’00, pp. 381–387. MIT Press, Cambridge, MA, USA
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958
Loshchilov I, Hutter F (2017) Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101
Ju M, Miwa M, Ananiadou S (2018) A neural layered model for nested named entity recognition. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 1446–1459
Yu J, Bohnet B, Poesio M (2020) Named entity recognition as dependency parsing. arXiv preprint arXiv:2005.07150
Li J, Fei H, Liu J, Wu S, Zhang M, Teng C, Ji D, Li F (2022) Unified named entity recognition as word-word relation classification. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 10965–10973
Li X, Feng J, Meng Y, Han Q, Wu F, Li J (2019) A unified mrc framework for named entity recognition. arXiv preprint arXiv:1910.11476
Yu S, Duan H, Wu Y (2018) Corpus of multi-level processing for modern chinese. Available at: opendata. pku. edu. cn/dataset. xhtml
Sang E.F, De Meulder F (2003) Introduction to the conll-2003 shared task: Language-independent named entity recognition. arXiv preprint cs/0306050
Zhang Y, Yang J (2018) Chinese ner using lattice lstm. arXiv preprint arXiv:1805.02023
Cui Y, Che W, Liu T, Qin B, Yang Z (2021) Pre-training with whole word masking for chinese bert. IEEE/ACM Transactions on Audio, Speech, and Language Processing 29:3504–3514
Cui Y, Che W, Liu T, Qin B, Wang S, Hu G (2020) Revisiting pre-trained models for Chinese natural language processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pp. 657–668. Association for Computational Linguistics, Online
Xu L, Zhang X, Dong Q (2020) Cluecorpus2020: A large-scale chinese corpus for pre-training language model. arXiv preprint arXiv:2003.01355
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant (No.42050102) and the Postgraduate Education Reform Project of Jiangsu Province under Grant (No. SJCX22_0343). Also, this work was supported by Dou Wanchun Expert Workstation of Yunnan Province (No. 202205AF150013).
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Cui, M., Huang, R., Hu, Z. et al. Semantic rule-based information extraction for meteorological reports. Int. J. Mach. Learn. & Cyber. 15, 177–188 (2024). https://doi.org/10.1007/s13042-023-01885-8
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13042-023-01885-8