Semantic rule-based information extraction for meteorological reports

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Meteorological reports are one of the most important means of recording the weather conditions of a place over a period of time, and the large number of such reports creates a strong demand for text processing and information extraction. However, valuable data and information remain buried in these reports, and an automated information extraction technique is urgently needed to help people integrate data from multiple reports and analyze it for a more comprehensive understanding of a specific meteorological topic or domain. Named entity recognition (NER) can extract useful entity information from meteorological reports. Based on an analysis of the characteristics of nested entities in meteorological reports, this paper proposes introducing Multi-Conditional Random Fields (Multi-CRF), in which each CRF layer outputs the recognition results for one type of entity, which helps to solve the problem of identifying nested entities in meteorological reports. The experimental results show that our model achieves state-of-the-art results. The final recognition results provide effective data support for automatic text verification and recognition in the meteorological domain and have practical value for constructing knowledge graphs from meteorological reports.
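The idea of letting each CRF layer report one entity type can be illustrated with a minimal sketch. It does not reproduce the paper's model: it assumes each layer has already produced a B/I/O tag sequence for its own entity type and only shows how such per-type outputs can be merged so that nested entities survive. The entity types (`WEATHER`, `REGION`, `PROVINCE`), the example sentence, and the helper names `decode_bio` and `merge_layers` are all hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def decode_bio(tags: List[str], entity_type: str) -> List[Span]:
    """Turn one layer's B/I/O tags into (start, end, type) spans."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:          # close the previous entity
                spans.append((start, i, entity_type))
            start = i
        elif tag == "I":
            if start is None:              # tolerate an I without a preceding B
                start = i
        else:                              # "O" ends any open entity
            if start is not None:
                spans.append((start, i, entity_type))
                start = None
    if start is not None:                  # entity running to the end
        spans.append((start, len(tags), entity_type))
    return spans


def merge_layers(layer_tags: Dict[str, List[str]]) -> List[Span]:
    """Collect spans from every per-type layer; overlaps are kept on purpose."""
    spans: List[Span] = []
    for entity_type, tags in layer_tags.items():
        spans.extend(decode_bio(tags, entity_type))
    return sorted(spans)


# Hypothetical sentence and per-type tag sequences (one sequence per layer).
tokens = ["heavy", "rain", "in", "northern", "Yunnan", "on", "Monday"]
layer_tags = {
    "REGION":   ["O", "O", "O", "B", "I", "O", "O"],
    "PROVINCE": ["O", "O", "O", "O", "B", "O", "O"],
    "WEATHER":  ["B", "I", "O", "O", "O", "O", "O"],
}

spans = merge_layers(layer_tags)
for start, end, entity_type in spans:
    print(entity_type, " ".join(tokens[start:end]))
```

Because every entity type has its own tag sequence, the hypothetical `PROVINCE` span "Yunnan" can sit inside the `REGION` span "northern Yunnan" without conflict, which a single flat tag sequence could not express.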

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 42050102), the Postgraduate Education Reform Project of Jiangsu Province (Grant No. SJCX22_0343), and the Dou Wanchun Expert Workstation of Yunnan Province (No. 202205AF150013).

Author information

Corresponding author

Correspondence to Xiaolong Xu.

About this article

Cite this article

Cui, M., Huang, R., Hu, Z. et al. Semantic rule-based information extraction for meteorological reports. Int. J. Mach. Learn. & Cyber. 15, 177–188 (2024). https://doi.org/10.1007/s13042-023-01885-8

  • DOI: https://doi.org/10.1007/s13042-023-01885-8
