Few-shot learning for name entity recognition in geological text based on GeoBERT

  • Research Article
Earth Science Informatics

Abstract

Geological reports record the geological elements and survey contents found during geological exploration, but extracting useful concepts from such reports is difficult. In information extraction, accurately identifying entities in unstructured geological text is a foundational task known as geological named entity recognition (Geo-NER). However, existing methods generally require a large annotated corpus and have difficulty recognizing long entities. Therefore, this paper proposes a two-stage fine-tuning method. In the first stage, we fine-tune a pretrained bidirectional encoder representations from transformers (BERT) model to obtain GeoBERT, a language model that incorporates geological domain knowledge. In the second stage, we use a small number of samples to complete the NER task on geological reports based on GeoBERT. On the constructed dataset, the proposed model achieves a higher F1-score than the baseline models.
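
The abstract reports results as an F1-score and singles out long entities as a difficulty. The sketch below is only an illustration of how an entity-level F1-score is commonly computed for NER, not the authors' evaluation code. It assumes BIO tagging and exact span matching, neither of which is stated in the abstract, and the labels `ROCK` and `STRAT` are hypothetical.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    # The trailing "O" is a sentinel that closes an entity ending the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((label, start, i))
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        else:
            # "O" or a stray "I-" tag: no entity is open.
            start, label = None, None
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exactly matching entity spans."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = extract_spans(g), extract_spans(p)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the three-token ROCK entity is predicted one token short.
gold = [["B-ROCK", "I-ROCK", "I-ROCK", "O", "B-STRAT"]]
pred = [["B-ROCK", "I-ROCK", "O", "O", "B-STRAT"]]
print(entity_f1(gold, pred))  # 0.5
```

Under exact matching, a long entity predicted with one boundary token missing counts as both a false positive and a false negative, which is one reason long entities weigh heavily on this metric.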

Acknowledgments

This study was supported by the National Science Foundation of China (Grant Nos. 41871311 and 42050101), the Open Research Project of the Hubei Key Laboratory of Intelligent Geo-Information Processing (No. KLIGIP-2021A01), and the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) (No. CUG2106116). The authors thank the Development and Research Center of the China Geological Survey for providing technical support and the National Engineering Research Center of Geographic Information System for providing hardware support.

Author information

Corresponding author

Correspondence to Liang Wu.

Ethics declarations

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Communicated by: H. Babaie

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

• The study proposes a two-stage fine-tuning method for named entity recognition.

• The method addresses small-sample datasets and the identification of long entities.

• It captures long-distance dependency features within longer geological entities.

• The method performs better than the other models it was compared with.

About this article

Cite this article

Liu, H., Qiu, Q., Wu, L. et al. Few-shot learning for name entity recognition in geological text based on GeoBERT. Earth Sci Inform 15, 979–991 (2022). https://doi.org/10.1007/s12145-022-00775-x

  • DOI: https://doi.org/10.1007/s12145-022-00775-x
