Text visualization for geological hazard documents via text mining and natural language processing

  • Research Article
Earth Science Informatics

Abstract

A growing number of geological hazard documents describe the mechanisms and occurrence processes of geological disasters, yet the unstructured geoscientific data they contain are not fully utilized. Text mining and visualization techniques make it possible to exploit these data: they extract the core information from dense, abstract geological disaster reports so that readers can focus on it quickly, which improves the efficiency with which reports are used. This research describes a workflow framework that automatically extracts key information and transforms it into a simple, intuitive form, allowing managers and researchers to navigate and understand it quickly and to make better-informed decisions. To extract key information from text automatically, an optimized term frequency-inverse document frequency (TF-IDF) algorithm is proposed to analyze text characteristics. The important information extracted from a case study document is presented as a word cloud. Co-occurrence network analysis is used to present key content from geological reports and to describe the correlations between words. Dependency grammar is used to extract triples from the report text, which are visualized as knowledge graphs. The results show that text visualization analysis can identify the types and locations of geological disasters in reports, highlight key information from survey reports as an auxiliary public resource, and speed up analysis of the key contents of large numbers of geological disaster survey reports.
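As a rough illustration of two of the building blocks named in the abstract, the sketch below computes standard TF-IDF scores and sentence-level co-occurrence counts on pre-tokenized text. It is not the optimized TF-IDF variant proposed in the article, whose details are not given in the abstract; the function names and the toy tokens are assumptions made for this example, and dependency-based triple extraction is left out.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(docs):
    """Plain TF-IDF. docs is a list of token lists; returns one {term: score} dict per doc."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def cooccurrence(sentences):
    """Count unordered word pairs that occur together in the same sentence."""
    pairs = Counter()
    for sent in sentences:
        # Sorting makes each pair key order-independent.
        for a, b in combinations(sorted(set(sent)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical toy corpus: each inner list stands for one tokenized report sentence.
docs = [
    ["landslide", "rainfall", "slope"],
    ["landslide", "debris", "flow"],
    ["landslide", "rainfall", "collapse"],
]
scores = tf_idf(docs)
pairs = cooccurrence(docs)
```

In this toy corpus a term that occurs in every document, such as "landslide", receives a score of zero, while rarer terms rank higher; term weights of this kind could drive word sizes in a word cloud, and the pair counts could serve as edge weights in a co-occurrence network.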

Acknowledgements

This study was financially supported by the National Natural Science Foundation of China (42050101, U1711267, 41871311, 41871305), the National Key Research and Development Program (2018YFB0505500, 2018YFB0505504), and the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) (No. CUG2106116).

Corresponding author

Correspondence to Qinjun Qiu.

Additional information

Communicated by: H. Babaie

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ma, Y., Xie, Z., Li, G. et al. Text visualization for geological hazard documents via text mining and natural language processing. Earth Sci Inform 15, 439–454 (2022). https://doi.org/10.1007/s12145-021-00732-0

  • DOI: https://doi.org/10.1007/s12145-021-00732-0
