Abstract
A growing body of geological hazard documents describing the mechanisms and occurrence processes of geological disasters contains unstructured geoscientific data that remain underused. Text mining and visualization techniques make it possible to exploit these data: they extract the core information from dense, abstract geological disaster reports so that readers can focus on it quickly, which improves the efficiency with which the reports are used. This study describes a workflow framework that automatically extracts key information and transforms it into a simple, intuitive form, so that managers and researchers can quickly navigate and understand the key information and make more informed decisions based on it. To extract key information from text automatically, an optimized term frequency-inverse document frequency (TF-IDF) algorithm is proposed to analyze text characteristics. The important information extracted from a case study document is displayed as a word cloud. Co-occurrence network analysis is used to present the key content of geological reports and to describe the correlations between words. Dependency grammar analysis is used to extract triples from the report text, which are then visualized as knowledge graphs. The results show that text visualization analysis can identify the types and locations of geological disasters in reports, highlight key information from survey reports as an auxiliary public resource, and speed up the analysis of the key contents of large numbers of geological disaster survey reports.
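The two statistical steps named in the abstract, term weighting and word co-occurrence, can be illustrated with a minimal sketch. It assumes the classic TF-IDF definition (relative term frequency times the logarithm of the inverse document frequency), not the optimized variant proposed in the paper, whose details are not given in the abstract. The function names `tf_idf` and `cooccurrence`, and the toy token lists, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(docs):
    """Classic TF-IDF for a list of tokenized documents.

    Assumption: this is the textbook weighting, not the paper's
    optimized algorithm. Returns one {term: score} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def cooccurrence(units):
    """Count how often two distinct terms share a text unit.

    Each unit (sentence, paragraph or document) contributes at most one
    count per unordered pair; pairs are stored in alphabetical order.
    """
    pairs = Counter()
    for tokens in units:
        for a, b in combinations(sorted(set(tokens)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical toy corpus standing in for segmented report text.
docs = [
    ["landslide", "rainfall", "slope", "landslide"],
    ["debris", "flow", "rainfall", "gully"],
    ["collapse", "slope", "rainfall"],
]

scores = tf_idf(docs)
edges = cooccurrence(docs)

# Highest-weighted term of the first document.
top_term = max(scores[0], key=scores[0].get)
```

In this toy corpus a term that occurs in every document receives a weight of zero, which is why TF-IDF favors terms that characterize individual reports. The pair counts in `edges` can serve as edge weights of a co-occurrence network, and the per-document scores as font sizes in a word cloud.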
Acknowledgements
This study was financially supported by the National Natural Science Foundation of China (42050101, U1711267, 41871311, 41871305), the National Key Research and Development Program (2018YFB0505500, 2018YFB0505504) and the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) (No. CUG2106116).
Additional information
Communicated by: H. Babaie
About this article
Cite this article
Ma, Y., Xie, Z., Li, G. et al. Text visualization for geological hazard documents via text mining and natural language processing. Earth Sci Inform 15, 439–454 (2022). https://doi.org/10.1007/s12145-021-00732-0
DOI: https://doi.org/10.1007/s12145-021-00732-0