DOI: 10.1145/3555776.3577211
Poster

Student Research Abstract: Unsupervised Key Term Extraction of Tornado Narratives from NOAA Storm Events Database

Published: 07 June 2023

Abstract

Disaster records often consist of key metrics of the disaster event and its monetary cost. While the cost of an event is useful for insurance adjustments, its monetary impact does little to help disaster preparedness planners build community resilience. Disaster records often also include natural language narratives about the event, as in the National Oceanic and Atmospheric Administration (NOAA) Storm Events Database. These narratives contain essential information, but that information is not easily retrievable because the text is unstructured. The narratives must therefore be text mined to retrieve the impacts of each disaster and structure the data for further use. The method proposed in this abstract is a critical first step in that process: an unsupervised key term extraction method that uses sentence transformers to create embeddings of the sentences in the narratives, clusters those embeddings, and assigns key terms to each cluster based on the highest term frequency-inverse document frequency (tf-idf) scores of its sentences.
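The key term assignment step can be illustrated with a small, dependency-free Python sketch. It assumes the clustering step has already produced one integer label per sentence (for example, from a density-based clusterer such as HDBSCAN, which marks noise points with -1), and it treats each sentence as a document when computing tf-idf. The abstract does not specify how per-sentence tf-idf scores become cluster-level key terms, so summing them across a cluster's sentences is an assumption here, as are the regex sentence splitter, the stopword list, and the hypothetical helper names `split_sentences`, `tokenize`, `tfidf_per_sentence`, and `cluster_key_terms`. In a full pipeline the labels would come from clustering sentence-transformer embeddings (e.g., `SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode(sentences)` from the sentence-transformers package); that step is omitted to keep the example self-contained.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "at", "by", "for", "from", "in", "of", "off",
             "on", "the", "to", "was", "were", "with"}


def split_sentences(narrative):
    """Naively split a narrative after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", narrative.strip()) if s]


def tokenize(sentence):
    """Lowercase a sentence and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", sentence.lower()) if t not in STOPWORDS]


def tfidf_per_sentence(sentences):
    """Return one {term: tf-idf} dict per sentence, treating each sentence as a document.

    tf is the term's relative frequency within the sentence; idf uses the smoothed
    form ln((1 + n) / (1 + df)) + 1, the idf formula scikit-learn's
    TfidfVectorizer uses by default.
    """
    docs = [tokenize(s) for s in sentences]
    n_docs = len(docs)
    # Document frequency: the number of sentences that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid dividing by zero for empty sentences
        scores.append({
            term: (count / length) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return scores


def cluster_key_terms(sentences, labels, top_k=3):
    """Map each cluster id to its top_k terms by summed sentence-level tf-idf.

    labels[i] is the cluster id of sentences[i]; the label -1 (noise, in
    HDBSCAN's convention) is skipped.
    """
    totals = defaultdict(Counter)
    for label, scores in zip(labels, tfidf_per_sentence(sentences)):
        if label == -1:
            continue
        totals[label].update(scores)  # Counter.update adds the float scores
    return {label: [term for term, _ in totals[label].most_common(top_k)]
            for label in sorted(totals)}


if __name__ == "__main__":
    # Invented toy narratives; these are not records from the NOAA database.
    narratives = [
        "The tornado damaged the roof of a home. Trees were uprooted and power lines were downed.",
        "The roof of a mobile home was torn off. Power lines were blown down.",
    ]
    sentences = [s for n in narratives for s in split_sentences(n)]
    # Hypothetical labels standing in for the output of the clustering step.
    labels = [0, 1, 0, 1]
    print(cluster_key_terms(sentences, labels, top_k=2))
```

In this toy example, summing scores ranks "roof" and "home" highest for one cluster and "power" and "lines" for the other, because terms that recur across a cluster's sentences accumulate score even though their idf is lower. Other aggregations (e.g., taking each sentence's single highest-scoring term) are equally plausible readings of the abstract.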

    Published In

    SAC '23: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing
    March 2023
    1932 pages
    ISBN:9781450395175
    DOI:10.1145/3555776
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. information retrieval
    2. unsupervised methods
    3. key term extraction
    4. disaster informatics
    5. transformers

    Qualifiers

    • Poster

    Conference

    SAC '23

    Acceptance Rates

    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

    Article Metrics

    • Total citations: 0
    • Total downloads: 56
    • Downloads (last 12 months): 27
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 16 Feb 2025
