DOI: 10.1145/3422713.3422741
research-article

Chinese News Event Corpus Construction Method Based on Syntax Tree

Published: 23 October 2020

Abstract

At present, event corpora are usually expanded with weakly supervised models, which avoid the expensive manual annotation process. However, a weakly supervised model relies on a knowledge base and a small amount of manually annotated corpus data, which gives it poor portability. To solve this problem, we construct a public-domain event extraction model based on syntax trees. In this paper, we propose a classification of Chinese syntax tree structures from the perspective of event extraction, and put forward an event extraction algorithm for the various syntax tree types. Moreover, in the construction algorithm for the trigger word dictionary, we use cross-corpus dictionary information to build a Chinese trigger word dictionary from a semantic perspective. As a result, we obtain 40,128 Chinese news events, which form an initial corpus of Chinese news events.
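
The abstract does not give the tree classification or the extraction rules, so the following is only a minimal sketch of the general idea: walk a constituency tree, find a verb that appears in a trigger word dictionary, and read off its subject and object. The nested-tuple tree format, the `TRIGGERS` dictionary, the single `IP -> NP VP` pattern, and all function names are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Optional, Tuple

# Hypothetical trigger word dictionary (word -> event type).
# The paper builds its dictionary from cross-corpus information; this one is made up.
TRIGGERS = {"访问": "visit", "会见": "meet"}

# A tree node is a tuple: (label, child, child, ...).
# A leaf is (POS tag, word), for example ("VV", "访问").

def is_leaf(node: tuple) -> bool:
    return len(node) == 2 and isinstance(node[1], str)

def words(node: tuple) -> List[str]:
    """Collect the words under a node, left to right."""
    if is_leaf(node):
        return [node[1]]
    out: List[str] = []
    for child in node[1:]:
        out.extend(words(child))
    return out

def find_child(node: tuple, label: str) -> Optional[tuple]:
    """Return the first direct child with the given label, if any."""
    if is_leaf(node):
        return None
    for child in node[1:]:
        if child[0] == label:
            return child
    return None

def extract_event(tree: tuple) -> Optional[Tuple[str, str, Optional[str]]]:
    """Handle one assumed pattern only: IP -> NP VP, VP -> VV (NP)."""
    subj = find_child(tree, "NP")
    vp = find_child(tree, "VP")
    if subj is None or vp is None:
        return None
    verb = find_child(vp, "VV")
    # Keep the sentence only if its main verb is a known trigger word.
    if verb is None or not is_leaf(verb) or verb[1] not in TRIGGERS:
        return None
    obj = find_child(vp, "NP")
    obj_text = "".join(words(obj)) if obj is not None else None
    return ("".join(words(subj)), verb[1], obj_text)

# Hand-written example tree for "总理访问法国" (the premier visits France).
example = ("IP",
           ("NP", ("NN", "总理")),
           ("VP", ("VV", "访问"), ("NP", ("NR", "法国"))))
event = extract_event(example)
```

A real pipeline would obtain the tree from a parser and would need one rule set per tree type; this sketch covers a single pattern to show where the trigger dictionary enters the process.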

    Published In

    ICBDT '20: Proceedings of the 3rd International Conference on Big Data Technologies
    September 2020
    250 pages
    ISBN:9781450387859
    DOI:10.1145/3422713
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Event extraction
    2. Trigger word dictionary
    3. Syntax trees

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICBDT 2020
