ABSTRACT
Existing financial datasets are mostly composed of official documents, statements, news articles, and similar sources, and little attention has so far been paid to the numerals that appear in financial social comments. This paper therefore presents CFinNumAttr, a Chinese financial numeral attribute dataset built by annotating stock reviews and comments collected from a social networking platform. We also conduct several experiments on CFinNumAttr with state-of-the-art methods to examine the importance of financial numeral attributes. The experimental results show that numeral attributes in social reviews and comments carry rich semantic information, and that the numeral clue extraction and attribute classification tasks can substantially improve financial text understanding.
Index Terms
- A Chinese Fine-grained Financial Event Extraction Dataset