1 Introduction

Causality analysis is a common and important task in the finance domain; it supports risk management, investment decisions, and portfolio optimization. However, even experts rarely have a complete picture of all possible causes or effects. We therefore propose Financial expertise depiction via Causality Knowledge Graph and Domain Ontology (FinCaKG-Onto) to fill this gap by learning causality from existing textual resources and thereby depicting a greater breadth of expertise. Current knowledge graphs tend to incorporate ever more triples in order to grow in size. However, the volume of a knowledge graph does not necessarily equate to a greater volume of actual knowledge, especially expert knowledge (i.e., expertise). Many domain-oriented knowledge graphs have been created to present the concepts of a domain, but in general their concepts are linked by too many types of relations, which makes it difficult to establish meaningful connections between them. To tackle this predicament, we use causal relationships exclusively as the linkage and connect domain keywords whenever causality exists between them. We anticipate that connections founded on causality are better suited to presenting the underlying logic of a given domain.

Causal relations have been studied and stored in many commonsense knowledge bases, e.g. WikiData [1] and ConceptNet [2]. Causality-dedicated knowledge graphs have also been generated by many recent works, e.g. CausalNet [3], Cause Effect Graph [4], CauseNet [5] and ATOMIC [6]. Unlike commonsense knowledge bases, where nodes typically come with unique identifiers, the nodes in these causality-dedicated graphs are represented as raw strings. This approach overlooks issues such as redundancy, ambiguity, and incompleteness in the string expressions. The strength of these graphs lies in their potential to organize knowledge logically, but because of the raw and imprecise string nodes, the logical structure often becomes confusing and poorly directed. To address these limitations of existing causal KGs, we build a causal KG step by step from text, perform entity linking with context retrieval, and apply a domain-specific ontology to organize the graph logically. To highlight the improvements in our constructed KG, we compare it with existing causal KGs in terms of statistics, scope, and feasibility. FinCaKG-Onto demonstrates the highest retrieval rate, a strong domain-specific focus, and superior feasibility for logical-path depiction and relation inference.

In this paper, we outline the resources and methodology for building this knowledge graph in Section 3. We then describe the schema of the graph structure in Section 4 and provide a comprehensive assessment of the resulting knowledge graph in Section 5. Following this, we explore the practical applications of FinCaKG-Onto in Section 6, examining its usefulness for causality investigation and inference exploration. Finally, we make the resulting knowledge graph (FinCaKG-Onto) and its associated resources accessible on our webpage (Footnote 1).

2 Related work

The creation of a knowledge graph from text involves text identification, relation extraction, and graph construction. We frame the first two phases as causality detection in Section 2.1 and then survey the latest advances in constructing causality knowledge graphs in Section 2.2. Besides conventional techniques, both sections also cover how generative models, now widely adopted for handling varied input formats and extracting relations without supervision, are applied to these tasks.

2.1 Causality detection

Pattern-based approach: The earliest approaches relied on finding explicit causality in sentences. The explicit clues are a collection of causative verbs, causative adverbs/adjectives, causal links, and resultative constructions [7]. By summarizing idiomatic expressions of causality, experts generated various handcrafted rules based on syntactic, lexical, and grammatical patterns [3, 8,9,10]. Beyond such classical patterns, the probabilities of cause-effect pairs have also been explored statistically to support unsupervised causal prediction [11].

Shallow machine learning approach: Machine learning-based approaches embed causality characteristics into feature vectors and then apply algorithms that interpret the features in different manners, e.g. regression, classification, and clustering [7]. In general, causality features are selected with pattern-based approaches, and machine learning is then used to capture implicit causality [12, 13]. Additionally, Hidey et al. introduced distant supervision to scale to larger corpora efficiently [14].

Deep learning approach: Free of hand-crafted feature engineering, deep learning approaches learn features automatically during training. General relation extraction techniques are extensively utilized, independent of relation type. REBEL [15] introduced autoregressive models for extracting relation triplets from raw text. UniRel [16] unified entity and relation representations and used a self-attention-based Interaction Map for unified triplet extraction. ReLiK [17] leveraged a retrieve-and-read paradigm to link entities and extract triplets in one pass, resulting in quicker inference. However, when handling causal relations, the performance of popular RNN or CNN architectures is limited [18, 19]. Therefore, more specialized models have been examined [20]. The typical LSTM [21] allows past elements to be considered as context for the element under scrutiny. By additionally considering the future sequence context, bidirectional LSTMs (Bi-LSTM) were shown to outperform other models in identifying cause and effect in text [22], and they are widely regarded as a baseline for recent works. In our causality detection module, we also apply a competitive model based on Bi-LSTM [23]. This model introduced a graph construction technique to solve the span-shifting problem between cause and effect spans when syntactic clues are absent from the text. We adopt this proposal as our cause-effect span identification model, and subsequent references to the span identification model refer to this algorithm; more technical details are given in Section 3.3. Aside from semantic causality, some studies address causality within temporal and spatial contexts. Ali et al. [24, 25], for example, design a network that leverages GCN [26] to dynamically learn spatio-temporal patterns, thus capturing causality and improving traffic flow predictions.

Generative approach: Annotating datasets for supervised deep learning is often costly and time-intensive. Generative models, however, can be easily fine-tuned for various domains and tasks [27]. There are two main methodologies for structural information extraction: 1) task-dependent approach: sequential use of generative models for tasks like NER, Relation Extraction, and Entity Linking to build knowledge graphs; 2) end-to-end approach: extracting triplets directly from text using generative models. In task-dependent approaches, LSTM-CRF models [28, 29] effectively handle NER by employing hard attention on tokens, similar to attention mechanisms in generative models. In relation extraction and entity linking, KnowGL [30] uses a REBEL-like fine-tuned model to detect entity pairs and generate facts, including entity labels, types, and relationships with ranking scores. This shows how both tasks are handled through knowledge generation. Similarly, GENRE [31] and mGENRE [32] generate unique entity names autoregressively, improving fine-grained interaction between context and entity names. In the end-to-end method, DeepStruct [33] pretrains language models on task-agnostic corpora using multi-task training to generate structures from text, enabling effective knowledge transfer and achieving state-of-the-art performance across many tasks. UIE [34] follows a similar path, creating a unified text-to-structure framework that simplifies the adaptation process and applies broadly to various IE tasks.

2.2 Causality knowledge graph

The most straightforward form of causality is the pairwise cause-effect text. It can appear as word pairs, e.g. bacteria → disease, or as textual span pairs, e.g. "Because of their wrong investment" → "they went bankrupt". For textual span pairs, Frattini et al. investigated the statistics of causality in text and report that textual span pairs occur in greater quantity than word pairs [35]. Additionally, textual span pairs are capable of expressing more concrete causality.

Causal KGs from referable resources: Popular methodologies tend to generate causal knowledge graphs from word pairs. Starting from scratch, many works collected abundant causal word pairs and assigned causal relations weighted by co-occurrence counts [4, 5, 36]. Relying on existing knowledge bases, Khatiwada et al. adopted link prediction techniques to predict new causal relations among the predefined entities in WikiData [37]. Benefiting from both kinds of resources, CausalNet accumulated word pairs via linguistic patterns over web-crawled content on the one hand and harvested existing causal pairs from Wikipedia on the other [3]. It displayed examples of causal word pairs over multiple hops (1-3 hops) and demonstrated the feasibility of connecting causal pairs. Beyond causal-specific graphs, some works extract only the causal relations from commonsense knowledge graphs, such as ConceptNet-CC [38], which isolates causal links from ConceptNet. In all of the aforementioned knowledge graphs, nodes are represented by terms or mentions regardless of whether several nodes refer to the same concept or instance. This introduces redundant nodes and intertwined relations, which complicates querying and visualization. Moreover, it becomes impossible to interpret the causality logic on such a flat knowledge graph, since humans need a taxonomy or conceptual structure to navigate toward a broader view. Especially in a specific domain such as finance, current causality knowledge graphs fail to cover even a sufficient set of technical terms.

Causal KGs from generative models: Generative models assist information extraction in various ways. KG-S2S [39] consolidates separate tasks into a single "text-to-text" generation task, proving effective for handling knowledge graph structures and excelling in graph completion. Still, sequential sub-tasks remain unavoidable for KG construction. By employing prompt engineering with GPT-4 [40], G-T2KG [41] generated sequential results for a final structural representation, while iText2KG [42] adapted this approach using LangChain [43] to extract distinct concept entities and avoid semantic confusion. Unlike these single-methodology approaches, FinDKG [44] also published a knowledge graph automatically extracted from financial news articles based on predefined relations and entity types. However, these construction methods lack evaluations of knowledge graph quality, missing both detailed and overall assessments. Additionally, they do not incorporate existing ontologies, which complicates knowledge reuse and navigation and results in knowledge graphs that are not tailored to causality and may contain little causal knowledge.

3 Methodology

Fig. 1 The framework of FinCaKG-Onto construction

We aim to construct FinCaKG-Onto from plain text automatically while also guaranteeing the expertise and trustworthiness of FinCaKG-Onto. Expert resources are incorporated to provide sufficient guidance for the automatic learning modules and the final FinCaKG-Onto construction procedure. Accordingly, the entire framework for constructing FinCaKG-Onto is shown in Fig. 1:

  1. collecting the taxonomic relations and financial vocabulary from expert resources, as depicted at the top of the figure;

  2. uncovering hidden causal pairs from text with three main modules trained on the trial resources, shown as the three sequential blue rectangles and the green box at the bottom of Fig. 1:

     • Causality Detection Module: identify causal sentences in the financial reports and detect the cause span and effect span within each causal sentence.

     • Entity Linking Module: locate financial mentions in the textual spans and link the mentions to WikiData entities.

     • Causality Bonding Module: extract financial mention pairs and their linked entities from the cause-effect span pairs and align the mention pairs to entity pairs.

  3. linking all entity pairs from tail to head, distinguishing the nodes into concepts (T-Box in Fig. 1) and instances (A-Box in Fig. 1), and organizing them with taxonomic relations to generate FinCaKG-Onto.

3.1 Expert Resources

We refer to expert resources as collections of predefined domain knowledge, which are unique to each domain. In the context of FinCaKG-Onto, as shown at the top of Fig. 1, we utilize the taxonomic relations and terminology sourced from the FIBO ontology, along with additional terminology from the Investopedia vocabulary.

FIBO Ontology: The Financial Industry Business Ontology (FIBO) grew out of an industry-wide initiative to address the data integration problem and represents a consensus on common concepts developed by a community of experts. In this paper, we use the metadata of FIBO from the 2022 Q4 release (Footnote 2), which defines 1,100 concepts and 32,134 instances, and we regard it as the first source of vocabulary. We harvest its taxonomic relations, including rdfs:subClassOf among concept nodes and rdf:type between concept and instance nodes.

Investopedia Vocabulary: Investopedia is well known as a dictionary of financial concepts. Accordingly, we consider its vocabulary (Footnote 3) as the second source of financial vocabulary, which contains 6,259 unique noun phrases.

3.2 Trial Resources

We denote as trial resources the target corpus for FinCaKG-Onto construction and the accompanying training data for model optimization. These materials can be replaced if the suggested methodology is applied to a different target dataset or even a different domain.

Financial Reports: In finance, the annual reports of listed companies are open to the public; the 10-K is one of the report formats filed with the U.S. Securities and Exchange Commission (SEC). We limit our scope to the reports of the top 3,000 companies (Footnote 4) over the last 5 years. To directly acquire machine-readable text, we only process reports available in the XBRL format, which leaves 5,093 reports.

Labeled Data: The FinCausal shared tasks (Footnote 5) provide causality labels for sentences from financial reports. We merge their datasets (Footnote 6) from different years and use them in the causality detection module.

WikiData: WikiData stores rich information about entities, such as textual labels, identifiers, redirected titles, and explanations. We regard it as an entity dictionary in our entity linking tasks. For example, we exploit two existing mapping tables to facilitate the mapping from mentions to entities (Footnote 7) and from entities to WikiData identifiers (Footnote 8).

3.3 Causality detection module

This module consists of two sequential operations: the first identifies causal sentences within the corpus, while the second detects cause spans and effect spans within those identified causal sentences. Each task is handled by a specialized model, as discussed in the following subsections.

Sentence Extraction Model: Using the labeled data, we train a linear classifier on top of the BERT model [45] to predict the presence or absence of causality in a sentence. Applying the trained model to the financial reports, we discriminate between sentences exhibiting causality and those that do not, and we pass only the causal sentences to the subsequent model.
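To make the classifier setup concrete, the following is a minimal sketch of fine-tuning a BERT-based binary causality classifier; the model checkpoint, label convention, and hyperparameters are illustrative assumptions rather than the exact configuration used in this paper.

```python
# Minimal sketch (not the exact training setup): fine-tune a BERT classifier
# to flag causal sentences. Checkpoint name, label convention (1 = causal)
# and hyperparameters are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # 0 = non-causal, 1 = causal
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Toy labeled examples standing in for the FinCausal annotations.
sentences = ["Revenue fell because several key clients went bankrupt.",
             "The annual meeting will be held in May."]
labels = torch.tensor([1, 0])

model.train()
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)   # cross-entropy loss computed internally
outputs.loss.backward()
optimizer.step()

# Inference: keep only sentences predicted as causal for the span model.
model.eval()
with torch.no_grad():
    logits = model(**batch).logits
causal_mask = logits.argmax(dim=-1) == 1
causal_sentences = [s for s, keep in zip(sentences, causal_mask) if keep]
```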

Span Identification Model: Suppose we have the causal sentence 'Zhao found himself 60 million yuan indebted after losing 9,000 BTC in a single day (February 10, 2014)'. We aim to identify 'losing 9,000 BTC in a single day (February 10, 2014)' as the cause span and 'Zhao found himself 60 million yuan indebted' as the effect span. As this example shows, we have to locate the concrete occurrences of causality, i.e. the cause span and its effect span, in causal sentences, which remains challenging in current research. In the related competition [46], the ilab-FinCau model [23] achieved a top-2 ranking with a 0.94 precision score and showed remarkable performance in distinguishing cause spans from effect spans within a sentence. The ilab-FinCau model proposes a graph builder to embed extra causality relations and generates a graph embedding with a GNN and a Bi-LSTM. In addition to the BERT embedding, it concatenates the graph embedding as causality-enriched features for training and prediction. We adopt this model for better span identification performance and preserve only complete pairs of cause spans and effect spans for the following modules.
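The sketch below illustrates only the core idea of enriching contextual token embeddings with graph-derived features before a Bi-LSTM tagger; it is not the ilab-FinCau implementation, and the BIO label set, dimensions, and source of the graph features are assumptions.

```python
# Simplified sketch of the span identification idea, not the ilab-FinCau code:
# concatenate BERT token embeddings with precomputed graph features, then run
# a Bi-LSTM tagger over BIO labels. Label names and dimensions are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

LABELS = ["O", "B-Cause", "I-Cause", "B-Effect", "I-Effect"]

class SpanTagger(nn.Module):
    def __init__(self, graph_dim=64, hidden=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("bert-base-uncased")
        self.bilstm = nn.LSTM(self.encoder.config.hidden_size + graph_dim,
                              hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, len(LABELS))

    def forward(self, input_ids, attention_mask, graph_feats):
        # graph_feats: (batch, seq_len, graph_dim) causality-graph features
        tokens = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        enriched = torch.cat([tokens, graph_feats], dim=-1)
        hidden, _ = self.bilstm(enriched)
        return self.classifier(hidden)        # per-token logits over BIO labels

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tok(["Zhao went bankrupt after losing 9,000 BTC in a single day."],
            return_tensors="pt")
model = SpanTagger()
dummy_graph = torch.zeros(1, batch["input_ids"].size(1), 64)  # placeholder features
logits = model(batch["input_ids"], batch["attention_mask"], dummy_graph)
pred_tags = [LABELS[i] for i in logits.argmax(-1)[0].tolist()]
```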

Table 1 The statistics of vocabulary entity linking
Fig. 2 The overview of FinCaKG-Onto schema

3.4 Entity linking module

Inside the cause and effect spans, we want to link potential mentions to the aforementioned financial vocabulary (Section 3.1) and then map the vocabulary to WikiData entities (Section 3.2). In the most straightforward method, denoted EL1, we apply exact string matching to link mentions to vocabulary and then to entities. However, its low recall would cause a sparsity problem in the final graph. Accordingly, we introduce WikiData entities as an intermediate connection for this mapping, denoted EL2, which allows inflections and synonyms of a term to be mapped together.

For the mapping from vocabulary to WikiData entities, the collected financial vocabulary does not come with any authorized entity identifiers. Given the financial vocabulary and their explanations as context, we deploy the GENRE [31] (Generative ENtity REtrieval) model to link the vocabulary to WikiData entities. To acquire the explanations of the FIBO vocabulary, we explore the main annotation properties in the FIBO ontology (a minimal sketch of this context assembly follows the list):

  • extract the value of skos:definition as the main component of the explanation;

  • append the contents of rdfs:isDefinedBy, "adapted from" (Footnote 9), skos:example, and "explanatory note" (Footnote 10) if their values exist.
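Below is the promised sketch, assuming a local copy of the FIBO ontology in Turtle; the IRIs of the "adapted from" and "explanatory note" annotation properties vary across FIBO modules, so they are left as placeholders here.

```python
# Minimal sketch (assumes a local FIBO dump, e.g. fibo.ttl): assemble a textual
# context for each FIBO term from its annotation properties, to feed to GENRE.
# The "adapted from" / "explanatory note" IRIs are placeholders to substitute.
from rdflib import Graph, Namespace, RDFS

SKOS = Namespace("http://www.w3.org/2004/02/skos/core#")
EXTRA_PROPS = []  # e.g. the FIBO "adapted from" and "explanatory note" property IRIs

g = Graph()
g.parse("fibo.ttl", format="turtle")

def build_context(term_iri):
    """Concatenate the definition plus optional annotations as GENRE input."""
    parts = [str(v) for v in g.objects(term_iri, SKOS.definition)]
    for prop in [RDFS.isDefinedBy, SKOS.example, *EXTRA_PROPS]:
        parts.extend(str(v) for v in g.objects(term_iri, prop))
    return " ".join(parts)

contexts = {s: build_context(s) for s in g.subjects(RDFS.label, None)}
```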

As for the context of the Investopedia vocabulary, we take the first sentence of each term's page content as its context. Table 1 shows the statistics of the entity linking results. The raw text column indicates the number of vocabulary terms from each source. The linked entity column gives the number of terms that the GENRE model could link to existing entities. The last column shows how many of those entities could be correctly mapped to WikiData IDs.

Because most mentions are noun phrases, they appear as diverse combinations of sequential tokens, so aligning textual occurrences with their most suitable entity is quite challenging. We take advantage of the two aforementioned methods and summarize them here:

EL1: find the target mentions by exact string matching against the vocabulary and directly align those mentions to the WikiData entities of their matched vocabulary.

EL2: apply the mGENRE [32] model to map all mentions to their entities, if any exist; here we do not specify the target mentions in advance but make full use of the mGENRE model's capability. By comparison, the GENRE model used in the previous step can only retrieve entities when the raw text is specified in advance. Finally, we filter out mentions whose entities are not among the predefined vocabulary entities.

With both methods, we complete the mapping from mentions to financial vocabulary and to WikiData entities. Finally, we merge the findings of both methods and preserve the entity linking results for the following module.
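The following sketch shows one way the two routes could be combined, assuming the EL2 candidates come from an mGENRE-style linker; the data structures and helper names are illustrative, not this paper's implementation.

```python
# Minimal sketch of combining EL1 (exact string matching against the financial
# vocabulary) and EL2 (keeping only mGENRE candidates whose entity is in the
# predefined vocabulary-entity set). `vocab_to_qid` and `mgenre_link` are
# placeholders for the resources and model described in the paper.
from typing import Callable

def el1(span: str, vocab_to_qid: dict[str, str]) -> set[str]:
    """Exact matching: a vocabulary term found in the span is linked directly."""
    lowered = span.lower()
    return {qid for term, qid in vocab_to_qid.items() if term.lower() in lowered}

def el2(span: str, mgenre_link: Callable[[str], list[tuple[str, str]]],
        allowed_qids: set[str]) -> set[str]:
    """Generative linking: map every mention, then filter to vocabulary entities."""
    candidates = mgenre_link(span)          # [(mention, wikidata_qid), ...]
    return {qid for _, qid in candidates if qid in allowed_qids}

def link_span(span, vocab_to_qid, mgenre_link):
    allowed = set(vocab_to_qid.values())
    return el1(span, vocab_to_qid) | el2(span, mgenre_link, allowed)
```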

3.5 Causality bonding module

We assume that connections from the domain-specific entities of a cause span to those of its effect span can be considered causal relations. Thus we can link mention1 (from a cause span) to mention2 (from its corresponding effect span) as causality, and likewise link mention2 to mention3 in a similar case. In this manner, we extract financial mention pairs from the cause-effect span pairs. Using the results of the entity linking module, we then align the mention pairs to entity pairs. By linking causal pairs from tail to head, all entities become connected through causal relations.
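A minimal sketch of this bonding step follows, under the assumption that every entity from a cause span is paired with every entity from its effect span and that sentence identifiers are retained for later frequency counting; the record layout is illustrative.

```python
# Minimal sketch of the causality bonding step: every entity linked in a cause
# span is paired with every entity linked in the corresponding effect span,
# and each pair keeps the sentence id so frequencies can be aggregated later.
from itertools import product

def bond(span_pairs):
    """span_pairs: iterable of (sentence_id, cause_entities, effect_entities)."""
    edges = []
    for sent_id, cause_entities, effect_entities in span_pairs:
        for cause, effect in product(cause_entities, effect_entities):
            if cause != effect:                      # skip self-loops
                edges.append((cause, effect, sent_id))
    return edges

# Example: one causal sentence whose spans were linked to two entities.
# The cause QID is a placeholder; Q1365583 is "Bad debt" per the schema example.
edges = bond([("sent-001", {"Q_DERIVATIVE"}, {"Q1365583"})])
```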

4 FinCaKG-Onto

Following those modules, FinCaKG-Onto becomes a fairly large graph with enriched connections. However, at this stage the knowledge graph still lacks a human-understandable structure. Therefore, we incorporate the FIBO ontology to organize the knowledge presentation in a taxonomic structure, differentiate nodes into a concept level and an instance level, and link terms to WikiData identifiers to enable connections from FinCaKG-Onto to the world of Linked Open Data.

In this section, we start with an overview of FinCaKG-Onto accompanied by real examples. The detailed properties of each component are explained and listed in the following subsections.

4.1 Overview

FinCaKG-Onto is composed of a group of financial entities and their mentions from the text. As shown in Fig. 2, we define the mentions as Financial_instance_term, which are rdf:type of their related entities, Financial_concept_term. For example, in Fig. 3 we list different mentions in the white box, all of which are linked to the "Bad debt" entity. Returning to Fig. 2, the latter class holds owl:sameAs links to FIBO_concept and Investo_concept. FIBO_concept not only holds a collection of concepts but also inherits rdfs:subClassOf from the FIBO ontology. Correspondingly, FIBO_instance holds rdf:type links to FIBO_concept.

Fig. 3 The instances in FinCaKG-Onto schema

Table 2 The properties of FinCaKG-Onto schema

On the one hand, we assign the reflexive relation fincakg:cause to Financial_concept_term and to its related concepts, FIBO_concept and Investo_concept. This allows us to establish the relationship between different classes and instances. For example, in Fig. 3, "Derivative (finance)" in the yellow box causes "Bad debt", and it is also a subclass of another yellow box, "Financial Instrument". Next to it, the green box "Apple Inc. common stock" is shown as an instance of "Financial Instrument".

On the other hand, we add the relation dg:wikidata_item_id from Financial_concept_term and its related concepts to WikiData and enable the other classes to inherit this extra linkage. As an example in Fig. 3, "Bad debt" is linked to WikiData ID "Q1365583", shown in the purple box.

4.2 Schema

We provide the following prefixes and namespaces used in the FinCaKG-Onto schema:

owl: http://www.w3.org/2002/07/owl#
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs: http://www.w3.org/2000/01/rdf-schema#
dg: https://w3id.org/dingo#
fincakg: https://www.ai.iee.e.titech.ac.jp/fincakg#

Beyond the predefined namespaces and vocabularies, we introduce our own namespace fincakg to capture FinCaKG-Onto-specific characteristics. In Table 2, we show the schema of classes, instances, and relations in detail. In the Properties column, rdfs:label stores entity names, rdfs:isDefinedBy records the original IRI from the expert resources, and dg:wikidata_item_id stores the WikiData ID if it exists. The remaining properties record the occurrences of causality:

  1) fincakg:occurAsCause records the list of sentence serial numbers in which the vocabulary occurs in cause spans;

  2) fincakg:occurAsEffect records the same for effect spans;

  3) fincakg:co-occurInCausality records the list of sentence serial numbers in which this causality occurs;

  4) fincakg:freq stores the total number of sentences in which this causality occurs.

In the remaining columns, we present an example of each property and indicate the datatype used for storage.
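To illustrate how such a record could be serialized, the sketch below materializes one fincakg:cause edge with rdflib; the node IRIs, literal formats, and the attachment of the provenance properties directly to the nodes are simplifying assumptions, since their exact placement is defined by Table 2.

```python
# Minimal sketch: one fincakg:cause edge with its bookkeeping properties and
# the WikiData linkage. Node IRIs and literal formats are illustrative; the
# provenance properties are attached to the nodes here for brevity.
from rdflib import Graph, Literal, Namespace, RDFS
from rdflib.namespace import XSD

FINCAKG = Namespace("https://www.ai.iee.e.titech.ac.jp/fincakg#")
DG = Namespace("https://w3id.org/dingo#")

g = Graph()
g.bind("fincakg", FINCAKG)
g.bind("dg", DG)

derivative = FINCAKG["Derivative_finance"]
bad_debt = FINCAKG["Bad_debt"]

g.add((derivative, RDFS.label, Literal("Derivative (finance)")))
g.add((bad_debt, RDFS.label, Literal("Bad debt")))
g.add((bad_debt, DG.wikidata_item_id, Literal("Q1365583")))  # from the schema example

# The causal edge itself plus its supporting-sentence bookkeeping.
g.add((derivative, FINCAKG.cause, bad_debt))
g.add((derivative, FINCAKG.occurAsCause, Literal("sent-001, sent-042")))
g.add((bad_debt, FINCAKG.occurAsEffect, Literal("sent-001, sent-042")))
g.add((derivative, FINCAKG["co-occurInCausality"], Literal("sent-001, sent-042")))
g.add((derivative, FINCAKG.freq, Literal(2, datatype=XSD.integer)))

print(g.serialize(format="turtle"))
```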

Table 3 The statistics of FinCaKG-Onto

As mentioned in Section 3.4, we apply two entity linking methods to support the causality bonding module. Table 3 summarizes the statistics of FinCaKG-Onto. The EL1 column corresponds to mapping textual occurrences to entities via string matching; the EL2 column corresponds to entity alignment using the full capabilities of mGENRE; and the EL_mix column merges the nodes and relations generated by both entity linking methods. The last column shows the final FinCaKG-Onto after removing duplicates and keeping only fincakg:cause relations whose frequency is greater than 1. The bold rows are the statistical summaries for each block. We notice that, although many taxonomic relations and vocabulary terms are introduced by the expert resources, their occurrence in the input financial reports is not as high as expected. Fortunately, they still form the main structure of FinCaKG-Onto and organize the causality graph in a straightforward and meaningful way.
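The merge-and-filter step behind the last column can be sketched as follows, continuing the edge format assumed in the bonding sketch of Section 3.5; the threshold reflects the frequency > 1 rule stated above.

```python
# Minimal sketch of producing the final edge set: merge EL1 and EL2 edges,
# count how many distinct sentences support each (cause, effect) pair, and
# keep only pairs with frequency > 1, as done for the last column of Table 3.
from collections import defaultdict

def finalize(edges_el1, edges_el2, min_freq=2):
    """edges_*: iterables of (cause_qid, effect_qid, sentence_id) tuples."""
    support = defaultdict(set)
    for cause, effect, sent_id in list(edges_el1) + list(edges_el2):
        support[(cause, effect)].add(sent_id)          # dedup per sentence
    return {pair: len(sents) for pair, sents in support.items()
            if len(sents) >= min_freq}                 # fincakg:freq values

final_edges = finalize(
    [("Q_DERIVATIVE", "Q1365583", "sent-001")],        # EL1 edges (placeholder QID)
    [("Q_DERIVATIVE", "Q1365583", "sent-042")],        # EL2 edges
)
```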

To handle dynamic, domain-specific financial terminology, we leverage the keyword identification capabilities of generative models such as GENRE and mGENRE, which allow us to consider all morphological variations of the relevant keywords. To refine these keywords, we incorporate financial domain resources such as FIBO and Investopedia. While we did not use these resources to improve the entity linking module itself through ontologies, we expect that our current integration makes sufficient use of domain-specific resources for accurate keyword identification. When new financial terms appear in the underlying databases or ontologies, instead of regularly retraining the entity linking models, we expand the collection of keywords used to post-process the generated outputs. This ensures that new financial terms are included without modifying the pretrained models.

5 Evaluation of FinCaKG-Onto

This section outlines the performance of our causality detection modules, including the sentence extraction and span identification models. We then offer a comprehensive analysis of the entity linking module and qualitative evaluation metrics, such as trustworthiness, interoperability, and accessibility. To conclude, we compare FinCaKG-Onto with existing causal KGs based on statistics, scope, and feasibility, showcasing the distinctive strengths of our work.

Table 4 The performance of the sentence extraction model during validation and inference

5.1 The performance of sentence extraction model

We aim to extract causal sentences from the text with high purity. To achieve this, we 1) evaluate the trained models on the Labeled Data, shown in the validation row of Table 4, and 2) select the best-trained model to make predictions on the financial reports, shown in the inference row of Table 4. We present the precision, recall, and F1-score of binary classification on the causal sentence extraction task. Within 10 epochs, we observe only a slight increase after the first epoch; here we report the validation results at the 10th epoch. In the validation process, the model achieves rather high scores of around 0.96. We then select the final trained model and use it for the inference task. The prediction process yields somewhat lower scores than the validation process, yet they are still of sufficient quality for identifying causal sentences.

In brief, we report the inference precision of this model on the financial reports as 0.96.

5.2 The performance of span identification model

Table 5 The comparison results across multiple span identification models on training data
Fig. 4 The performance of the span identification model during inference

The target here is to find the cause span and the effect span in a given causal sentence. To evaluate the prediction of complete spans, including both the boundaries and the label, we apply the sequence labeling metrics of seqeval (Footnote 11).
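As a small usage sketch, seqeval scores complete labeled spans rather than individual tokens; the Cause/Effect BIO label names below are assumptions for illustration.

```python
# Minimal sketch: scoring predicted cause/effect spans with seqeval.
# A span counts as correct only if both its boundaries and its label match.
# The Cause/Effect BIO label names are illustrative assumptions.
from seqeval.metrics import classification_report, f1_score

y_true = [["B-Effect", "I-Effect", "O", "B-Cause", "I-Cause", "I-Cause"]]
y_pred = [["B-Effect", "I-Effect", "O", "B-Cause", "I-Cause", "O"]]

print(f1_score(y_true, y_pred))           # span-level F1
print(classification_report(y_true, y_pred))
```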

Causality extraction from text can be approached through three main paradigms: sequence tagging models, span prediction models, and classification-based models. The sequence tagging method labels each word according to a specific schema, such as the BIO scheme. In the span prediction paradigm, the focus is on identifying the start and end positions of argument spans. The classification-based approach classifies the provided argument candidates into distinct labels. Leveraging the labeled data described in Section 3.2, we assess the ilab-FinCau model against various transformer-based and generative models across these paradigms. Drawing on the comparative results in the published report [46], we categorize each model into its respective paradigm in Table 5. Each model is assessed for precision (P), recall (R), F1 score (F1), and exact match (EM). The highest-performing model is highlighted in bold, and the second-best model is underlined.

We present the pretrained language models in the PLMs column of Table 5 and note that, aside from MNLP [51] which uses the generative T5 model, all other approaches employ transformer-based models. From the Paradigm column, we see that the sequence tagging (ST) paradigm consistently outperforms the span prediction (SP) paradigm, while the classification-based (CB) paradigm performs the least effectively. Additionally, ensemble learning with ST involves training multiple identical models and determining results through majority voting. The Spock model [47] employs this strategy to achieve top performance, but it incurs significantly higher computational costs. In contrast, our ilab-FinCau model may compromise slightly on metric values but requires fewer computational resources and less time, making it more practical than the Spock model.

After discussing model performance on the training data, we now focus on the influence of random variation during inference. We run experiments with the ilab-FinCau model using three different random states, labeled random 1, random 2, and random 3 in Fig. 4. The overall evaluation of all identified textual spans is reported in terms of precision, recall, and F1-score. We observe increasing scores with more epochs, and the general trends across random states are similar. This suggests that the model keeps learning features until it reaches around 0.8 in F1-score.

Table 6 The check table for the evaluation of FinCaKG-Onto

5.3 Comprehensive analysis

We summarize the inference performance of the discussed models in an integrated check table. In Table 6, we present the precision values for the sentence extraction and span identification models. Given the similar performance across precision, recall, and F1-scores noted in Fig. 4 and Table 4, we report only precision values to save space. The remaining results, which rely solely on precision, were obtained through human evaluation due to the lack of gold standards. Human evaluators prioritize precision, as it emphasizes correctness while lessening the emphasis on subjective interpretations.

In addition to the aforementioned fine-tuned models, we also evaluate the pretrained models used in the entity linking module, i.e. GENRE and mGENRE. We attained a precision of 0.91 when applying the GENRE model to align the target vocabulary with the right entities, and a precision of 0.76 when using mGENRE to find and link any possible sentence mentions to the correct entity. Table 6 contains both the reported and the measured values.

Regarding the Trustworthiness of FinCaKG-Onto usage, at the mapping level we examine the mapping precision to WikiData with respect to the different terminology resources, i.e. Investopedia concepts, FIBO concepts, and FIBO instances. At the case-study level, we examine the quality of query results from subjective aspects (discussed further in Section 6). On the one hand, we summarize the 1-hop neighbor nodes of the toy example in Fig. 5 and assign a precision of 0.91 to this case. On the other hand, we analyze the 2-hop neighbor nodes of the toy example in Fig. 6, for which the precision is determined as 0.83. Besides, we list the various intermediate evidence that is recorded to support tracing relation extraction back to its sources. These values are also documented in Table 6.

Fig. 5 The inference exploration in FinCaKG-Onto

Fig. 6 The causality investigation results with 2-hop causality

We then analyze Interoperability through the use of widely accepted conventional vocabularies and the possible interconnectivity to one of the biggest knowledge bases, WikiData. We finish by illustrating the Accessibility of re-using FinCaKG-Onto in different manners. More details are given in the Supplementary information section.

5.4 Assessment of various causal KGs

We explored diverse techniques for constructing KGs in Section 2.2. While not all studies publish their resulting KGs, we have chosen a few that specifically include "causality" relations. For example, we extract the causal subgraph from ConceptNet [2] and term it ConceptNet@cause [38], and we select the relevant data from FinDKG [44] as FinDKG@cause. In addition to the existing causal KGs, we also compare against FinCaKG-Onto without the ontology and the entity linking module as an ablation, which we refer to simply as FinCaKG. This section analyzes these causal KGs in terms of size statistics, content, and feasibility.

The statistics for the various causal KGs are compiled in Table 7. The Source column reveals that, apart from ConceptNet@cause, which utilizes crowdsourced structural knowledge, the other causal KGs are derived directly from original texts, including news articles and financial reports. Then we list the number of nodes, relations, and documents used for retrieval. Additionally, we calculate the Retrieval Rate to assess the effectiveness of the KG construction methodology in detecting causality. FinCaKG-Onto stands out with the highest retrieval rate, about three times that of FinCaKG, highlighting the role of ontology and the entity linking module in effectively identifying different morphological inflections of predefined terminology. In contrast, FinDKG@cause illustrates that the generative model is not fully optimized given the limited resources available.

There are notable differences in the contents and extensibility of these causal KGs. As shown in Table 8, different KGs utilize distinct terminology to present information. For example, FinDKG@cause is centered on news entities, while FinCaKG-Onto addresses both entities and concepts derived from the FIBO ontology, whereas the others are limited to concepts alone. In terms of extensibility, both ConceptNet@cause and FinCaKG-Onto facilitate matching nodes with existing identifiers from Linked Open Data (LOD), which may include WikiData, DBpedia, and WordNet. This capability enables the knowledge within these causal KGs to be interconnected and leveraged as open-source information.

Finally, we analyze the feasibility of the causal KGs. Table 9 summarizes whether each causal KG has been quantitatively and qualitatively assessed. We refer to quantitative evaluation as the assessment of the correctness of the relations in a KG, accompanied by confidence values for reuse. Due to the varying KG construction methodologies, we categorize the evaluations into sequential and final: the former occurs during intermediate steps, while the latter takes place only at the end. Both FinCaKG and FinCaKG-Onto perform sequential quantitative evaluations but not a final evaluation. Although ConceptNet@cause conducted a final evaluation, that experiment relied solely on "word relatedness" and disregarded the performance of relation extraction, so it can only be considered a partial final evaluation. We argue that full final quantitative evaluations of KGs are prohibitively expensive, as they necessitate human involvement.

Furthermore, we define qualitative evaluation as assessing how easily users can comprehend and interact with the knowledge graph. In Table 9, all KGs have effectively addressed this criterion. Beyond the original evaluation, the downstream applications of these KGs play a vital role in assessing their practical capabilities. We note that ConceptNet@cause is widely used for node and link prediction, given its role as a commonsense knowledge graph. In contrast, FinDKG@cause, which is based on economic news, is well-suited for sentiment analysis. FinCaKG and FinCaKG-Onto excel in logic-path depiction, particularly for financial expertise. Additionally, because it includes the FIBO ontology, FinCaKG-Onto is adept at relation inference, particularly causal chain inference.

Table 7 The statistics of different causal knowledge graphs
Table 8 The scope of different causal knowledge graphs

6 Case study

We conduct a case study to demonstrate the usability of FinCaKG-Onto for causality investigation and inference exploration. We also compare our outputs with those of a prevalent large language model, ChatGPT [55]. Prior to the comprehensive analysis, we first visualize the causality relationships around an anchor term in FinCaKG-Onto.

6.1 Causality visualization

The massive number of causal relations in FinCaKG-Onto makes it difficult to visualize all nodes and relations at once. We therefore start by querying a single node, i.e. Bad debt, and seek all its possible causes. For a readable view, in Fig. 5 we color the nodes by their classes, i.e. blue for Investo_concept and green for FIBO_concept. We also draw causality as red arcs and taxonomic relations as green arcs. The size of a node indicates how many neighbor nodes are connected to it, and the width of an edge signifies how frequently this causality occurs in the financial reports.

6.2 Inference exploration

In Fig. 5, we explore the possibility of inferring extra causality from taxonomic relations. Among the 1-hop neighbors of "Bad debt", we expand all nodes that hold taxonomic relations to other FIBO_concepts, e.g. "Commodity", "Derivative (finance)", etc. Specifically, we obtain three more subclasses pointing to "Derivative (finance)": Forward contract, Currency swap, and Entitlement. Since "Derivative (finance)" is one of the factors that cause "Bad debt", it becomes possible to infer that these subclasses of "Derivative (finance)" might also cause "Bad debt". It is worth mentioning that, beyond this toy example with the anchor term "bad debt", users can explore causality chains from any other anchor term in our vocabulary. A SPARQL sketch of this expansion is given below.
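The sketch assumes the FinCaKG-Onto Turtle dump is available locally, that causal triples point from cause to effect, and that the anchor node carries the rdfs:label "Bad debt"; all three are assumptions for illustration.

```python
# Minimal sketch: query direct causes of "Bad debt" and their rdfs:subClassOf
# descendants as candidate inferred causes. Assumes a local Turtle dump and
# causal triples stored in the direction (cause, fincakg:cause, effect).
from rdflib import Graph

g = Graph()
g.parse("fincakg-onto.ttl", format="turtle")

query = """
PREFIX fincakg: <https://www.ai.iee.e.titech.ac.jp/fincakg#>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?cause ?subCause WHERE {
  ?cause fincakg:cause ?effect .
  ?effect rdfs:label "Bad debt" .
  OPTIONAL { ?subCause rdfs:subClassOf+ ?cause . }   # candidate inferred causes
}
"""
for row in g.query(query):
    print(row.cause, row.subCause)
```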

Table 9 The feasibility check of different causal knowledge graphs
Fig. 7 The output of ChatGPT on causality query, accessed on 2023 May 25th

6.3 Causality investigation

Having listed the possible factors among the 1-hop neighbors of "Bad debt", we randomly select two small nodes among those neighbors and expand them to obtain the 2-hop neighbors; see Fig. 6 for details. Meanwhile, we transform this query into a natural-language question for ChatGPT [55] and notice that the results from FinCaKG-Onto and ChatGPT partially overlap, see Fig. 7. Based on the anchor term "bad debt", we query its causes on the ChatGPT platform and restrict the answer's context to "accounting report", which matches the nature of the input data, i.e. financial reports (Section 3.2).

The screenshots in Fig. 7 indicate that we used the ChatGPT 2023 May 3 version. Both outputs mention the keywords "Credit", "Economic conditions", "Insolvency", and "Bankruptcy". While ChatGPT lists additional customer-behavior activities, FinCaKG-Onto tends to give reasons related to the company's operations: "Lease payment", "Derivatives (finance)", "Rebate (marketing)", "Mergers and acquisitions", and so on. This demonstrates that FinCaKG-Onto is capable of digging out fine-grained factors, presenting them in tiers, and providing detailed concepts or instances for further explanation. ChatGPT, by contrast, reaches its limits on in-depth domain knowledge.

Besides, we test the entity linking performance of ChatGPT. Although it gives a good explanation of the queried financial terms, the WikiData identifiers it returns are essentially random. We omit the screenshots for the sake of the page limit, but this is a commonly discussed issue. Large language models show shortcomings in presenting trustworthy and in-depth knowledge, whereas knowledge graphs in Linked Open Data prove to be much more trustworthy.

7 Discussion

In the representation of FinCaKG-Onto, we allocate weights to the edges based on the frequency of causality occurrences within our corpus. We posit that these link assignments reflect a discernible causality preference, with the weight values serving as indicators of the strength of the identified causal relationships. As a consequence, even within the same domain, the resultant FinCaKG-Onto may exhibit subtle variations when constructed from diverse corpora. For instance, financial reports authored by a company’s executives and those generated by investment analysts may present disparate viewpoints, at times even featuring contradictory information. Nonetheless, the FinCaKG-Onto framework possesses the capacity to capture the nuanced causality and logical chains that cater to various interested groups. Each procedure in our framework is linked to dedicated datasets, where we report our models’ performance. Users can easily substitute their pre-trained models and evaluate them on the same datasets, which are publicly available as outlined in Section 3.2. This design fosters reproducibility in causal knowledge generation.

Aside from its applicability within the financial domain, this methodology can be generalized to various other knowledge-intensive domains, including but not limited to medical diagnosis, legal text analysis, and e-learning recommendation. Achieving this versatility entails adjusting the domain-specific ontologies, keyword dictionaries, and labeled datasets to the target domain and tasks.

Thanks to the causality bonding modules, the presented methodology demonstrates its capacity to discern complex interrelationships, encompassing multiple causes and effects among keywords within a single sentence, commonly referred to as intra-sentence causality. Nevertheless, we acknowledge that the broader spectrum of causality, as conveyed through the concatenation of multiple sequential sentences, remains unaddressed. We defer this aspect to future investigations.

Our corpus size is limited because we use the financial reports of the top 3,000 large companies as our original source. To reflect the most current insights into causal relationships, we focus on data from 2017 to 2021. In the future, we intend to examine evolving causal dynamics over larger time frames.

8 Conclusion

In this paper, we focus on the finance domain and present a framework to automatically construct FinCaKG-Onto from plain text. We introduce the resources and methodology for FinCaKG-Onto construction, present its schema, and provide a way to visualize the final knowledge graph. Beyond these outcomes, we compile a check table to briefly illustrate the quality of FinCaKG-Onto, and we carry out a case study to show possible findings in real user scenarios. The case study shows that FinCaKG-Onto can provide detailed domain knowledge with a clear logic path and can instantiate a concept with concrete examples, whereas ChatGPT [55] tends to cover common but shallow knowledge.

Currently, FinCaKG-Onto can present fine-grained causality for a given term. If we increase the volume of the financial corpus, we believe it will provide broader and more complete expertise for causality investigation and inference exploration. In the future, we are also interested in the issue of causality vanishing along causal chains and in detecting to what extent the connection path should be cut off, so as to provide more concrete causality chains. Furthermore, we would like to study the impact of temporal factors on causality discovery.

One limitation of this study lies in the challenge of evaluating causal relationships in finance due to the absence of a definitive ground truth. Unlike predictive models, where performance can be benchmarked against known outcomes, causal inferences in financial contexts are often subjective, leading to variations in assessments based on the perspectives of different annotators. Furthermore, the inherent complexity of financial systems exacerbates this difficulty. Real-world financial scenarios are riddled with confounding biases, and the observed causality may shift depending on whether these confounders are properly addressed. This introduces considerable hurdles in designing robust evaluation metrics and ensuring their accuracy.

9 Supplementary information

On our webpage, we provide various resources to share, utilize, and visualize the resulting FinCaKG-Onto:

  1. The dump files of FinCaKG-Onto

     (a) the FinCaKG-Onto schema in Turtle/RDF format

     (b) the entity linking results in JSON format

         (i) the linkages between FIBO concepts and entities from the source text

         (ii) the linkages between FIBO instances and entities from the source text

         (iii) the linkages between Investopedia concepts and entities from the source text

  2. A video demonstration. We recorded a video to show the processes of querying, managing, and visualizing FinCaKG-Onto.