Metallic materials ontology population from LOD based on conditional random field

doi:10.1016/j.compind.2018.03.032

Computers in Industry

Volume 99, August 2018, Pages 140-155

Highlights

• This paper presents an approach to populate a metallic materials ontology with LOD.
• The filling position is obtained by the Conditional Random Fields algorithm.
• This approach enriches the knowledge of the metallic materials ontology.
• Experiments show the feasibility and effectiveness of the method.

Abstract

In recent years, with the rapid development of ontology technology, many relatively mature domain ontologies have emerged and been applied successfully. However, existing metallic materials ontologies, such as the metallic materials ontology created by Ashino, MatonTO and ONTORULE, contain comparatively few instances. Users expect an ontology to include not only a large number of materials instances but also the properties of those instances. Linked Open Data (LOD) provides huge open knowledge bases that contain ample materials knowledge, so we would like to insert knowledge from LOD into a specific ontology. This is not easy, because LOD is very large and its structure is inconsistent with that of the ontology. We therefore propose a method to populate a specific metallic materials ontology with metallic materials information from LOD. First, we determine which information in LOD can be filled into the existing metallic materials ontology. Then, we convert the LOD into Chain Triples (CHTs) according to that filling information, and use a conditional random field (CRF) to obtain the CHTs' filling positions in the specified metallic materials ontology. Finally, we insert the information into the ontology. The approach is evaluated by F-measure, and the experimental results demonstrate that it can effectively populate a specific ontology with metallic materials data from LOD. The approach not only enriches the existing metallic materials ontology but also greatly reduces the manual effort of ontology population.

Introduction

With the development of Linked Open Data (LOD) [1,2], domain ontologies are being built rapidly [3,4] in a variety of ways, and the number of ontologies in various fields is growing quickly. Relatively mature domain ontologies have been established and applied in fields such as environment [5,6,7], chemistry [8,9,10] and biomedicine [11,12,13]. In addition, with the continuous development of industrial technology, vast amounts of metallic materials data have accumulated. Metallic materials are substances (or mixtures of substances), such as steels and alloys, that are indispensable to daily life and form the basis of industry. Corresponding ontologies exist in the field of metallic materials, such as MOA (the metallic materials ontology created by Ashino) [14] and STSM [15]. Although the schemata of these ontologies are relatively complete, their instances still need to be added gradually [16,17]. Users, however, expect a domain ontology to have not only a relatively complete schema but also rich instances, so enriching the instances is necessary. Meanwhile, LOD data sets such as DBpedia [18,19], Wikipedia and Yago [20,21] contain a large number of triples, and the metallic materials knowledge they cover can be used to populate the domain ontology. However, there are differences between the LOD and the domain ontology. We therefore propose populating a specific metallic materials ontology with the metallic materials data in the LOD.

On the semantic web [22,23], data items are linked to one another rather than existing in isolation. In data integration, when new data is to be integrated into existing structured data, it is therefore necessary not only to specify the data type explicitly but also to know exactly where the data should be placed. Likewise, for domain ontology population, we need to understand the domain of the data to be integrated and, more crucially, obtain its insertion location. In existing ontology population methods, the filling position is mostly obtained manually. In this paper, we present an approach to populate a specific ontology with metallic materials data from LOD. LOD contains more than one type of data, including concepts and properties, and the volume of data is large, so populating a specific ontology with LOD by existing methods is arduous. Hence, we design a population strategy that uses a machine learning algorithm to obtain the filling positions of the knowledge to be inserted into a specific ontology.

In summary, this paper uses a machine learning algorithm to populate an ontology with the metallic materials data in LOD. First, we determine the data in LOD that needs to be inserted as an instance and obtain its related data. We use the CHT (Chain Triple) to describe this structured data, that is, the population data that can be filled into the ontology together with its related data extracted from LOD; the detailed definition is given in Section 3. Then, we obtain the filling position in the ontology using the CRF algorithm. Finally, the data is inserted into the ontology. In the experiments, we insert the metallic materials data in DBpedia and Yago into existing metallic materials ontologies, such as STSM and MOA. The experimental results show that our method obtains high accuracy and F-measure, and that it still achieves a high F-measure when the materials ontology to be populated is changed. It also takes a relatively short time to obtain the filling position of a CHT.
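To make the idea of a CHT more concrete, the following is a minimal sketch of how one instance, its class chain and its properties might be gathered from a set of triples. It is not the formal definition from Section 3, which this excerpt does not reproduce. The `ChainTriple` class, the `build_cht` function, the `ex:` identifiers and the toy values are all assumptions made for illustration and are not taken from DBpedia or Yago.

```python
from dataclasses import dataclass, field

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


@dataclass
class ChainTriple:
    """Hypothetical CHT container: one instance, its class chain, its properties."""
    instance: str
    classes: list = field(default_factory=list)
    properties: dict = field(default_factory=dict)


def build_cht(node, triples):
    """Collect the class chain and the properties of `node` from (s, p, o) triples."""
    cht = ChainTriple(instance=node)
    # Map each class to its direct superclass (assumes at most one per class).
    parents = {s: o for s, p, o in triples if p == SUBCLASS_OF}
    for s, p, o in triples:
        if s != node:
            continue
        if p == RDF_TYPE:
            # Walk up the subclass links, most specific class first.
            current = o
            while current is not None and current not in cht.classes:
                cht.classes.append(current)
                current = parents.get(current)
        else:
            cht.properties[p] = o
    return cht


# Toy triples with placeholder identifiers and values.
triples = [
    ("ex:Stainless_steel", "rdf:type", "ex:Steel"),
    ("ex:Steel", "rdfs:subClassOf", "ex:Alloy"),
    ("ex:Alloy", "rdfs:subClassOf", "ex:Material"),
    ("ex:Stainless_steel", "ex:density", "7.9"),
]
cht = build_cht("ex:Stainless_steel", triples)
print(cht.classes)
print(cht.properties)
```

In this sketch the class chain carries the context used to judge where the instance belongs, while the instance and its properties are the data that would actually be inserted.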

The contributions of our work can be summarized as follows:

(1) Existing approaches to ontology population usually focus on analyzing natural language text and often neglect other, more appropriate sources of information, such as the structured and semantically rich data sets of LOD. Unlike these approaches, this paper proposes using LOD to populate a specific metallic materials ontology.
(2) When the LOD is inserted into a specific ontology, we identify the types of data to be inserted and determine the data that needs to be filled into the ontology. To obtain the filling position at which the data is populated into the ontology, we transform the LOD into a set of CHTs according to the determined filling data, and we specify the format of the CHT, which contains classes, an instance and properties. In a CHT, the instance and properties are the data that is populated into the ontology, and the classes are the information used to judge the filling position. In this way, the filling position can be determined from the information in the corresponding CHT rather than from the whole of the LOD.
(3) In our approach, the filling positions of the instance and properties in a CHT are obtained by using the CRF algorithm. This avoids manually compiling statistics and designing rules for the data to be inserted into the specific ontology, and it also makes ontology population faster. In addition, we design a generation strategy that combines the specific ontology and the CHTs and transforms them into the input data set. Users can apply this strategy to generate an input data set that the CRF algorithm can recognize directly.
(4) We evaluate our approach using precision, recall and F-measure, and the experimental results are satisfactory. Furthermore, the F-measure increases steadily as the scale of the data sets increases. We also conduct experiments using different LOD data sets and existing metallic materials ontologies, and the results are acceptable.
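Contribution (3) mentions a strategy that turns the ontology and the CHTs into an input data set for the CRF algorithm. The exact input format is not given in this excerpt. The sketch below assumes the column layout used by common CRF toolkits: one token per line, tab-separated columns with the label last, and a blank line between sequences. The function names, the CLASS/INSTANCE/PROPERTY tags and the label values are hypothetical.

```python
def cht_to_rows(classes, instance, properties, labels=None):
    """Turn one CHT into (token, kind, label) rows, one row per chain element."""
    tokens = [(c, "CLASS") for c in classes]
    tokens.append((instance, "INSTANCE"))
    tokens.extend((p, "PROPERTY") for p in properties)
    if labels is None:
        # Unlabelled data: use a placeholder label for every token.
        labels = ["O"] * len(tokens)
    if len(labels) != len(tokens):
        raise ValueError("need exactly one label per token")
    return [(tok, kind, lab) for (tok, kind), lab in zip(tokens, labels)]


def format_sequences(sequences):
    """Render sequences as tab-separated columns with a blank line between them."""
    blocks = ["\n".join("\t".join(row) for row in rows) for rows in sequences]
    return "\n\n".join(blocks) + "\n"


# Toy example: the labels name assumed target positions in the ontology.
rows = cht_to_rows(
    classes=["ex:Alloy", "ex:Steel"],
    instance="ex:Stainless_steel",
    properties=["ex:density"],
    labels=["O", "O", "Steel", "Property"],
)
text = format_sequences([rows, rows])
print(text)
```

Keeping the label in the last column means the same rows can serve as training data (with labels filled in) or as prediction input (with the placeholder label).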

The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 describes the problem and defines the concepts. Section 4 introduces the approach and its process. Section 5 describes the implementation in detail. Section 6 presents and discusses the experimental evaluation. Finally, Section 7 provides the conclusion and future work.

Section snippets

Related work

In a domain ontology, the classes usually constitute the knowledge framework of the whole ontology. For existing domain ontologies, most research focuses on the construction and relevance of the class knowledge framework. However, ontology users want not only a well-designed schema but also a large number of instances in the domain ontology. Therefore, more and more research is devoted to populating domain ontologies with instances.

At present, the data

Problem description

Most existing metallic materials ontologies have satisfactory schemata, but their instance knowledge still needs to be extended. For example, STSM [15] is a metallic materials ontology that was developed for the integration of heterogeneous materials data and covers the basic knowledge of metallic materials. STSM contains some basic concepts, e.g. Element, Property, Steel and Unit, which are mainly used to represent the knowledge related to metallic materials. Element is

Approach overview

In this paper, we propose an approach to populate STSM with the metallic materials data in the LOD, in which the filling positions are obtained by using the CRF algorithm. Fig. 3 illustrates the process of filling the LOD into the ontology. The steps are as follows.

Step 1. Getting the CHTs from the LOD. First, we determine the node in the LOD that needs to be inserted into the ontology. Then, starting from that node, we obtain the related information in the LOD, which contains properties and other

Methodology

In this section, we describe in detail our proposed method for inserting the metallic materials data in the LOD into STSM [15]. In this method, the filling positions are obtained by using the CRF algorithm.
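As a rough illustration of how a linear-chain CRF assigns labels to a sequence, the sketch below implements only the decoding step (Viterbi search) in plain Python. The excerpt does not state which CRF toolkit, features or label set the authors use, so the observation tags, the labels and the hand-set weights here are all hypothetical. In a real system the weights would be learned from labelled training data.

```python
def viterbi(observations, labels, emission, transition):
    """Return the highest-scoring label sequence under a linear-chain model.

    emission[(label, observation)] and transition[(previous, label)] are
    log-potentials; missing entries score 0.0.
    """
    if not observations:
        return []
    # best[i][y] = (score of the best path ending in label y at position i, backpointer)
    best = [{y: (emission.get((y, observations[0]), 0.0), None) for y in labels}]
    for obs in observations[1:]:
        column = {}
        for y in labels:
            score, prev = max(
                (best[-1][p][0] + transition.get((p, y), 0.0), p) for p in labels
            )
            column[y] = (score + emission.get((y, obs), 0.0), prev)
        best.append(column)
    # Trace back from the best final label.
    last = max(labels, key=lambda y: best[-1][y][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    return list(reversed(path))


# Hypothetical weights: class tokens get no position, the instance maps to
# "Steel", and a property following a "Steel" instance maps to "Property".
labels = ["O", "Steel", "Property"]
emission = {("O", "CLASS"): 2.0, ("Steel", "INSTANCE"): 2.0, ("Property", "PROPERTY"): 2.0}
transition = {("O", "Steel"): 1.0, ("Steel", "Property"): 1.0}
observations = ["CLASS", "CLASS", "INSTANCE", "PROPERTY"]
decoded = viterbi(observations, labels, emission, transition)
print(decoded)
```

The point of the sequence model is that the label chosen for one element depends on its neighbours, which matches the text's use of the classes in a CHT as context for placing the instance and properties.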

Experiment environment and performance metrics

All experiments were run on JDK 1.7 on a machine with an Intel i7 CPU and 12 GB of memory, running 64-bit Windows 7.

We use precision, recall, F-measure and time performance to evaluate our approach.
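The three accuracy metrics can be computed from simple counts, as in the minimal sketch below. It uses the standard definitions of precision, recall and F-measure; the paper's own recall and F-measure equations are not reproduced in this excerpt, so the `expected` denominator is an assumption. The function name, the parameter names and the counts in the example are hypothetical: `correct` and `incorrect` stand for the numbers of CHTs that receive a correct or an incorrect filling position.

```python
def precision_recall_f1(correct, incorrect, expected):
    """Standard precision, recall and F-measure from raw counts.

    correct:   CHTs assigned the right filling position
    incorrect: CHTs assigned a wrong filling position
    expected:  CHTs that should have received a position (assumed denominator)
    """
    predicted = correct + incorrect
    precision = correct / predicted if predicted else 0.0
    recall = correct / expected if expected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up counts, for illustration only.
p, r, f = precision_recall_f1(correct=90, incorrect=10, expected=120)
print(f"{p:.2f} {r:.2f} {f:.2f}")  # prints "0.90 0.75 0.82"
```

The zero checks keep the function defined when nothing is predicted or nothing is expected, which can happen on very small test sets.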

As shown in Eq. (2), precision (P) is the proportion of correct identification results among all identification results, where |CFP| denotes the number of CHTs that obtain the correct filling position, and |NCFP| is the number of CHTs that obtain an incorrect filling

Conclusion and future work

In this paper, to continuously enrich the instance knowledge of existing metallic materials ontologies and provide users with relatively rich domain knowledge, we have proposed an approach to populate an existing metallic materials ontology with the metallic materials data in LOD. The LOD is huge and complex, and there are differences between the LOD and the domain ontology. Thus, we insert the selected specific information into the

Acknowledgments

This work is supported by the National Natural Science Foundation of China [No. 51271033, 71271076]; the Natural Science Foundation of Hebei Province [No. F2018208116, F2013208107]; the Hebei Science and Technology Support Program [No. 16210312D]; and the Natural Science Foundation of Hebei Education Department [No. QN2015207].

References (56)

T. Myers et al., Eco-informatics modeling via semantic inference, Inf. Syst. (2013)
X. Wang et al., Ontology-based supply chain decision support for steel manufacturers in China, Expert Syst. Appl. (2013)
J. Hoffart et al., YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia, Artif. Intell. (2013)
P. Ristoski et al., Semantic Web in data mining and knowledge discovery: a comprehensive survey, Web Semant. Sci. Serv. Agents World Wide Web (2016)
De Vries et al., Substructure counting graph kernels for machine learning from RDF data, Web Semant. Sci. Serv. Agents World Wide Web (2015)
X. Zhang et al., MMKG: an approach to generate metallic materials knowledge graph based on DBpedia and Wikipedia, Comput. Phys. Commun. (2017)
X. Zhang et al., MMOY: towards deriving a metallic materials ontology from Yago, Adv. Eng. Inf. (2016)
F. Ali et al., Opinion mining based on fuzzy domain ontology and Support Vector Machine: a proposal to automate online review classification, Appl. Soft Comput. (2016)
J.D. Nielsen et al., Supervised classification using probabilistic decision graphs, Comput. Stat. Data Anal. (2009)
K. Liu et al., Ontology-based sequence labelling for automated information extraction for supporting bridge data analytics, Procedia Eng. (2016)
C. Bizer et al., Linked data: the story so far, Int. J. Semant. Web Inf. Syst. (2009)
T. Heath et al., Linked data evolving the web into a global data space, Mol. Ecol. (2011)
N. Guarino, Understanding, building and using ontologies, Int. J. Hum. Comput. Stud. (1988)
N. Guarino, Formal ontologies and information systems, FOIS’98 Conference (1998)
K. Llic et al., The plant structure ontology, a unified vocabulary of anatomy and morphology of a flowering plant, Plant Physiol. (2007)
B.P. Luigi et al., The environment ontology: contextualising biological and biomedical entities, J. Biomed. Semant. (2013)
J. Hastings et al., The chemical information ontology: provenance and disambiguation for chemical data on the biological semantic web, PLoS One (2011)
J.Y. Choi et al., A semantic web ontology for small molecules and their biological targets, J. Chem. Inf. Model. (2010)
P. Sankar et al., Model tool to describe chemical structures in XML format utilizing structural fragments and chemical ontology, J. Chem. Inf. Model. (2010)
L.M. Schriml et al., Disease Ontology: a backbone for disease semantic integration, Nucleic Acids Res. (2012)
G. Sherlock, Gene Ontology: tool for the unification of biology, Can. Inst. Food Sci. Technol. J. (2009)
S. Maere et al., BiNGO: a Cytoscape plugin to assess overrepresentation of Gene Ontology categories in Biological Networks, Bioinformatics (2005)
T. Ashino, Materials ontology an infrastructure for exchanging materials information and knowledge, Data Sci. J. (2010)
X. Zhang et al., STSM: an infrastructure for unifying steel knowledge and discovering new knowledge, Int. J. Database Theory Appl. (2014)
Ontology for the Steel Domain. http://ontorule-project.eu/resources/steel.html, 2009 (accessed...
P.N. Mendes et al., DBpedia-A multilingual cross-domain knowledge base
J. Lehmann et al., DBpedia-A large-scale, multilingual knowledge base extracted from wikipedia, Semantic Web (2015)
J. Biega et al., Inside YAGO2s: a transparent information extraction architecture
