Full length article
MMOY: Towards deriving a metallic materials ontology from Yago
Introduction
With the development of materials informatics, knowledge bases play a critical role in intelligent applications of materials science and engineering; they can accelerate the design and discovery of materials and shorten the commercialization cycle for new materials [1], [2], [3]. The volume of available knowledge in the materials science domain is growing rapidly and takes many forms, which makes it possible to build a materials science knowledge base by integrating these different types of knowledge. Ontologies and semantic web technologies [4] are now widely used in various domains, and semantic representation has become one of the important ways to integrate information. Some ontologies have already emerged in the materials domain [5], e.g. the PREMΛP ontology [6] and FreeClassOWL [7], which provide abstract models of materials science and engineering. However, most of them are designed manually and provide useful but relatively limited materials knowledge. In contrast, the Linked Open Data (LOD) [8] cloud has grown significantly in recent years, and many linked datasets and ontologies have been published, such as Yago [9], [10], [11], DBpedia [12] and Freebase [13], which comprise knowledge spanning various domains. These huge datasets act as open knowledge bases that contain many materials concepts, and some of them also provide a complete knowledge structure, which makes it possible to generate a materials ontology by extracting knowledge from them. DBpedia and Freebase contain a large number of metallic materials instances but relatively few classes, while Yago contains many more classes than DBpedia and Freebase in the metallic materials domain. In addition, the accuracy of Yago has been manually evaluated and its correctness is greater than 95%, which provides a reliable guarantee of data quality.
Thus, in this paper, we extract the materials knowledge as well as the related knowledge, especially in metallic materials domain, from Yago, so as to generate a metallic materials ontology (named MMOY).
Yago is a huge dataset comprising more than 10 million entities (e.g., persons, organizations, cities) and more than 120 million facts about these entities. It classifies these entities into more than 350,000 classes [14] using the taxonomies of WordNet [15] and the Wikipedia [16] category system. Therefore, Yago contains not only the knowledge structure of metallic materials based on the taxonomies of WordNet and Wikipedia, but also a large number of metallic materials concepts, for example, alloy, copper and steel. In addition, basic properties (e.g., chemical, physical, electrical and mechanical properties) and processing technologies (e.g., heat treatment) are also contained in Yago, as are some related entities (e.g., auto companies and metalware).
The motivation of our work is to use Yago to facilitate semantic integration in the metallic materials domain. There are various ways to access Yago, e.g. the SPARQL endpoint, the graph browser and thematic dumps. However, accessing Yago directly through SPARQL can be inconvenient for domain-specific semantic integration. For instance, to measure the similarity of two concepts, we usually want to know the paths between them in Yago in order to compute the distance between them. Dumping the required domain knowledge from Yago is therefore better suited to our requirements. Nevertheless, no existing theme is appropriate for dumping metallic materials knowledge (together with the related knowledge) from Yago, so we cannot use a thematic dump to extract it directly. If the SPARQL endpoint were used instead, we would have to design a large number of SPARQL queries, analyze the results one by one, and then build MMOY manually, which would be laborious. Hence, we design an extraction strategy to generate a metallic materials ontology from Yago (MMOY), which can serve as domain background knowledge for semantic applications in the metallic materials domain.
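To illustrate why a local dump is convenient for path-based similarity, the following minimal sketch runs a breadth-first search over a handful of subclass edges held in memory. The edge list, the concept names and the helper functions `build_graph` and `shortest_path` are hypothetical and are not taken from Yago or from the approach described in this paper; the sketch only shows the kind of path query that is awkward to express through a remote endpoint.

```python
from collections import deque

# Hypothetical (child, parent) subclass edges; illustrative only, not real Yago data.
SUBCLASS_OF = [
    ("stainless_steel", "steel"),
    ("steel", "alloy"),
    ("brass", "alloy"),
    ("alloy", "metallic_material"),
    ("copper", "metallic_material"),
]


def build_graph(edges):
    """Build an undirected adjacency map so paths may go up and down the taxonomy."""
    graph = {}
    for child, parent in edges:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list of nodes, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph[node]):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


graph = build_graph(SUBCLASS_OF)
path = shortest_path(graph, "stainless_steel", "brass")
distance = len(path) - 1  # number of edges on the path
```

With the toy edges above, the path between `stainless_steel` and `brass` passes through `steel` and `alloy`, so the edge distance is 3. A distance of this kind could then feed a similarity measure of the reader's choice.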
However, extracting metallic materials knowledge is difficult because (1) the knowledge structure of metallic materials in Yago is implicit and (2) the names used for metallic materials concepts in Yago are not known in advance. Hence, in order to extract the knowledge accurately from such a huge and complex dataset, our approach combines a similarity algorithm with the Yago structure [11]. The contributions of our work can be summarized as follows:
- (1) An approach is proposed to derive MMOY from Yago. First, candidate keywords are defined and a string matching algorithm is used to initially identify the metallic materials concepts in Yago. Then, based on the matching results, both the hierarchical and the non-hierarchical structure of Yago are used to acquire the domain knowledge structure as completely as possible: the former for metallic materials knowledge and the latter for the related knowledge. The approach requires only a small number of keywords, which speeds up the matching process, and taking full advantage of the Yago structure compensates for the limitations of the string matching strategy.
- (2) A set of rules is designed to extract the metallic materials knowledge and related knowledge according to the features of the Yago structure, and each rule is represented in predicate logic.
- (3) We evaluate our approach using precision, recall, F1-measure and time performance. The experimental results demonstrate that our method achieves the expected precision, recall and F1-measure. Furthermore, as the scale of the datasets increases, the time cost does not increase significantly. Thus, the proposed approach can extract the metallic materials knowledge and related knowledge from Yago effectively, with acceptable time performance.
- (4) A prototype system is designed to visually display the knowledge structure of MMOY. In this system, the relations between concepts are displayed clearly, giving users a better understanding of the metallic materials concepts and knowledge structure.
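The two-stage idea in contribution (1) can be sketched as follows: keywords find a few seed classes, the subclass hierarchy pulls in classes whose names the keywords miss, and non-hierarchical facts collect related entities. This is a minimal sketch under stated assumptions, not the rules proposed in the paper: plain substring matching stands in for the string matching algorithm, and the triples, the simplified class names, the entity `Acme_Steelworks` and the three helper functions are all hypothetical.

```python
# Hypothetical (subject, predicate, object) triples with simplified names;
# they are illustrative toy data, not facts taken from Yago.
TRIPLES = [
    ("wordnet_steel", "rdfs:subClassOf", "wordnet_alloy"),
    ("wordnet_alloy", "rdfs:subClassOf", "wordnet_mixture"),
    ("wikicategory_Tool_steels", "rdfs:subClassOf", "wordnet_steel"),
    ("wordnet_damask", "rdfs:subClassOf", "wordnet_fabric"),
    ("Acme_Steelworks", "created", "wikicategory_Tool_steels"),
]
KEYWORDS = ["alloy"]
SUBCLASS = "rdfs:subClassOf"


def match_seeds(triples, keywords):
    """Stage 1: substring matching of keywords against class names (an assumed stand-in)."""
    names = set()
    for s, p, o in triples:
        if p == SUBCLASS:
            names.update((s, o))
    return {n for n in names if any(k in n.lower() for k in keywords)}


def expand_subclasses(triples, seeds):
    """Stage 2: follow the hierarchy downwards to reach classes the keywords missed."""
    found = set(seeds)
    frontier = set(seeds)
    while frontier:
        frontier = {s for s, p, o in triples if p == SUBCLASS and o in frontier} - found
        found |= frontier
    return found


def related_entities(triples, classes):
    """Stage 3: use non-hierarchical facts to collect entities linked to the classes."""
    return {s for s, p, o in triples if p != SUBCLASS and o in classes}


seeds = match_seeds(TRIPLES, KEYWORDS)
domain_classes = expand_subclasses(TRIPLES, seeds)
related = related_entities(TRIPLES, domain_classes)
```

In this toy run, the single keyword matches only `wordnet_alloy`, yet the hierarchy also brings in `wordnet_steel` and `wikicategory_Tool_steels`, which is the sense in which the structure compensates for a small keyword set.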
Although MMOY contains many metallic materials concepts and a comparatively comprehensive knowledge structure, it lacks specific digital descriptions of concepts. Given these features, MMOY can be used in the following ways. (1) In the materials community, much materials knowledge is hidden in unstructured and semi-structured data (e.g., text files and web pages), and because this knowledge is scattered across natural language text, it is hard to exploit. Since MMOY contains most of the terms in the metallic materials domain as well as the relations between them, it can be used to facilitate the identification of high-value materials knowledge. (2) Traditional materials datasets (e.g., relational databases and Excel documents) focus on data descriptions and values in a specific aspect, and the PSPP (processing-structure-property-performance) linkages [17] are implicit or very weak. If MMOY is associated with traditional materials datasets, the relations in MMOY can enrich them semantically, which better supports materials experts in research tasks such as materials selection [18]. Moreover, as a domain ontology, MMOY supports reasoning that can help check the consistency of traditional datasets and discover new knowledge hidden in them. (3) MMOY has a complete knowledge structure with rich hyponymy relations, so it can act as a background knowledge base in the materials science domain to support materials ontology matching [19], which is key to resolving heterogeneity across different materials information sources. (4) Because they lack semantic information, traditional keyword-based search methods for the metallic materials domain are limited. MMOY holds a domain-specific knowledge graph, so it can be used to expand a user's query [20] and thereby improve retrieval effectiveness.
The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 defines the problem and introduces the approach and process. Section 4 presents the detailed implementation method. Section 5 gives and discusses the experimental evaluation. Section 6 describes a prototype system. Finally, Section 7 concludes the paper.
Related work
For extracting or retrieving domain knowledge from LOD, the recognition of target concepts is one of the main challenges. Currently, there exist some approaches or strategies to identify target concepts. For example, Calegari and Pasi [21] use “bags of words” related to the users’ interests to identify the similar entities in Yago with exact string matching and partial string matching so as to generate purpose ontology. In KFM [22], the string matching scores of predicate sets are used to find
Problem definition
The main purpose of this paper is to generate a metallic materials ontology (MMOY) that contains metallic materials knowledge and related knowledge from Yago. Metallic materials knowledge mainly refers to metallic materials classification, basic properties of metallic materials, processing technology (e.g. heat treatment), chemical composition and organization structure, etc. Related knowledge, such as manufacturers and metalwork, refers to the concepts which are relevant to metallic materials
Methodology
In this section, we introduce our proposed method in detail, and a set of rules is given for deriving MMOY from Yago.
Experimental environment and performance metrics
All experiments were run with JDK 1.7 on a machine with an Intel i5 CPU and 8 GB of memory running the 64-bit version of OS X 10.9. The experimental dataset is Yago2s [9], which is about 7.5 GB in size.
We use precision, recall, F1-measure and time performance to evaluate our approach.
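In their standard set-based form, the first three metrics can be computed as in the sketch below. It uses the textbook definitions over a set of retrieved items and a set of relevant items; the function name, the argument names and the sample class labels are assumptions for illustration, and the sketch does not reproduce the paper's own equations or symbols.

```python
def precision_recall_f1(retrieved, relevant):
    """Textbook precision, recall and F1 over two collections of identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    correct = len(retrieved & relevant)  # items that are both returned and relevant
    precision = correct / len(retrieved) if retrieved else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: four classes extracted, five classes in the reference set.
extracted = {"alloy", "steel", "brass", "damask"}
reference = {"alloy", "steel", "brass", "bronze", "pewter"}
p, r, f1 = precision_recall_f1(extracted, reference)
```

Here three of the four extracted classes are correct, so precision is 3/4, recall is 3/5, and F1 is their harmonic mean.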
As shown in Eq. (11), precision (P) is the proportion of correct results among all search results, where |TC| denotes the number of target classes, and |NTC| is the number of
Prototype system
We have designed a system based on the metallic materials ontology (MMOY) extracted from Yago to visually show the relations between two metallic materials concepts. Fig. 16 shows the steel-related concepts of the metallic materials domain and the relations between pairs of concepts using a chord graph and a force-directed graph. As shown in Fig. 16, Fig. 16① displays the triples of the metallic materials ontology which contains definitions of metallic materials concepts, the descriptions of concepts
Conclusion and future work
In this paper, we have proposed an approach to generate a metallic materials ontology (MMOY) based on the structure of Yago and a string matching algorithm. Since the dataset is huge and complex, we could not know in advance how metallic materials concepts are named in Yago. A string matching algorithm is used to obtain metallic materials concepts in Yago, and seven rules are then designed to extract the metallic materials knowledge and related knowledge according to the Yago
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Nos. 51271033, 71271076) and the Natural Science Foundation of Hebei Province (No. F2013208107).
References (47)
- et al. A survey on knowledge representation in materials science and engineering: an ontological perspective. Comput. Ind. (2015)
- et al. YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia, Artif. Intell., Special Issue on Artificial Intelligence, Wikipedia Semi-Struct. Resour. (2013)
- et al. An ontology-based knowledge framework for engineering material selection. Adv. Eng. Inform. (2015)
- et al. Ontology matching: a literature review. Expert Syst. Appl. (2015)
- et al. A query expansion method for retrieving online BIM resources based on Industry Foundation Classes. Automat. Constr. (2015)
- et al. Personal ontologies: generation of user profiles based on the YAGO ontology. Inf. Process. Manage. Int. J. (2013)
- et al. A chemogenomic analysis of the human proteome: application to enzyme families. J. Biomol. Screen. (2007)
- et al. A framework for unifying ontology-based semantic similarity measures: a study in the biomedical domain. J. Biomed. Inform. (2014)
- et al. Matching large ontologies: a divide-and-conquer approach. Data Knowl. Eng. (2008)
- et al. User’s profile ontology-based semantic framework for personalized food and nutrition recommendation. Procedia Comput. Sci. (2014)
- A semantic similarity measure based on information distance for ontology alignment. Inf. Sci.
- A novel insight into Gene Ontology semantic similarity. Genomics
- Materials science with large-scale data and informatics: unlocking new opportunities. MRS Bull.
- Role of materials data science and informatics in accelerated materials innovation. MRS Bull.
- Materials data science: current status and future outlook. Annu. Rev. Mater. Res.
- The semantic web. Sci. Am. Mag.
- PREMΛP: knowledge driven design of materials and engineering process
- BauDataWeb: the Austrian building and construction materials market as linked data
- Linked open data
- Inside YAGO2s: a transparent information extraction architecture
- DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web