Abstract
The eXtensible Markup Language (XML) has gained wide acceptance as the standard for representing and exchanging data on the Web. Unfortunately, XML covers only the syntactic level and lacks semantics, so it cannot be used directly for the Semantic Web. Finding a way to use XML data for the Semantic Web is currently a challenging research problem. Ontologies can formally represent shared domain knowledge and enable semantic interoperability. Therefore, in this paper, we investigate how to represent and reason about XML with ontologies. First, we give formal representations of XML data sources, including Document Type Definitions (DTDs), XML Schemas, and XML documents. On this basis, we propose formal approaches for transforming these XML data sources into ontologies, discuss the correctness of the transformations, and provide several transformation examples. Furthermore, following the proposed approaches, we implement a prototype tool that automatically transforms XML into ontologies. Finally, we use the transformed ontologies to reason about XML, so that some XML reasoning problems can be checked by existing ontology reasoners.
References
An Y, Mylopoulos J (2005) Translating XML web data into ontologies. In: OTM workshops, pp 967–976
An Y, Borgida A, Mylopoulos J (2005) Constructing complex semantic mappings between XML data and ontologies. In: Proceedings of ISWC 2005. Springer, Heidelberg, pp 6–20
Anicic N, Ivezic N, Marjanovic Z (2007) Mapping XML schema to OWL. In: Enterprise interoperability. Springer, Berlin, pp 243–252. Part V
Antoniou G, van Harmelen F (2008) A semantic web primer, 2nd edn. MIT Press, Cambridge
Aussenac-Gilles N, Kamel M (2009) Ontology learning by analyzing XML document structure and content. In: Proc of the int’l conf on knowledge engineering and ontology development, INSTICC—Institute for Systems and Technologies of Information, Control and Communication, pp 159–165
Baader F, Calvanese D, McGuinness D, Nardi D, Patel-Schneider PF (eds) (2003) The description logic handbook: theory, implementation, and applications. Cambridge University Press, Cambridge
Battle S (2004) Round-tripping between XML and RDF. In: Proceedings of ISWC 2004, Hiroshima, Japan
Baumeister J, Reutelshoefer J, Puppe F (2011) KnowWE: a semantic wiki for knowledge engineering. Appl Intell 35(3):323–344
Bedini I, Gardarin G, Nguyen B (2008) Deriving ontologies from XML schema. In: Proceedings EDA 2008, Toulouse, France, vol B-4, pp 3–17
Bedini I, Gardarin G, Nguyen B (2011) Transforming XML schema to OWL using patterns. In: 5th IEEE international conference on semantic computing (ICSC), Palo Alto, United States, pp 102–109
Berardi D, Calvanese D, De Giacomo G (2005) Reasoning on UML class diagrams. Artif Intell 168(1–2):70–118
Bohring H, Auer S (2005) Mapping XML to OWL ontologies. In: Proceedings of Marktplatz Internet: von e-Learning bis ePayment. Leipziger Informatik-Tage (LIT2005), Leipzig, Germany, pp 147–156
Bosch T, Mathiak B (2011) XSLT transformation generating OWL ontologies automatically based on XML schemas. In: IEEE 6th international conference for Internet technology and secured transactions (ICITST 2011), pp 660–667
Bray T, Paoli J, Sperberg-McQueen CM, Maler E, Yergeau F (2008) Extensible Markup Language (XML) 1.0, 5th edn. W3C Recommendation, 26 November 2008. http://www.w3.org/TR/REC-xml/
Calvanese D, De Giacomo G, Lenzerini M (1999) Representing and reasoning about XML documents: a description logic approach. J Log Comput 9(3):295–318
Carroll JJ, Pan JZ (2004) XML schema datatypes in RDF and OWL. Technical report, W3C Semantic Web Best Practices and Development Group, November 2004. http://www.w3.org/2001/sw/BestPractices/XSCH/xsch-sw/
Castillo-Barrera FE, Durán-Limón HA, Médina-Ramírez C, Rodriguez-Rocha B (2013) A method for building ontology-based electronic document management systems for quality standards-the case study of the ISO/TS 16949: 2002 automotive standard. Appl Intell 38(1):99–113
Cuenca Grau B, Horrocks I, Motik B, Parsia B, Patel-Schneider P, Sattler U (2008) OWL 2: the next step for OWL. Web Semant Sci Serv Agents World Wide Web 6(4):309–322
Erdmann M, Studer R (1999) Ontologies as conceptual models for XML documents. In: Proceedings of the 12th international workshop on knowledge acquisition, modelling and management (KAW’99), Banff, Canada, October 1999
Eyharabide V, Amandi A (2012) Ontology-based user profile learning. Appl Intell 36(4):857–869
Ferdinand M, Zirpins C, Trastour D (2004) Lifting XML schema to OWL. In: Proceedings of ICWE 2004, Munich, Germany, pp 354–358
Garcia R, Perdrix F, Gil R (2006) Ontological infrastructure for a semantic newspaper. In: Proceedings of semantic web annotations for multimedia workshop World Wide Web conference, Edinburgh, UK
Ghawi R, Cullot N (2009) Building ontologies from XML data sources. In: 1st international workshop on modelling and visualization of XML and semantic web data (MoViX ’09), Linz, Austria, pp 480–484, held in conjunction with DEXA’09
HiTSoftware. http://www.hitsw.com/xml_utilites/
Horrocks I (2008) Ontologies and the semantic web. Commun ACM 51(11):58–67
Horrocks I, Patel-Schneider PF, van Harmelen F (2003) From SHIQ and RDF to OWL: the making of a web ontology language. J Web Semant 1(1)
Kim H-R, Chan P (2008) Learning implicit user interest hierarchy for context in personalization. Appl Intell 28:153–166
Klein MCA (2002) Interpreting XML documents via an RDF schema ontology. In: Proceedings of the 13th international workshop on database and expert systems applications, pp 889–894
Klein M, Fensel D, van Harmelen F, Horrocks I (2001) The relation between ontologies and XML schemas. Linköp Electron Artic Comput Inf Sci 6(4)
Klímek J, Nečaský M (2010) Reverse-engineering of XML schemas: a survey. In: DATESO 2010. CEUR workshop proceedings, vol 567, pp 96–107
Knauer C, Urbansky D, Meinecke J, Schuster D, Katz P, Schill A (2011) Semi-automatic semantic lifting of XML to a target ontology. In: The joint international symposium on natural language processing and agriculture ontology service (SNLP-AOS), Bangkok, Thailand
Knublauch H Protégé-OWL: user-defined datatypes. http://protege.stanford.edu/plugins/owl/xsp.html
Kobeissy N, Genet MG, Zeghlache D (2007) Mapping XML to OWL for seamless information retrieval in context-aware environments. In: Proceedings of SIPE’07, the 2nd IEEE international workshop on services integration in pervasive environments, Istanbul, Turkey, pp 349–354
Kunfermann P, Drumm C (2005) Lifting XML schemas to ontologies—the concept finder algorithm. In: The first international workshop on mediation in semantic web services, Dec 2005, pp 113–122
Lehti P, Fankhauser P (2004) XML data integration with OWL: experiences and challenges. In: Proceedings of the symposium on applications and the Internet (SAINT’04). IEEE, Los Alamitos, pp 160–170
Maedche A, Staab S (2001) Ontology learning for the semantic web. IEEE Intell Syst 16(2):72–79
Missikoff M, Velardi P, Fabriani P (2003) Text mining techniques to automatically enrich a domain ontology. Appl Intell 18(3):323–340
Mousavi A, Nordin MJ, Othman ZA (2012) Ontology-driven coordination model for multiagent-based mobile workforce brokering systems. Appl Intell 36(4):768–787
O’Connor MJ, Das AK (2011) Acquiring OWL ontologies from XML documents. In: Proc 6th int conf knowledge capture, New York
OWL reasoners. http://www.w3.org/2007/OWL/wiki/Implementations
OWL: Ontology Web Language. http://www.w3.org/2004/OWL/
Pan J, Horrocks I (2011) OWL-Eu: adding customised datatypes into OWL. Web Semant Sci Serv Agents World Wide Web 4(1)
Reif G, Gall H, Jazayeri M (2005) WEESA: web engineering for semantic web applications. In: Proceedings of the 14th international conference on World Wide Web, Chiba, Japan, pp 722–729
Rodrigues T, Rosa P, Cardoso J (2006) Mapping XML to existing OWL ontologies. In: Proc IADIS international conference WWW/Internet 2006, pp 72–77
Subhashin R, Akilandeswari J (2011) A survey on ontology construction methodologies. Int J Enterp Comput Bus Syst 1(1) (online)
Thuy PTT, Lee YK, Lee S (2009) DTD2OWL: automatic transforming XML documents into OWL ontology. In: Proceedings of the 2nd international conference on interaction sciences, New York, NY, USA, pp 125–131
Toman D, Weddell G (2005) On reasoning about structural equality in XML: a description logic approach. Theor Comput Sci 336(1):181–203
Tsinaraki C, Christodoulakis S (2007) XS2OWL: a formal model and a system for enabling XML schema applications to interoperate with OWL-DL domain knowledge and SW Tools. Delos 137–146
Wermter S (2000) Knowledge extraction from transducer neural networks. Appl Intell 12(1–2):27–42
Wood D (1995) Standard generalized markup language: mathematical and philosophical issues. Comput Sci Today 344–365
Wu X, Ratcliffe D, Cameron M (2008) XML schema representation and reasoning: a description logic method. In: 2008 IEEE congress on services, pp 487–494
Xiao L, Zhang L, Huang G, Shi B (2004) Automatic mapping from XML documents to ontologies. In: Proceedings of the fourth international conference on computer and information technology, Washington, USA, pp 321–325
Xu J, Li W (2007) Using relational database to build OWL ontology from XML data sources. In: Proceedings of the 2007 international conference on computational intelligence and security workshops. IEEE Computer Society, Washington, pp 124–127
Yahia N, Mokhtar SA, Ahmed A (2012) Automatic generation of OWL ontology from XML data source. Int J Comput Sci Issues 9(2), March
Yang K, Steele R, Lo A (2007) An ontology for XML schema to ontology mapping representation. In: Proc iiWAS2007, New York, pp 8–16
Zhang F, Ma ZM, Wang X, Wang Y (2010) Formal approach and automated tool for constructing ontology from object-oriented database model. In: Proc. of the 19th ACM conference on information and knowledge management (CIKM 2010), pp 1329–1332
Zhang F, Ma ZM, Yan L (2011) Construction of ontologies from object-oriented database models. Integr Comput-Aided Eng 18(4):327–347
Zhang F, Yan L, Ma ZM, Cheng J (2011) Knowledge representation and reasoning of XML with ontology. In: Proceedings of the 2011 ACM symposium on applied computing (SAC 2011). ACM, New York, pp 1705–1710
Acknowledgements
The authors wish to thank the anonymous referees for their valuable comments and suggestions, which improved the technical content and the presentation of the paper. The work is supported by National Natural Science Foundation of China (61073139, 60873010, and 61202260) and by Program for New Century Excellent Talents in University (NCET-05-0288) and by the Fundamental Research Funds for the Central Universities (N090504005, N120404005).
Appendix: Proofs of Theorems
Proof of Theorem 1
In the following we give the proof of Theorem 1. As mentioned in Sect. 3.1.3, an XML document d conforms to a DTD \(D = (r, P)\) in Definition 1 if \(d \in d(D)\), where d(D) is the set of XML document instances over D, defined inductively as follows: if r is a terminal \(T \in \mathcal{T}\) in Definition 1, then \(d(r) = T \in d(D)\); if r is an element type \(E \to (\alpha, A) \in P\), then the instances \(d(r) \in d(D)\) are the sequences <E>\(d_1 \ldots d_k\)</E>, where <E> and </E> are the start and end tags of the element \(E \in \mathcal{E}\) in Definition 1, and \(d_1, \ldots, d_k\) are document instances satisfying the constraints of the content model \((\alpha, A)\).
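As an illustration of the inductive definition of d(D), the following Python sketch checks membership \(d \in d(D)\) for a simplified DTD. It is a hypothetical encoding, not the paper's formalization: a DTD \(D = (r, P)\) is given as a root name plus a dict mapping each element type E to its content model α, written as a Python regular expression over space-separated child names; the token `#PCDATA` stands for a terminal; attributes A and mixed content are ignored. The names `conforms`, `conforms_elem`, `child_tokens`, and `PCDATA` are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical encoding of D = (r, P): each element type E maps to a content
# model alpha, written as a regex over space-separated child names.
# "#PCDATA" stands for a terminal (text-only content); attributes are ignored.
PCDATA = "#PCDATA"

def child_tokens(elem):
    """Return the sequence of symbols spelled out by the element's content."""
    if len(elem) == 0 and (elem.text or "").strip():
        return [PCDATA]                       # terminal content
    return [child.tag for child in elem]       # element content (text between children is ignored)

def conforms_elem(elem, productions):
    """Inductive check: <E> d_1 ... d_k </E> is an instance of E if the child
    symbols are generated by alpha and every d_i is itself an instance."""
    alpha = productions.get(elem.tag)
    if alpha is None:
        return False                           # undeclared element type
    if re.fullmatch(alpha, " ".join(child_tokens(elem))) is None:
        return False
    return all(conforms_elem(child, productions) for child in elem)

def conforms(xml_text, root, productions):
    """d is in d(D) iff the document element is r and it conforms recursively."""
    doc = ET.fromstring(xml_text)
    return doc.tag == root and conforms_elem(doc, productions)
```

For example, with `P = {"book": r"title( author)+", "title": "#PCDATA", "author": "#PCDATA"}`, `conforms("<book><title>XML</title><author>Zhang</author></book>", "book", P)` returns `True`, while a `book` without a `title` is rejected because its child sequence is not generated by α.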
On this basis, we first prove the first part of Theorem 1. Let \(d \in d(D)\) be an XML document conforming to the DTD D. Then a model \(\mu(d) = (\Delta^{\mu(d)}, \bullet^{\mu(d)})\) satisfying the axioms of \(\varphi(D)\) can be defined inductively as follows:
(a) If d is a terminal \(T \in \mathcal{T}\), then \(\Delta^{\mu(d)} = (\varphi(T))^{\mu(d)}\);
(b) If d is a sequence of the form <E>\(d_1 \ldots d_n\)</E>, where each \(d_i\) is an instance conforming to the content model \(E \to (\alpha, A)\) in the DTD D, then the tree model μ(d) can be constructed as follows:
$$\begin{aligned}
&\Delta^{\mu(d)} = \{o, o_{b}, o_{1}, \ldots, o_{n}, o_{e}\} \cup \bigcup_{1 \le i \le n} \Delta^{\mu(d_{i})} \\
&\mathit{Tag}^{\mu(d)} = \{o_{b}, o_{e}\} \cup \bigcup_{1 \le i \le n} \mathit{Tag}^{\mu(d_{i})} \\
&\mathit{Start}E^{\mu(d)} = \{o_{b}\} \cup \bigcup_{1 \le i \le n} \mathit{Start}E^{\mu(d_{i})} \\
&\mathit{End}E^{\mu(d)} = \{o_{e}\} \cup \bigcup_{1 \le i \le n} \mathit{End}E^{\mu(d_{i})} \\
&f^{\prime \mu(d)} = \bigl\{(o, o_{b}), \bigl(o_{1}, o_{1}'\bigr), \ldots, \bigl(o_{n}, o_{n}'\bigr)\bigr\} \cup \bigcup_{1 \le i \le n} f^{\prime \mu(d_{i})} \\
&r^{\prime \mu(d)} = \bigl\{(o, o_{1}), (o_{1}, o_{2}), \ldots, (o_{n-1}, o_{n}), (o_{n}, o_{e})\bigr\} \\
&\hphantom{r^{\prime \mu(d)} =} \cup \bigcup_{1 \le i \le n} r^{\prime \mu(d_{i})}
\end{aligned}$$
where: (i) a class identifier Tag represents all the tags in the XML document d; (ii) for each element type \(E \in \mathcal{E}\), two class identifiers StartE and EndE represent the start tag and end tag of E, respectively; (iii) o denotes the root element of d, \(o_b\) and \(o_e\) denote the start and end tags of the root element, respectively, \(o_i\) denotes the ith component of d, and \(o_i'\) denotes the root element of \(d_i\), \(i \in \{1, \ldots, n\}\); (iv) in the model μ(d), f′ and r′ denote property identifiers: f′ links the element o to its start tag \(o_b\) and each component \(o_i\) to the root element \(o_i'\) of \(d_i\), while r′ chains o, \(o_1, \ldots, o_n\), and \(o_e\) in document order, capturing the remaining components of the element in the tree structure of d.
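The construction of μ(d) in case (b) can be mimicked in Python. This is a sketch under hypothetical assumptions rather than the paper's tool: objects are fresh strings; each class is a set keyed by its name (the tag name stands in for φ(E), and `Start<tag>`/`End<tag>` for StartE and EndE); a text-only element contributes a single terminal component as in case (a), recorded with its string value instead of a data range φ(T); and the root o is also added to the class for its element type, which the equations above leave implicit. The names `TreeModel` and `build_model` are made up.

```python
import itertools
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class TreeModel:
    """Hypothetical container for mu(d): domain, class and property extensions."""
    domain: set = field(default_factory=set)
    classes: dict = field(default_factory=dict)   # class name -> set of objects
    f_prime: set = field(default_factory=set)     # pairs in f'
    r_prime: set = field(default_factory=set)     # pairs in r'
    terminal: dict = field(default_factory=dict)  # leaf object -> terminal string

    def add(self, cls, obj):
        self.classes.setdefault(cls, set()).add(obj)

def build_model(elem):
    """Construct mu(d) for an element tree; returns (model, root object o)."""
    model = TreeModel()
    counter = itertools.count()

    def fresh(label):
        obj = f"{label}#{next(counter)}"
        model.domain.add(obj)
        return obj

    def leaf(text):
        # Case (a): a terminal becomes a single object holding its value.
        obj = fresh("text")
        model.terminal[obj] = text
        return obj

    def visit(node):
        # Case (b): d = <E> d_1 ... d_n </E>
        o, o_b, o_e = fresh(node.tag), fresh("start"), fresh("end")
        model.add(node.tag, o)                   # assumption: o is in the class for E
        model.add("Tag", o_b)
        model.add("Tag", o_e)
        model.add("Start" + node.tag, o_b)
        model.add("End" + node.tag, o_e)
        model.f_prime.add((o, o_b))              # (o, o_b) in f'
        if len(node) == 0 and (node.text or "").strip():
            roots = [leaf(node.text.strip())]    # text-only content
        else:
            roots = [visit(child) for child in node]
        positions = [fresh("pos") for _ in roots]        # o_1 ... o_n
        for o_i, o_i_root in zip(positions, roots):
            model.f_prime.add((o_i, o_i_root))           # (o_i, o_i') in f'
        chain = [o, *positions, o_e]
        model.r_prime.update(zip(chain, chain[1:]))      # (o,o_1), ..., (o_n,o_e)
        return o

    return model, visit(elem)
```

For `<book><title>XML</title><author>Zhang</author></book>` the model has 15 objects: o, \(o_b\), and \(o_e\) for each of the three elements, four component objects \(o_i\), and two terminal leaves.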
The second part of Theorem 1 is proved in the same way as the first part; the two constructions are mutually inverse. Given a model \(\mathcal{I} = (\Delta^{\mathcal{I}}, \bullet^{\mathcal{I}})\) of φ(D) and an object \(o \in \varphi(r)^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}\), an XML document instance λ(o) can be defined inductively as follows:
(a) If \(o \in \varphi(T)^{\mathcal{I}}\), where φ(T) is an identifier in φ(D), then \(\lambda(o) = T \in \mathcal{T}\);
(b) If \(o \in \varphi(E)^{\mathcal{I}}\), where φ(E) is an identifier in φ(D), and there exist an integer \(n \ge 0\) and objects \(o_b, o_i, o_i', o_e\) such that \(o_b \in \mathit{StartE}^{\mathcal{I}}\), \(o_e \in \mathit{EndE}^{\mathcal{I}}\), \((o, o_b), (o_1, o_1'), \dots, (o_n, o_n') \in f^{\prime\mathcal{I}}\), and \((o, o_1), (o_1, o_2), \dots, (o_{n-1}, o_n), (o_n, o_e) \in r^{\prime\mathcal{I}}\), then λ(o) = <E>\(\lambda(o_1') \dots \lambda(o_n')\)</E> with \(E \in \mathcal{E}\).
Moreover, as is known from description logics and ontologies [6, 25], a model of a description logic knowledge base or an ontology can be represented by a tree model. The tree representation of the model \(\mathcal{I}\) rooted at the object o is shown in Fig. 10.
Next, we prove that the XML document instance λ(o) exists and conforms to the DTD D. Let S be a symbol in \(\mathcal{T} \cup \mathcal{E}\); then \(o \in \varphi(S)^{\mathcal{I}}\) if and only if λ(o) is defined and \(\lambda(o) = d(S) \in d(D)\), where d(·) is as defined at the beginning of this proof. We proceed by induction on the number of f′-steps on the path from o in \(\mathcal{I}\) (see Fig. 10), distinguishing two cases.
(i) Base case: there are no f′-steps on the path. Then o is a terminal node, as shown in Fig. 10. This case is immediate: under the transformation approach in Sect. 4.1.1, a data range identifier φ(S) corresponds to a terminal \(S \in \mathcal{T}\) in D. Therefore, if \(S = T \in \mathcal{T}\), then \(o \in \varphi(S)^{\mathcal{I}}\) and \(\lambda(o) = d(S) = T \in d(D)\) by the definition of d(D) above; otherwise \(o \notin \varphi(S)^{\mathcal{I}}\) and \(\lambda(o) \notin d(D)\).
(ii) Inductive case: there are f′-steps on the path. Let \(o_1, \ldots, o_n\) be the objects along the \(r^{\prime} \cdot r^{\prime *}\)-path satisfying the constraints in φ(E), and let \(o_i'\) be the f′-successor of \(o_i\) for \(i \in \{1, \ldots, n\}\), where E is an element type such that λ(o) = <E>\(\lambda(o_1') \dots \lambda(o_n')\)</E>. If there are symbols \(S_1, \ldots, S_n\) in \(\mathcal{T} \cup \mathcal{E}\) such that \(o_i' \in \varphi(S_i)^{\mathcal{I}}\) for \(i \in \{1, \ldots, n\}\), then by the induction hypothesis \(\lambda(o_i') = d(S_i) \in d(D)\); if, in addition, \(S \in \mathcal{E}\) and the content model α generates the string \(S_1 \ldots S_n\), then \(o \in \varphi(S)^{\mathcal{I}}\) and \(\lambda(o) = d(S) \in d(D)\). Otherwise \(o \notin \varphi(S)^{\mathcal{I}}\) and \(\lambda(o) \notin d(D)\). □
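The inverse construction λ(o) from the second part of the proof can likewise be sketched. This hypothetical Python function walks a hand-built model encoded as dictionaries (`element` for membership in φ(E), `start`/`end` for StartE/EndE, `f` and `r` for the f′- and r′-successors, `terminal` for case (a)). It assumes f′ and r′ are functional on these objects, as they are in a tree model, and it does not perform the full content-model check of the inductive argument; the name `reconstruct` and the dictionary layout are made up.

```python
from xml.sax.saxutils import escape

def reconstruct(o, m):
    """lambda(o): rebuild an XML string from a tree-shaped model m (hypothetical encoding)."""
    if o in m["terminal"]:                        # case (a): o in phi(T)
        return escape(m["terminal"][o])
    E = m["element"][o]                           # case (b): o in phi(E)
    if m["start"].get(m["f"][o]) != E:            # (o, o_b) in f' with o_b in StartE
        raise ValueError(f"{o}: f'-successor is not a start tag of {E}")
    parts, cur = [], m["r"][o]                    # follow (o, o_1), (o_1, o_2), ...
    while m["end"].get(cur) != E:                 # ... until (o_n, o_e) with o_e in EndE
        parts.append(reconstruct(m["f"][cur], m)) # lambda(o_i') for (o_i, o_i') in f'
        cur = m["r"][cur]
    return f"<{E}>{''.join(parts)}</{E}>"

# Hand-built tree model for <book><title>XML</title></book>
MODEL = {
    "element": {"b": "book", "t": "title"},
    "start": {"bs": "book", "ts": "title"},
    "end": {"be": "book", "te": "title"},
    "f": {"b": "bs", "p1": "t", "t": "ts", "q1": "x"},
    "r": {"b": "p1", "p1": "be", "t": "q1", "q1": "te"},
    "terminal": {"x": "XML"},
}

print(reconstruct("b", MODEL))  # <book><title>XML</title></book>
```

Walking the r′-chain rather than the f′-successors alone is what preserves the order of the components \(o_1, \ldots, o_n\), mirroring the sequence condition in case (b).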
Proof of Theorem 4
As with the DTDs in Theorem 1, for an XML Schema and the OWL ontology O obtained from it by the transformation in Sect. 4.2.1, there are mappings between instance documents of the XML Schema and models of the transformed ontology. Formally, we assume that for an XML document d conforming to the XML Schema, σ(d) is a model of the transformed OWL ontology, and that for a model \(\mathcal{I}\) of the transformed OWL ontology, \(\tau(\mathcal{I})\) is an XML document conforming to the XML Schema. The proof then proceeds as follows.
If \(S_1 \not\sqsubseteq S_2\), then by Definition 7 we have \(d(S_1) \not\subseteq d(S_2)\), i.e., there is at least one XML document instance d with \(d \in d(S_1)\) and \(d \notin d(S_2)\). By the above assumption, σ(d) is a model of O with \(o \in f(r_1)^{\sigma(d)}\) and \(o \notin f(r_2)^{\sigma(d)}\), where o is the individual corresponding to the root node of d. That is, \(O \not\models f(r_1) \sqsubseteq f(r_2)\), which contradicts \(O \models f(r_1) \sqsubseteq f(r_2)\); hence \(S_1 \sqsubseteq S_2\).
Conversely, if \(O \not\models f(r_1) \sqsubseteq f(r_2)\), then there is \(o \in \Delta^{\mathcal{I}}\) such that \(o \in (f(r_1))^{\mathcal{I}}\) and \(o \notin (f(r_2))^{\mathcal{I}}\), where \(\mathcal{I}\) is a model of O. Again by the above assumption, \(\tau(\mathcal{I})\) is an XML document instance for S with \(\tau(o) \in d(S_1)\) and \(\tau(o) \notin d(S_2)\). That is, \(S_1 \not\sqsubseteq S_2\), which contradicts \(S_1 \sqsubseteq S_2\); hence \(O \models f(r_1) \sqsubseteq f(r_2)\). □
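The two directions of the argument can be summarized as contrapositives. This restatement reads Theorem 4, as the proof suggests, as \(S_1 \sqsubseteq S_2\) if and only if \(O \models f(r_1) \sqsubseteq f(r_2)\), and relies on the assumed mappings σ and τ; each pair of lines establishes one direction.

$$\begin{aligned}
S_1 \not\sqsubseteq S_2
 &\;\Rightarrow\; \exists d:\ d \in d(S_1),\ d \notin d(S_2) \\
 &\;\Rightarrow\; o \in f(r_1)^{\sigma(d)},\ o \notin f(r_2)^{\sigma(d)}
 \;\Rightarrow\; O \not\models f(r_1) \sqsubseteq f(r_2), \\[4pt]
O \not\models f(r_1) \sqsubseteq f(r_2)
 &\;\Rightarrow\; \exists \mathcal{I} \models O,\ o \in f(r_1)^{\mathcal{I}},\ o \notin f(r_2)^{\mathcal{I}} \\
 &\;\Rightarrow\; \tau(o) \in d(S_1),\ \tau(o) \notin d(S_2)
 \;\Rightarrow\; S_1 \not\sqsubseteq S_2 .
\end{aligned}$$

Theorems 5 and 6 follow the same pattern, with the witness o (or d) lying in exactly one of the two sets for equivalence, and in both sets for disjointness.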
Proof of Theorem 5
The proof of this theorem is similar to the proof of Theorem 4. If \(O \not\models f(r_1) \equiv f(r_2)\), then there is \(o \in \Delta^{\mathcal{I}}\) such that either \(o \in (f(r_1))^{\mathcal{I}}\) and \(o \notin (f(r_2))^{\mathcal{I}}\), or \(o \notin (f(r_1))^{\mathcal{I}}\) and \(o \in (f(r_2))^{\mathcal{I}}\), where \(\mathcal{I}\) is a model of O. By the assumption in the proof of Theorem 4, \(\tau(\mathcal{I})\) is an XML document instance for S with either \(\tau(o) \in d(S_1)\) and \(\tau(o) \notin d(S_2)\), or \(\tau(o) \notin d(S_1)\) and \(\tau(o) \in d(S_2)\). That is, \(S_1 \not\equiv S_2\), which contradicts \(S_1 \equiv S_2\); hence \(O \models f(r_1) \equiv f(r_2)\).
Conversely, if \(S_1 \not\equiv S_2\), then there is at least one XML document instance d with either \(d \in d(S_1)\) and \(d \notin d(S_2)\), or \(d \notin d(S_1)\) and \(d \in d(S_2)\). Again by the assumption in the proof of Theorem 4, σ(d) is a model of O with either \(o \in f(r_1)^{\sigma(d)}\) and \(o \notin f(r_2)^{\sigma(d)}\), or \(o \notin f(r_1)^{\sigma(d)}\) and \(o \in f(r_2)^{\sigma(d)}\), where o is the individual corresponding to the root node of d. That is, \(O \not\models f(r_1) \equiv f(r_2)\), which contradicts \(O \models f(r_1) \equiv f(r_2)\); hence \(S_1 \equiv S_2\). □
Proof of Theorem 6
We proceed as in the previous proofs. If \(O \not\models f(r_1) \sqcap f(r_2) \sqsubseteq \bot\), then there is at least one \(o \in \Delta^{\mathcal{I}}\) such that \(o \in (f(r_1))^{\mathcal{I}}\) and \(o \in (f(r_2))^{\mathcal{I}}\), where \(\mathcal{I}\) is a model of O. By the assumption in the proof of Theorem 4, \(\tau(\mathcal{I})\) is an XML document instance for S with \(\tau(o) \in d(S_1)\) and \(\tau(o) \in d(S_2)\). That is, \(d(S_1) \cap d(S_2) \neq \emptyset\), which contradicts the disjointness \(S_1 \otimes S_2\); hence \(O \models f(r_1) \sqcap f(r_2) \sqsubseteq \bot\).
Conversely, if \(S_1\) is not disjoint from \(S_2\), then by Definition 9 we have \(d(S_1) \cap d(S_2) \neq \emptyset\), i.e., there is at least one XML document instance d with \(d \in d(S_1)\) and \(d \in d(S_2)\). Again by the assumption in the proof of Theorem 4, σ(d) is a model of O with \(o \in f(r_1)^{\sigma(d)}\) and \(o \in f(r_2)^{\sigma(d)}\), where o is the individual corresponding to the root node of d. That is, \(O \not\models f(r_1) \sqcap f(r_2) \sqsubseteq \bot\), which contradicts \(O \models f(r_1) \sqcap f(r_2) \sqsubseteq \bot\); hence \(S_1 \otimes S_2\). □
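The counterexample pattern shared by Theorems 4 to 6 (a witness document d or object o) can be illustrated on content models. The sketch below uses a swapped-in, bounded technique and not the ontology reasoning the paper relies on: it approximates d(S) by the child-name sequences, up to a fixed length, that a content model accepts, where the content model is written as a hypothetical Python regular expression over space-separated names. A witness refutes subsumption, equivalence, or disjointness; the absence of a witness within the bound proves nothing. The function names `instances` and `relations` are made up.

```python
import itertools
import re

def instances(alpha, alphabet, max_len):
    """Finite approximation of d(S): child-name sequences of length <= max_len
    accepted by the content model alpha (a regex over space-separated names)."""
    found = set()
    for n in range(max_len + 1):
        for seq in itertools.product(alphabet, repeat=n):
            if re.fullmatch(alpha, " ".join(seq)):
                found.add(seq)
    return found

def relations(alpha1, alpha2, alphabet, max_len=4):
    """Witness-based checks for subsumption, equivalence, and disjointness.

    A sequence in d(S_1) but not in d(S_2) refutes subsumption, just as the
    proofs use a document d or object o as a counterexample. Without a
    witness up to max_len the result is only 'not refuted', not a proof."""
    d1 = instances(alpha1, alphabet, max_len)
    d2 = instances(alpha2, alphabet, max_len)
    return {
        "subsumed_not_refuted": d1 <= d2,
        "equivalent_not_refuted": d1 == d2,
        "disjoint_not_refuted": not (d1 & d2),
        "witness_not_subsumed": min(d1 - d2, key=len, default=None),
    }
```

For instance, `relations(r"title( author)*", r"title( author)+", ["title", "author"])` reports the witness `("title",)`: a `title` with no `author` is accepted by the first model but not the second, so the first is not subsumed by the second.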
About this article
Cite this article
Zhang, F., Ma, Z.M. Representing and Reasoning About XML with Ontologies. Appl Intell 40, 74–106 (2014). https://doi.org/10.1007/s10489-013-0446-4