Representing and Reasoning About XML with Ontologies

Applied Intelligence

Abstract

The eXtensible Markup Language (XML) has gained wide acceptance as the standard for representing and exchanging data on the Web. However, XML covers only the syntactic level and lacks semantics, so it cannot be used directly for the Semantic Web. Finding a way to use XML data on the Semantic Web is currently a challenging research problem. Ontologies can formally represent shared domain knowledge and enable semantic interoperability. Therefore, in this paper, we investigate how to represent and reason about XML with ontologies. First, we give formalized representations of XML data sources, including Document Type Definitions (DTDs), XML Schemas, and XML documents. On this basis, we propose formal approaches for transforming these XML data sources into ontologies, discuss the correctness of the transformations, and provide several transformation examples. Furthermore, following the proposed approaches, we implement a prototype tool that automatically transforms XML into ontologies. Finally, we use the transformed ontologies to reason about XML, so that some reasoning problems of XML can be checked by existing ontology reasoners.

References

  1. An Y, Mylopoulos J (2005) Translating XML web data into ontologies. In: OTM workshops, pp 967–976

  2. An Y, Borgida A, Mylopoulos J (2005) Constructing complex semantic mappings between XML data and ontologies. In: Proceeding of ISWC 2005. Springer, Heidelberg, pp 6–20

  3. Anicic N, Ivezic N, Marjanovic Z (2007) Mapping XML schema to OWL. In: Enterprise interoperability. Springer, Berlin, pp 243–252. Part V

  4. Antoniou G, van Harmelen F (2008) A semantic web primer, 2nd edn. MIT Press, Cambridge

  5. Aussenac-Gilles N, Kamel M (2009) Ontology learning by analyzing XML document structure and content. In: Proc of the int’l conf on knowledge engineering and ontology development, INSTICC—Institute for Systems and Technologies of Information, Control and Communication, pp 159–165

  6. Baader F, Calvanese D, McGuinness D, Nardi D, Patel-Schneider PF (eds) (2003) The description logic handbook: theory, implementation, and applications. Cambridge University Press, Cambridge

  7. Battle S (2004) Round-tripping between XML and RDF. In: Proceeding of ISWC 2004, Hiroshima, Japan

  8. Baumeister J, Reutelshoefer J, Puppe F (2011) KnowWE: a semantic wiki for knowledge engineering. Appl Intell 35(3):323–344

  9. Bedini I, Gardarin G, Nguyen B (2008) Deriving ontologies from XML schema. In: Proceedings EDA 2008, Toulouse, France, vol B-4, pp 3–17

  10. Bedini I, Gardarin G, Nguyen B (2011) Transforming XML schema to OWL using patterns. In: 5th IEEE international conference on semantic computing (ICSC), Palo Alto, United States, pp 102–109

  11. Berardi D, Calvanese D, De Giacomo G (2005) Reasoning on UML class diagrams. Artif Intell 168(1–2):70–118

  12. Bohring H, Auer S (2005) Mapping XML to OWL ontologies. In: Proceeding of Marktplatz Internet: von e-Learning bis ePayment. Leipziger Informatik-Tage (LIT2005), Leipzig, Germany, pp 147–156

  13. Bosch T, Mathiak B (2011) XSLT transformation generating OWL ontologies automatically based on XML schemas. In: IEEE 6th international conference for Internet technology and secured transactions (ICITST 2011), pp 660–667

  14. Bray T, Paoli J, Sperberg-McQueen CM, Maler E, Yergeau F (2008) Extensible Markup Language (XML) 1.0, 5th edn. W3C Recommendation, 26 November 2008. http://www.w3.org/TR/REC-xml/

  15. Calvanese D, De Giacomo G, Lenzerini M (1999) Representing and reasoning about XML documents: a description logic approach. J Log Comput 9(3):295–318

  16. Carroll JJ, Pan JZ (2004) XML schema datatypes in RDF and OWL. Technical report, W3C Semantic Web Best Practices and Development Group, November 2004. http://www.w3.org/2001/sw/BestPractices/XSCH/xsch-sw/

  17. Castillo-Barrera FE, Durán-Limón HA, Médina-Ramírez C, Rodriguez-Rocha B (2013) A method for building ontology-based electronic document management systems for quality standards-the case study of the ISO/TS 16949: 2002 automotive standard. Appl Intell 38(1):99–113

  18. Cuenca Grau B, Horrocks I, Motik B, Parsia B, Patel-Schneider P, Sattler U (2008) OWL 2: the next step for OWL. Web Semant Sci Serv Agents World Wide Web 6(4):309–322

  19. Erdmann M, Studer R (1999) Ontologies as conceptual models for XML documents. In: Proceedings of the 12th international workshop on knowledge acquisition, modelling and management (KAW’99), Banff, Canada, October 1999

  20. Eyharabide V, Amandi A (2012) Ontology-based user profile learning. Appl Intell 36(4):857–869

  21. Ferdinand M, Zirpins C, Transtour D (2004) Lifting XML schema to OWL. In: Proceeding of ICWE 2004, Munich, Germany, pp 354–358

  22. Garcia R, Perdrix F, Gil R (2006) Ontological infrastructure for a semantic newspaper. In: Proceedings of semantic web annotations for multimedia workshop World Wide Web conference, Edinburgh, UK

  23. Ghawi R, Cullot N (2009) Building ontologies from XML data sources. In: 1st international workshop on modelling and visualization of XML and semantic web data (MoViX ’09), Linz, Austria, pp 480–484, held in conjunction with DEXA’09

  24. HiTSoftware. http://www.hitsw.com/xml_utilites/

  25. Horrocks I (2008) Ontologies and the semantic web. Commun ACM 51(11):58–67

  26. Horrocks I, Patel-Schneider PF, van Harmelen F (2003) From SHIQ and RDF to OWL: the making of a web ontology language. J Web Semant 1(1)

  27. Kim H-R, Chan P (2008) Learning implicit user interest hierarchy for context in personalization. Appl Intell 28:153–166

  28. Klein MCA (2002) Interpreting XML documents via an RDF schema ontology. In: Proceeding of the 13th international workshop on database and expert systems applications, pp 889–894

  29. Klein M, Fensel D, van Harmelen F, Horrocks I (2001) The relation between ontologies and XML schemas. Linköp Electron Artic Comput Inf Sci 6(4)

  30. Klímek J, Nečaský M (2010) Reverse-engineering of XML schemas: a survey. In: DATESO 2010. CEUR workshop proceedings, vol 567, pp 96–107

  31. Knauer C, Urbansky D, Meinecke J, Schuster D, Katz P, Schill A (2011) Semi-automatic semantic lifting of XML to a target ontology. In: The joint international symposium on natural language processing and agriculture ontology service (SNLP-AOS), Bangkok, Thailand

  32. Knublauch H Protégé-OWL: user-defined datatypes. http://protege.stanford.edu/plugins/owl/xsp.html

  33. Kobeissy N, Genet MG, Zeghlache D (2007) Mapping XML to OWL for seamless information retrieval in context-aware environments. In: Proceedings of SIPE’07, the 2nd IEEE international workshop on services integration in pervasive environments, Istanbul, Turkey, pp 349–354

  34. Kunfermann P, Drumm C (2005) Lifting XML schemas to ontologies—the concept finder algorithm. In: The first international workshop on mediation in semantic web services, Dec 2005, pp 113–122

  35. Lehti P, Fankhauser P (2004) XML data integration with OWL: experiences and challenges. In: Proceedings of the symposium on applications and the Internet (SAINT’04). IEEE, Los Alamitos, pp 160–170

  36. Maedche A, Staab S (2001) Ontology learning for the semantic web. IEEE Intell Syst 16(2):72–79

  37. Missikoff M, Velardi P, Fabriani P (2003) Text mining techniques to automatically enrich a domain ontology. Appl Intell 18(3):323–340

  38. Mousavi A, Nordin MJ, Othman ZA (2012) Ontology-driven coordination model for multiagent-based mobile workforce brokering systems. Appl Intell 36(4):768–787

  39. O’Connor MJ, Das AK (2011) Acquiring OWL ontologies from XML documents. In: Proc 6th int conf knowledge capture, New York

  40. OWL reasoners. http://www.w3.org/2007/OWL/wiki/Implementations

  41. OWL: Ontology Web Language. http://www.w3.org/2004/OWL/

  42. Pan J, Horrocks I (2011) OWL-Eu: adding customised datatypes into OWL. Web Semant Sci Serv Agents World Wide Web 4(1)

  43. Reif G, Gall H, Jazayeri M (2005) WEESA: web engineering for semantic web applications. In: Proceeding of the 14th international conference on World Wide Web, Chiba, Japan, pp 722–729

  44. Rodrigues T, Rosa P, Cardoso J (2006) Mapping XML to existing OWL ontologies. In: Proc IADIS international conference WWW/Internet 2006, pp 72–77

  45. Subhashin R, Akilandeswari J (2011) A survey on ontology construction methodologies. Int J Enterp Comput Bus Syst 1(1) (online)

  46. Thuy PTT, Lee YK, Lee S (2009) DTD2OWL: automatic transforming XML documents into OWL ontology. In: Proceedings of the 2nd international conference on interaction sciences, New York, NY, USA, pp 125–131

  47. Toman D, Weddell G (2005) On reasoning about structural equality in XML: a description logic approach. Theor Comput Sci 336(11):181–203

  48. Trang. http://www.thaiopensource.com/relaxng/trang.html

  49. Tsinaraki C, Christodoulakis S (2007) XS2OWL: a formal model and a system for enabling XML schema applications to interoperate with OWL-DL domain knowledge and SW Tools. Delos 137–146

  50. Wermter S (2000) Knowledge extraction from transducer neural networks. Appl Intell 12(1–2):27–42

  51. Wood D (1995) Standard generalized markup language: mathematical and philosophical issues. Comput Sci Today 344–365

  52. Wu X, Ratcliffe D, Cameron M (2008) XML schema representation and reasoning: a description logic method. In: 2008 IEEE congress on services, pp 487–494

  53. Xiao L, Zhang L, Huang G, Shi B (2004) Automatic mapping from XML documents to ontologies. In: Proceedings of the fourth international conference on computer and information technology, Washington, USA, pp 321–325

  54. Xu J, Li W (2007) Using relational database to build OWL ontology from XML data sources. In: Proceedings of the 2007 international conference on computational intelligence and security workshops. IEEE Computer Society, Washington, pp 124–127

  55. Yahia N, Mokhtar SA, Ahmed A (2012) Automatic generation of OWL ontology from XML data source. Int J Comput Sci Issues 9(2), March

  56. Yang K, Steele R, Lo A (2007) An ontology for XML schema to ontology mapping representation. In: Proc iiWAS2007, New York, pp 8–16

  57. Zhang F, Ma ZM, Wang X, Wang Y (2010) Formal approach and automated tool for constructing ontology from object-oriented database model. In: Proc. of the 19th ACM conference on information and knowledge management (CIKM 2010), pp 1329–1332

  58. Zhang F, Ma ZM, Yan L (2011) Construction of ontologies from object-oriented database models. Integr Comput-Aided Eng 18(4):327–347

  59. Zhang F, Yan L, Ma ZM, Cheng J (2011) Knowledge representation and reasoning of XML with ontology. In: Proceedings of the 2011 ACM symposium on applied computing (SAC 2011). ACM, New York, pp 1705–1710

Acknowledgements

The authors wish to thank the anonymous referees for their valuable comments and suggestions, which improved the technical content and the presentation of the paper. This work is supported by the National Natural Science Foundation of China (61073139, 60873010, and 61202260), the Program for New Century Excellent Talents in University (NCET-05-0288), and the Fundamental Research Funds for the Central Universities (N090504005, N120404005).

Author information

Corresponding author

Correspondence to Z. M. Ma.

Appendix: Proofs of Theorems

Proof of Theorem 1

In the following we give the proof of Theorem 1. As mentioned in Sect. 3.1.3, an XML document d conforms to a DTD \(D = (r, P)\) in Definition 1 if \(d \in d(D)\), where \(d(D)\) is the set of XML document instances over D, defined inductively as follows: if r is a terminal \(T \in \mathcal{T}\) in Definition 1, then \(d(r) = T \in d(D)\); if r is an element type with \(E \to (\alpha, A) \in P\), then \(d(r) \in d(D)\) is a set of sequences <E>\(d_1 \ldots d_k\)</E>, where <E> and </E> are the start and end tags of the element \(E \in \mathcal{E}\) in Definition 1, and \(d_1, \ldots, d_k\) are document instances satisfying the constraints of the content model \((\alpha, A)\).
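The inductive definition of d(D) can be illustrated with a small conformance checker. This is a minimal sketch, not the paper's formalization: it assumes a hypothetical encoding in which each production \(E \to (\alpha, A)\) maps an element name to a Python regular expression α over the sequence of child tags (each child written as `<tag>`) together with a set A of permitted attribute names, and `None` stands for a terminal (#PCDATA) element. The names `DTD`, `conforms`, and `document_conforms` are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical encoding of a DTD D = (r, P). Each production E -> (alpha, A)
# maps an element name to (alpha, A): alpha is a Python regex over the child
# tag sequence (each child written as "<tag>"), A is the set of allowed
# attribute names. alpha = None marks a terminal (#PCDATA) element.
DTD = {
    "root": "library",
    "productions": {
        "library": ("(<book>)*", set()),
        "book": ("<title>(<author>)+", {"isbn"}),
        "title": (None, set()),
        "author": (None, set()),
    },
}


def conforms(elem, productions):
    """Inductive check of elem against its production, mirroring d(D)."""
    if elem.tag not in productions:
        return False
    alpha, allowed_attrs = productions[elem.tag]
    if not set(elem.attrib) <= allowed_attrs:
        return False
    children = list(elem)
    if alpha is None:
        # Terminal case: character data only, no child elements.
        return not children
    # Element content: only whitespace may appear around child elements.
    if (elem.text or "").strip() or any((c.tail or "").strip() for c in children):
        return False
    # The content model alpha must generate the sequence of child tags ...
    word = "".join(f"<{c.tag}>" for c in children)
    if re.fullmatch(alpha, word) is None:
        return False
    # ... and every child d_i must itself be a document instance over D.
    return all(conforms(c, productions) for c in children)


def document_conforms(xml_text, dtd):
    """d conforms to D iff its root is r and the root element is in d(D)."""
    root = ET.fromstring(xml_text)
    return root.tag == dtd["root"] and conforms(root, dtd["productions"])
```

For example, a `<library>` holding one `<book isbn="1">` with a `<title>` and an `<author>` conforms, while the same book without its `<title>` does not, because `<author>` alone is not generated by `<title>(<author>)+`.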

On this basis, we first prove the first part of Theorem 1. Let \(d \in d(D)\) be an XML document conforming to the DTD D. Then a model \(\mu(d) = (\Delta^{\mu(d)}, \bullet^{\mu(d)})\) satisfying the axioms of \(\varphi(D)\) can be defined inductively as follows:

  (a)

    If d is a terminal \(T \in \mathcal{T}\), then \(\Delta^{\mu(d)} = (\varphi(T))^{\mu(d)}\);

  (b)

    If d is a sequence of the form <E>\(d_1 \ldots d_n\)</E>, where each \(d_i\) is an instance conforming to the content model of \(E \to (\alpha, A)\) in the DTD D, then the tree model \(\mu(d)\) can be constructed as follows:

    $$\begin{aligned} &\Delta ^{\mu (d)} = \{ o,o_{b},o_{1}, \ldots,o_{n},o_{e}\} \cup \bigcup _{1 \le i \le n} \Delta ^{\mu (d_{i})} \\ &\mathit{Tag}^{\mu (d)} = \{ o_{b},o_{e}\} \cup \bigcup _{1 \le i \le n}\mathit{Tag}^{\mu (d_{i})} \\ & \mathit{Start}E^{\mu (d)} = \{ o_{b}\} \cup \bigcup _{1 \le i \le n} \mathit{Start}E^{\mu (d_{i})} \\ &\mathit{End}E^{\mu (d)} = \{ o_{e}\} \cup \bigcup _{1 \le i \le n} \mathit{End}E^{\mu (d_{i})} \\ & f^{\prime \mu ({d})} = \bigl\{ (o,o_{b}),\bigl(o_{1},o_{1}' \bigr),\ldots,\bigl(o_{n},o_{n}'\bigr)\bigr\} \cup \bigcup_{1 \le i \le n} f^{\prime \mu ({d}_{{i}})} \\ &r^{\prime \mu ({d})} = \bigl\{ (o,o_{1}),(o_{1},o_{2}), \ldots,(o_{n - 1},o_{n}),(o_{n},o_{e}) \bigr\} \\ &\hphantom{r^{\prime \mu (d)} =} \cup \bigcup_{1 \le i \le n} r^{\prime \mu (d_{i})} \end{aligned}$$

    where: (i) a class identifier Tag represents all the tags in the XML document d; (ii) for each element type \(E \in \mathcal{E}\), two class identifiers StartE and EndE represent the start tag and end tag of E, respectively; (iii) o denotes the root element of d, \(o_b\) and \(o_e\) denote the start and end tags of the root element, respectively, \(o_i\) denotes the i-th component of d, and \(o_{i}'\) denotes the root element of \(d_i\), \(i \in \{1, \ldots, n\}\); (iv) f′ and r′ are property identifiers in the model μ(d), where f′ connects an element to its start tag (and each component \(o_i\) to the root \(o_i'\) of \(d_i\)), and r′ chains the element through its components to its end tag in the tree structure of d.
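The construction of μ(d) can be sketched in Python to make the sets above concrete. This is an illustrative encoding only, not the paper's tool: it assumes integer object identifiers, dictionaries `Start`/`End` keyed by element name in place of the class identifiers StartE and EndE, sets of pairs for the extensions of f′ and r′, and text-only elements treated as having a single terminal component (text mixed with child elements is ignored). The names `components` and `tree_model` are hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET
from collections import defaultdict


def components(elem):
    """Components d_1..d_n of <E> d_1 ... d_n </E>: child elements, or the
    stripped text as a single terminal when the element has no children."""
    children = list(elem)
    if children:
        return children
    text = (elem.text or "").strip()
    return [text] if text else []


def tree_model(xml_text):
    """Build the extensions of mu(d): Delta, Tag, StartE, EndE, f', r'."""
    ids = itertools.count()
    m = {
        "Delta": set(),
        "Tag": set(),
        "Start": defaultdict(set),  # Start[E] stands for StartE
        "End": defaultdict(set),    # End[E] stands for EndE
        "Terminal": {},             # data object -> terminal value T
        "f": set(),                 # extension of f'
        "r": set(),                 # extension of r'
    }

    def fresh():
        o = next(ids)
        m["Delta"].add(o)
        return o

    def build(node):
        if isinstance(node, str):
            # Case (a): a terminal T is a single object of the data range.
            o = fresh()
            m["Terminal"][o] = node
            return o
        # Case (b): new objects o, o_b, o_1..o_n, o_e for <E> d_1 ... d_n </E>.
        comps = components(node)
        o, o_b = fresh(), fresh()
        o_is = [fresh() for _ in comps]
        o_e = fresh()
        m["Tag"].update({o_b, o_e})
        m["Start"][node.tag].add(o_b)
        m["End"][node.tag].add(o_e)
        m["f"].add((o, o_b))
        for o_i, d_i in zip(o_is, comps):
            m["f"].add((o_i, build(d_i)))  # (o_i, o_i'), o_i' = root of mu(d_i)
        chain = [o, *o_is, o_e]
        m["r"].update(zip(chain, chain[1:]))  # (o,o_1), ..., (o_n,o_e)
        return o

    root = build(ET.fromstring(xml_text))
    return root, m
```

For `<book><title>T</title><author>A</author></book>` the sketch creates 15 objects: five for `book` (o, o_b, two components, o_e), four each for `title` and `author`, and one data object per terminal, matching the unions in the equation above.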

The second part of Theorem 1 can be proved in a similar way to the first part; the two constructions are mutually inverse. Given a model \(\mathcal{I} = (\Delta^{\mathcal{I}}, \bullet^{\mathcal{I}})\) of \(\varphi(D)\) and an object \(o \in \varphi(r)^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}\), an XML document instance λ(o) can be defined inductively as follows:

  (a)

    If \(o \in \varphi (T)^{\mathcal{I}}\), where φ(T) is an identifier in φ(D), then λ(o) = T, with \(T \in \mathcal{T}\);

  (b)

    If \(o \in \varphi (E)^{\mathcal{I}}\), where φ(E) is an identifier in φ(D) and \(E \in \mathcal{E}\), and there are an integer n ≥ 0 and objects \(o_b, o_1, \ldots, o_n, o_1', \ldots, o_n', o_e\) such that \(o_b \in \mathit{StartE}^{\mathcal{I}}\), \(o_e \in \mathit{EndE}^{\mathcal{I}}\), \((o, o_b), (o_1, o_1'), \ldots, (o_n, o_n') \in f'^{\mathcal{I}}\), and \((o, o_1), (o_1, o_2), \ldots, (o_{n-1}, o_n), (o_n, o_e) \in r'^{\mathcal{I}}\), then λ(o) = <E>\(\lambda(o_1') \ldots \lambda(o_n')\)</E>. Moreover, as is well known in description logics and ontologies [6, 25], a model of a description logic knowledge base or an ontology can be represented as a tree model. Fig. 10 shows the tree representation of the model \(\mathcal{I}\) rooted at the object o.

    Fig. 10

    The tree representation of the model \(\mathcal{I}\) with the object o mentioned in the second part of Theorem 1
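The inverse construction λ(o) can be sketched in the same spirit. The sketch assumes a tree-shaped model in which f′ and r′ are functional, so each is encoded as a dictionary from an object to its unique successor; `MODEL` is a hypothetical hand-built model of the document `<book><title>T</title></book>`, and `lam` stands for λ (since `lambda` is a Python keyword).

```python
# Hand-built tree model of <book><title>T</title></book>. f' and r' are
# encoded as dicts because they are assumed functional in a tree model.
MODEL = {
    "f": {0: 1, 2: 4, 4: 5, 6: 8},     # f': o -> o_b and o_i -> o_i'
    "r": {0: 2, 2: 3, 4: 6, 6: 7},     # r': o -> o_1 -> ... -> o_e
    "start": {1: "book", 5: "title"},  # o_b in StartE
    "end": {3: "book", 7: "title"},    # o_e in EndE
    "terminal": {8: "T"},              # o in phi(T)
}


def lam(o, model):
    """lambda(o): rebuild the XML instance rooted at object o."""
    if o in model["terminal"]:
        return model["terminal"][o]                # case (a): lambda(o) = T
    tag = model["start"][model["f"][o]]            # (o, o_b) in f', o_b in StartE
    parts, cur = [], model["r"][o]                 # first r'-successor of o
    while cur not in model["end"]:                 # walk o_1, ..., o_n
        parts.append(lam(model["f"][cur], model))  # recurse on o_i'
        cur = model["r"][cur]
    if model["end"][cur] != tag:                   # o_e must close the same E
        raise ValueError(f"unbalanced tags: <{tag}> closed by </{model['end'][cur]}>")
    return f"<{tag}>{''.join(parts)}</{tag}>"
```

Walking the r′-chain from o until an object in some EndE is reached recovers \(o_1, \ldots, o_n\) in order, so the sketch never needs n explicitly; `lam(0, MODEL)` yields `<book><title>T</title></book>`.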

Next, we prove that the XML document instance λ(o) exists and conforms to the DTD D. Let S be a symbol in \(\mathcal{T} \cup \mathcal{E}\); we show that \(o \in \varphi (S)^{\mathcal{I}}\) if and only if λ(o) is defined and λ(o) = d(S) ∈ d(D), where d(·) is as defined at the beginning of this proof. We proceed by induction on the number of f′-steps on the path from o in \(\mathcal{I}\) (see Fig. 10), distinguishing two cases. (i) Base case: there are no f′-steps in the path. Then o is a terminal node, as shown in Fig. 10. This case is direct: by the transformation approach in Sect. 4.1.1, a data range identifier φ(S) corresponds to a terminal \(S \in \mathcal{T}\) in D. Therefore, if \(S = T \in \mathcal{T}\), then \(o \in \varphi (S)^{\mathcal{I}}\) and λ(o) = d(S) = T ∈ d(D) by the definition of d(D) above; otherwise \(o \notin \varphi (S)^{\mathcal{I}}\) and λ(o) ∉ d(D). (ii) Inductive case: there are f′-steps in the path. Let \(o_1, \ldots, o_n\) be the objects along the \(r' \cdot r'^{*}\)-path satisfying the constraints in φ(E), and let \(o_i'\) be the f′-successor of \(o_i\), for \(i \in \{1, \ldots, n\}\), where E is an element type such that λ(o) = <E>\(\lambda(o_1') \ldots \lambda(o_n')\)</E>. If there are symbols \(S_1, \ldots, S_n\) in \(\mathcal{T} \cup \mathcal{E}\) such that \(o_i' \in \varphi (S_i)^{\mathcal{I}}\) for \(i \in \{1, \ldots, n\}\), then by the induction hypothesis \(\lambda(o_i') = d(S_i) \in d(D)\); if, moreover, \(S = E \in \mathcal{E}\) and the content model α generates the string \(S_1 \cdots S_n\), then \(o \in \varphi (S)^{\mathcal{I}}\) and λ(o) = d(S) ∈ d(D). Otherwise \(o \notin \varphi (S)^{\mathcal{I}}\) and λ(o) ∉ d(D). □

Proof of Theorem 4

As with the DTD in Theorem 1, for an XML Schema S and its transformed OWL ontology O (Sect. 4.2.1), there are mappings between instance documents of S and models of O. Formally, assume that for every XML document d conforming to S there is a model σ(d) of O, and for every model \(\mathcal{I}\) of O there is an XML document \(\tau (\mathcal{I})\) conforming to S. The proof then proceeds in two directions. First, suppose \(O \models f(r_1) \sqsubseteq f(r_2)\) but \(S_1 \not\sqsubseteq S_2\). By Definition 7, \(d(S_1) \not\subseteq d(S_2)\), i.e., there is at least one XML document instance d with \(d \in d(S_1)\) and \(d \notin d(S_2)\). By the above assumption, σ(d) is a model of O with \(o \in f(r_1)^{\sigma(d)}\) and \(o \notin f(r_2)^{\sigma(d)}\), where o is the individual corresponding to the root node of d. That is, \(O \not\models f(r_1) \sqsubseteq f(r_2)\), a contradiction; hence \(S_1 \sqsubseteq S_2\). Conversely, suppose \(S_1 \sqsubseteq S_2\) but \(O \not\models f(r_1) \sqsubseteq f(r_2)\). Then there is \(o \in \Delta^{\mathcal{I}}\) such that \(o \in (f(r_1))^{\mathcal{I}}\) and \(o \notin (f(r_2))^{\mathcal{I}}\), where \(\mathcal{I}\) is a model of O. By the above assumption again, \(\tau (\mathcal{I})\) is an XML document instance for S with \(\tau(o) \in d(S_1)\) and \(\tau(o) \notin d(S_2)\). That is, \(S_1 \not\sqsubseteq S_2\), a contradiction; hence \(O \models f(r_1) \sqsubseteq f(r_2)\). □
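The proof of Theorem 4 turns on the existence of a counterexample document \(d \in d(S_1) \setminus d(S_2)\). The following sketch illustrates that notion for a single content model: it enumerates child-tag sequences up to a length bound and looks for one generated by the first content model but not the second. It is an assumption-laden illustration, not the paper's method (which hands the check to an OWL reasoner): content models are encoded as Python regular expressions over `<tag>` tokens, nested declarations and attributes are ignored, and the helper names are hypothetical. A `None` result only means that no counterexample exists up to `max_len`.

```python
import itertools
import re


def tag_sequences(alphabet, max_len):
    """All child-tag sequences over alphabet, written '<a><b>...', up to max_len."""
    for n in range(max_len + 1):
        for seq in itertools.product(alphabet, repeat=n):
            yield "".join(f"<{t}>" for t in seq)


def subsumption_counterexample(alpha1, alpha2, alphabet, max_len=4):
    """Return a sequence generated by alpha1 but not by alpha2, or None.

    A returned sequence plays the role of d with d in d(S1) and d not in d(S2).
    None only means that no counterexample exists up to max_len.
    """
    for w in tag_sequences(alphabet, max_len):
        if re.fullmatch(alpha1, w) and not re.fullmatch(alpha2, w):
            return w
    return None
```

With alphabet `["title", "author"]`, `<title>(<author>)+` yields no counterexample against `<title>(<author>)*`, whereas the reverse direction is refuted by the sequence `<title>`, i.e., a record without authors.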

Proof of Theorem 5

The proof of this theorem is similar to that of Theorem 4. Suppose \(S_1 \equiv S_2\) but \(O \not\models f(r_1) \equiv f(r_2)\). Then there is \(o \in \Delta^{\mathcal{I}}\) such that either \(o \in (f(r_1))^{\mathcal{I}}\) and \(o \notin (f(r_2))^{\mathcal{I}}\), or \(o \notin (f(r_1))^{\mathcal{I}}\) and \(o \in (f(r_2))^{\mathcal{I}}\), where \(\mathcal{I}\) is a model of O. By the assumption in the proof of Theorem 4, \(\tau (\mathcal{I})\) is an XML document instance for S with either \(\tau(o) \in d(S_1)\) and \(\tau(o) \notin d(S_2)\), or \(\tau(o) \notin d(S_1)\) and \(\tau(o) \in d(S_2)\). That is, \(S_1 \not\equiv S_2\), a contradiction; hence \(O \models f(r_1) \equiv f(r_2)\). Conversely, suppose \(O \models f(r_1) \equiv f(r_2)\) but \(S_1 \not\equiv S_2\). Then there is at least one XML document instance d with either \(d \in d(S_1)\) and \(d \notin d(S_2)\), or \(d \notin d(S_1)\) and \(d \in d(S_2)\). Again by the assumption in the proof of Theorem 4, σ(d) is a model of O with either \(o \in f(r_1)^{\sigma(d)}\) and \(o \notin f(r_2)^{\sigma(d)}\), or \(o \notin f(r_1)^{\sigma(d)}\) and \(o \in f(r_2)^{\sigma(d)}\), where o is the individual corresponding to the root node of d. That is, \(O \not\models f(r_1) \equiv f(r_2)\), a contradiction; hence \(S_1 \equiv S_2\). □
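Theorem 5 can also be read off from Theorem 4 applied in both directions, assuming (as the case analysis above suggests) that \(S_1 \equiv S_2\) means \(d(S_1) = d(S_2)\), and using the standard description logic fact that an equivalence holds in O exactly when both subsumptions do:

$$\begin{aligned} S_1 \equiv S_2 &\iff d(S_1) \subseteq d(S_2) \text{ and } d(S_2) \subseteq d(S_1) \\ &\iff S_1 \sqsubseteq S_2 \text{ and } S_2 \sqsubseteq S_1 \qquad \text{(Definition 7)} \\ &\iff O \models f(r_1) \sqsubseteq f(r_2) \text{ and } O \models f(r_2) \sqsubseteq f(r_1) \qquad \text{(Theorem 4, twice)} \\ &\iff O \models f(r_1) \equiv f(r_2) \end{aligned}$$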

Proof of Theorem 6

We proceed as in the previous proofs. Suppose \(S_1\) and \(S_2\) are disjoint but \(O \not\models f(r_1) \sqcap f(r_2) \sqsubseteq \bot\). Then there is at least one \(o \in \Delta^{\mathcal{I}}\) such that \(o \in (f(r_1))^{\mathcal{I}}\) and \(o \in (f(r_2))^{\mathcal{I}}\), where \(\mathcal{I}\) is a model of O. By the assumption in the proof of Theorem 4, \(\tau (\mathcal{I})\) is an XML document instance for S with \(\tau(o) \in d(S_1)\) and \(\tau(o) \in d(S_2)\). That is, \(d(S_1) \cap d(S_2) \neq \emptyset\), a contradiction; hence \(O \models f(r_1) \sqcap f(r_2) \sqsubseteq \bot\). Conversely, suppose \(O \models f(r_1) \sqcap f(r_2) \sqsubseteq \bot\) but \(S_1\) is not disjoint from \(S_2\). By Definition 9, \(d(S_1) \cap d(S_2) \neq \emptyset\), i.e., there is at least one XML document instance d with \(d \in d(S_1)\) and \(d \in d(S_2)\). Again by the assumption in the proof of Theorem 4, σ(d) is a model of O with \(o \in f(r_1)^{\sigma(d)}\) and \(o \in f(r_2)^{\sigma(d)}\), where o is the individual corresponding to the root node of d. That is, \(O \not\models f(r_1) \sqcap f(r_2) \sqsubseteq \bot\), a contradiction; hence \(S_1\) and \(S_2\) are disjoint. □
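Theorem 6 relies on the dual notion of a witness \(d \in d(S_1) \cap d(S_2)\). As a rough illustration, assuming content models encoded as Python regular expressions over `<tag>` tokens (a hypothetical encoding that ignores nested declarations and attributes), a bounded search for a shared child-tag sequence looks as follows. A `None` result only means that no shared sequence exists up to `max_len`, whereas Theorem 6 reduces the question to checking \(O \models f(r_1) \sqcap f(r_2) \sqsubseteq \bot\) with an ontology reasoner.

```python
import itertools
import re


def disjointness_witness(alpha1, alpha2, alphabet, max_len=4):
    """Return a child-tag sequence generated by both content models, or None.

    A returned sequence plays the role of d in both d(S1) and d(S2);
    None only means that no common sequence exists up to max_len.
    """
    for n in range(max_len + 1):
        for seq in itertools.product(alphabet, repeat=n):
            w = "".join(f"<{t}>" for t in seq)
            if re.fullmatch(alpha1, w) and re.fullmatch(alpha2, w):
                return w
    return None
```

For instance, `<title>(<author>|<editor>)+` and `<title>(<editor>)+` share the witness `<title><editor>`, while `<title>(<author>)+` and `<title>(<editor>)+` have no common sequence.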

About this article

Cite this article

Zhang, F., Ma, Z.M. Representing and Reasoning About XML with Ontologies. Appl Intell 40, 74–106 (2014). https://doi.org/10.1007/s10489-013-0446-4

  • DOI: https://doi.org/10.1007/s10489-013-0446-4
