An encoding scheme based on fractional number for querying and updating XML data

https://doi.org/10.1016/j.jss.2012.02.054

Abstract

To facilitate XML query processing, several labeling schemes have been proposed that determine the structural relationship between two arbitrary XML nodes directly, without accessing the original XML document. However, existing XML labeling schemes have to re-label pre-existing nodes or re-calculate label values when a new node is inserted into the XML document during an update. In this paper, we devise a novel encoding scheme based on fractional numbers to encode the labels of XML nodes. Moreover, we propose a mapping method that converts the fractional number based encoding scheme to a bit string based encoding scheme in order to minimize the label size and save storage space. By applying the proposed bit string encoding scheme to the range-based labeling scheme and the prefix labeling scheme, re-labeling of pre-existing nodes can be avoided when nodes are inserted as leaf nodes and sibling nodes, without affecting the order of XML nodes. In addition, we propose an algorithm to control the growth of label size when new nodes are inserted frequently at a fixed place in an XML tree. Experimental results show that the proposed bit string encoding scheme efficiently supports XML updating without sacrificing query performance when it is applied to the range-based labeling schemes.

Highlights

  • We devise a fractional number based encoding scheme to encode the labels of XML nodes.

  • We are able to insert new nodes without re-labeling XML nodes using the proposed scheme.

  • We propose a mapping method to convert the proposed scheme to a bit string scheme.

  • We are able to minimize the label size by exploiting the proposed mapping method.

  • We devise an algorithm to control label size increment when nodes are inserted frequently.

Introduction

With the growing popularity of XML for representing and exchanging data over the Internet, systems that can store and query XML data efficiently are needed. Several query languages, such as XPath (Clark and DeRose, 1999) and XQuery (Boag et al., 2007), have been proposed to process XML data. These query languages are based on regular path expressions. To facilitate XML query processing, the structural relationships between XML nodes should be determined directly, without accessing the XML documents. Several studies have labeled XML nodes in such a way that the structural relationship between two arbitrary nodes can be determined, including Amagasa et al. (2003), Cohen et al. (2002), Li and Ling (2005), Li et al. (2006, 2008), Li and Moon (2001), Min et al. (2007, 2009), O’Neil et al. (2004), Tatarinov et al. (2002), Wu et al. (2004), and Zhang et al. (2001).

Generally, XML labeling schemes can be categorized into two groups: static labeling schemes (Amagasa et al., 2003, Li and Moon, 2001, Tatarinov et al., 2002, Zhang et al., 2001) and dynamic labeling schemes (Cohen et al., 2002, Li and Ling, 2005, Li et al., 2006, Li et al., 2008, Min et al., 2007, Min et al., 2009, O’Neil et al., 2004, Wu et al., 2004). Static labeling schemes are adequate when the XML documents are not updated, while dynamic labeling schemes are more suitable when the XML documents can be updated. The advantage of static labeling schemes is that they need little storage space. However, inserting a new node into the XML tree may involve re-labeling a large number of pre-existing nodes. In dynamic labeling schemes, re-labeling of pre-existing nodes is avoided, or at least reduced compared with static labeling schemes, but the length of the labels increases dramatically when new nodes are inserted into the XML tree, especially when they are inserted frequently at a fixed place.

The aim of this paper is to devise a novel encoding scheme for the labels of XML nodes so that re-labeling of pre-existing nodes can be avoided. To achieve this, we rely on fractional numbers. The key feature of fractional numbers is that infinitely many fractional numbers can be inserted between any two unequal fractional numbers. We believe that the problem of re-labeling pre-existing nodes during the XML updating process can be solved if the XML nodes are encoded with fractional numbers. The main contributions of this paper are summarized as follows:

  • We devise a novel encoding scheme based on the fractional numbers to encode the labels of the XML nodes. With this new scheme, we are able to insert new nodes between two consecutive nodes without affecting the order of pre-existing nodes.

  • We propose a mapping method to convert our proposed fractional number based encoding scheme to bit string based encoding scheme with the intention to minimize the label size and save the storage space.

  • We apply our proposed bit string encoding scheme to the range-based labeling scheme and the prefix labeling scheme and perform several experiments to demonstrate the benefit of our proposed bit string encoding scheme for querying and updating the XML data.

  • We devise an algorithm to control the label size increment of our proposed bit string encoding scheme when new nodes are inserted frequently at a fixed place in an XML tree.

The remainder of the paper is organized as follows: in Section 2, existing XML labeling and encoding schemes are investigated. In Section 3, our proposed fractional number based encoding scheme is presented. In Section 4, we explain how to convert our proposed fractional number based encoding scheme to bit string encoding scheme in order to reduce the storage space for storing the labels. In Section 5, we apply our proposed bit string encoding scheme to the range-based labeling scheme and the prefix labeling scheme. In Section 6, the process of XML querying is described. In Section 7, the process of XML updating is described. In Section 8, the process of skewed insertions is explained. Experimental results are illustrated in Section 9. Finally, the paper is concluded in Section 10.

Section snippets

Related works

In this section, we first explain existing XML labeling schemes which are categorized into three groups: range-based labeling scheme (Amagasa et al., 2003, Dietz, 1982, Li and Moon, 2001, Min et al., 2007, Min et al., 2009, Zhang et al., 2001), prefix labeling scheme (Cohen et al., 2002, O’Neil et al., 2004, Tatarinov et al., 2002), and prime number labeling scheme (Wu et al., 2004). Then existing encoding schemes (Li and Ling, 2005, Li et al., 2006, Li et al., 2008, Min et al., 2007, Min et

Fractional number based encoding scheme

The key feature of fractional numbers is that infinitely many fractional numbers can be inserted between any two unequal fractional numbers. Therefore, the problem of re-labeling the pre-existing nodes during XML updating can be solved if the XML nodes are labeled with fractional numbers.
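The property stated above can be illustrated with a minimal Python sketch built on the standard `fractions` module. The function name `between`, the midpoint rule, and the sample labels are assumptions made for illustration only; they are not the paper's fractional number generation procedure.

```python
from fractions import Fraction


def between(left: Fraction, right: Fraction) -> Fraction:
    """Return a fraction strictly between two unequal fractions (midpoint rule)."""
    if left == right:
        raise ValueError("labels must be unequal")
    return (left + right) / 2


# Three hypothetical pre-existing sibling labels, in document order.
labels = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]

# Insert a new node between the first two siblings.
# None of the existing labels has to change.
new_label = between(labels[0], labels[1])
labels.insert(1, new_label)
```

Because a midpoint always exists, the insertion can be repeated at the same position as often as needed; the cost shows up as growing numerators and denominators rather than as re-labeling.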

Here, we first give an example to illustrate how the fractional numbers can be assigned to a set of ordinal decimal numbers in order to easily understand the fractional number generation (FNG)

The proposed bit string encoding scheme

Although the proposed fractional number based encoding scheme can generate a new fractional number between two existing fractional numbers, its drawback is the large storage space needed to store the numerators and denominators. In this section, we propose a mapping method to convert a fractional number of the form i/2^j to a corresponding bit string code with the aim of minimizing the required storage space.
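One plausible mapping of this kind can be sketched in Python. It is an assumption of this sketch, not necessarily the mapping defined in the paper: a fraction i/2^j with 0 < i < 2^j is written as the j-bit binary form of i, so the denominator never has to be stored. The helper names `to_bits` and `from_bits` are hypothetical.

```python
from fractions import Fraction


def to_bits(label: Fraction) -> str:
    """Map i/2^j (0 < i < 2^j) to the j-bit binary form of i."""
    i, d = label.numerator, label.denominator
    j = d.bit_length() - 1
    # Fraction keeps values in lowest terms, so d must be an exact power of two.
    if d != 1 << j or not 0 < i < d:
        raise ValueError("expected i/2^j with 0 < i < 2^j")
    return format(i, f"0{j}b")


def from_bits(bits: str) -> Fraction:
    """Inverse mapping: a j-bit string back to i/2^j."""
    return Fraction(int(bits, 2), 1 << len(bits))


values = [Fraction(1, 4), Fraction(3, 8), Fraction(1, 2), Fraction(3, 4)]
codes = [to_bits(v) for v in values]
```

Under this assumed mapping every code ends in `1`, and plain lexicographic string comparison gives the same order as the fractions themselves, so ordering tests can run on the bit strings directly.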

Indeed, the total number of bits required to represent the numerator i where i < 2^j in

Applying the proposed bit string encoding scheme to different labeling schemes

We can apply our proposed bit string encoding scheme to the containment labeling scheme by replacing the start and end values of Fig. 1 with the bit string codes in Table 1. The result is a containment labeling scheme based on our proposed bit string encoding scheme, which we call the BS-Containment labeling scheme. An XML tree labeled with the BS-Containment labeling scheme is illustrated in Fig. 6.

Similarly, when the pre and post values of Fig. 2 are replaced with the bit string codes

The process of XML querying

In order to facilitate the XML query processing, the structural relationships between two arbitrary XML nodes must be determined without directly accessing the XML document. The BS-Containment labeling scheme, the BS-Pre/Post labeling scheme, and the BS-Prefix labeling scheme are able to determine the structural relationships between two arbitrary XML nodes as well as support all XPath axes. In the following, we explain how the BS-Containment labeling scheme and the BS-Prefix labeling scheme
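As a rough illustration of label-only relationship tests, the following Python sketch applies the standard containment conditions to (start, end, level) labels. The `Label` class, the function names, and the sample bit strings are hypothetical and are not taken from the paper's figures; the sketch also assumes that bit string codes compare lexicographically in document order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    start: str  # bit string code of the start position
    end: str    # bit string code of the end position
    level: int  # depth of the node in the XML tree


def is_ancestor(a: Label, d: Label) -> bool:
    """a is an ancestor of d if a's interval strictly contains d's interval."""
    return a.start < d.start and d.end < a.end


def is_parent(p: Label, c: Label) -> bool:
    """A parent is an ancestor exactly one level above."""
    return is_ancestor(p, c) and c.level == p.level + 1


def precedes(a: Label, b: Label) -> bool:
    """a ends before b starts, so a comes before b and is not its ancestor."""
    return a.end < b.start


# Hypothetical labels: root contains b and c; c contains d.
root = Label("0001", "1111", 1)
b = Label("001", "01", 2)
c = Label("011", "1", 2)
d = Label("0111", "01111", 3)
```

All three tests use only string comparisons on the labels, so the XML document itself is never read.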

The process of XML updating

In this section, we only explain the process of node deletion and insertion based on the BS-Containment labeling scheme and the BS-Prefix labeling scheme, since the process of node deletion and insertion based on the BS-Pre/Post labeling scheme is similar to that of the BS-Containment labeling scheme.
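A minimal Python sketch of a sibling insertion in a containment-style scheme is given below, assuming fractional start and end values and a midpoint rule. The function names and the sample labels are hypothetical, and the paper's own insertion procedure may choose the new values differently.

```python
from fractions import Fraction


def mid(a: Fraction, b: Fraction) -> Fraction:
    """Midpoint of two fractions."""
    return (a + b) / 2


def insert_between(left, right):
    """Label a new leaf placed between two adjacent siblings.

    left and right are (start, end) pairs. The new pair is taken from the
    gap between left's end and right's start, so no existing label changes.
    """
    lo, hi = left[1], right[0]
    start = mid(lo, hi)
    end = mid(start, hi)
    return (start, end)


# Hypothetical adjacent siblings.
b = (Fraction(1, 8), Fraction(1, 4))
c = (Fraction(3, 8), Fraction(1, 2))

new_node = insert_between(b, c)
```

Deletion needs no computation in this sketch: removing a node's label leaves the remaining labels valid, because their relative order is unchanged.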

The process of skewed insertions

In this section, we explain the behavior of our proposed bit string encoding scheme when new nodes are always inserted at a fixed place in an XML tree. This kind of insertion is called skewed insertion.

In our proposed bit string encoding scheme, when new nodes are always inserted at a fixed place, the length of the bit string codes increases dramatically. The following example shows this fact.

Example 8.1

Assume that we wish to insert 10 new sibling nodes between node B and node C in Fig. 8. Node B is labeled as
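The growth under skewed insertion can be reproduced with a short Python sketch. The starting labels for B and C are hypothetical and are not the labels of Fig. 8, and the sketch assumes a midpoint rule with a j-bit code for i/2^j; the size-control algorithm proposed in the paper is not shown.

```python
from fractions import Fraction


def code_length(label: Fraction) -> int:
    """Bits needed for i/2^j when the code is the j-bit binary form of i."""
    return label.denominator.bit_length() - 1


# Hypothetical labels for nodes B and C.
left, right = Fraction(1, 4), Fraction(1, 2)

lengths = []
for _ in range(10):
    new = (left + right) / 2      # label of the newly inserted sibling
    lengths.append(code_length(new))
    right = new                   # the next node is again inserted right after B
```

In this sketch each insertion at the same position adds one bit, so the tenth new sibling needs a 12-bit code although its neighbor B needs only 2 bits.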

Performance study

We first applied our proposed bit string encoding scheme to the different labeling schemes in order to empirically compare the performance of our proposed encoding scheme to the different labeling schemes for querying and updating the XML data. Then, we compared the behavior of our proposed encoding scheme to the different encoding schemes in the absence of labeling schemes in the case of frequent deletion and insertion and frequent skewed insertions since these kinds of processes are different

Conclusion and future works

In this paper, we first devised a novel encoding scheme based on the fractional number in order to keep the order of XML nodes during XML updating. Then, a bit string encoding scheme was proposed with the aim to minimize the storage space of node labels. Moreover, we applied our proposed bit string encoding scheme to the range-based labeling scheme and the prefix labeling scheme. The experimental results showed that our proposed bit string encoding scheme is efficient when it is applied to the

Meghdad Mirabi is currently a PhD candidate at the Faculty of Computer Science and Information Technology, Universiti Putra Malaysia (UPM). He obtained his master's degree in computer science from Universiti Putra Malaysia (UPM), Malaysia in 2009. His research interests include XML data management, XML query processing, and XML access control.

Cited by (17)

  • Pentagonal scheme for dynamic XML prefix labelling

    2020, Knowledge-Based Systems
    Citation Excerpt :

    However, the other form of XML node insertion is uniform, where the node is inserted between random pairs of sequential nodes [42,43]. Some labelling schema only tested their schemes over skewed insertions [34,44,45]. Labelling schemes consider four cases of insertion: inserting before the leftmost sibling, inserting after the rightmost sibling, inserting between two siblings and inserting a child into a leaf node [26,27,29,38,39,42,46,47].

  • A new structure and access mechanism for secure and efficient XML data broadcast in mobile wireless networks

    2017, Journal of Systems and Software
    Citation Excerpt :

    Using the Lineage Encoding, mobile clients have ability to find the results of twig pattern XML queries with performing the bitwise operation on the lineage codes in the relevant G-Nodes. In Mirabi et al. (2014), a novel structure is proposed for streaming the XML data called PS+Pre/Post by integrating the path summary technique (Al-Khalifa et al., 2002) and the pre/post labeling scheme (Mirabi et al., 2012). The proposed XML stream structure exploits the benefits of the path summary technique and the pre/post labeling scheme to efficiently process different types of XML queries over the XML stream.

  • PS+Pre/Post: A novel structure and access mechanism for wireless XML stream supporting twig pattern queries

    2014, Pervasive and Mobile Computing
    Citation Excerpt :

    However, these indexing methods are not suitable to be used for XML data since they are designed for flat data records which are identified by keys while XML data are semi-structured and can be accessed by XML query languages like XPath [10] and XQuery [11]. A lot of researches have been done to efficiently process XML queries over XML data [12–25] however their approaches are not suitable for processing XML queries over an XML stream in a mobile wireless broadcast channel for the following reasons: Recently, several indexing methods have been proposed to selectively access XML data over XML streams in mobile wireless networks [2,26–28].

  • An energy conservation indexing method for secure XML data broadcast in mobile wireless networks

    2014, Pervasive and Mobile Computing
    Citation Excerpt :

    However, these indexing methods are not applicable to be used for the XML data since they are designed for flat data records identified by keys while the XML data are semi-structured and can be accessed by XML query languages like XPath [12] and XQuery [13]. A lot of researches have been done to efficiently process XML queries over the XML data in wired networks with high bandwidth and server processing capacity such as the approaches proposed by [14–28] but they cannot be used for XML query processing over an XML stream in a wireless broadcast channel since the wireless broadcast channel has low bandwidth capacity and mobile clients carry battery powered hand-held devices with limited data processing capacity. Recently, several indexing methods have been proposed to selectively access the XML data over an XML stream in a mobile wireless broadcast network such as the methods proposed by [29–31].

Hamidah Ibrahim is currently an associate professor at the Faculty of Computer Science and Information Technology, Universiti Putra Malaysia (UPM). She obtained her PhD in computer science from the University of Wales Cardiff, UK in 1998. Her current research interests include databases (distributed, parallel, mobile, bio-medical, XML) focusing on issues related to integrity constraints checking, cache strategies, integration, access control, transaction processing, and query processing and optimization; data management in grid and knowledge-based systems.

Nur Izura Udzir has been an associate professor at the Faculty of Computer Science and Information Technology, Universiti Putra Malaysia (UPM) since 1998. She received her Bachelor of Computer Science (1996) and Master of Science (1998) from UPM, and her PhD in Computer Science from the University of York, UK (2006). She is a member of the IEEE Computer Society. Her areas of specialization are access control, secure operating systems, intrusion detection systems, coordination models and languages, and distributed systems. She is currently the Leader of the Information Security Group at the faculty.

Ali Mamat is an associate professor at the Faculty of Computer Science and Information Technology, Universiti Putra Malaysia (UPM). He obtained his PhD degree in Computer Science from the University of Bradford, U.K. in 1992. His research interests include data management, XML and ontology engineering.
