A dynamic and parallel approach for repetitive prime labeling of XML with MapReduce

The Journal of Supercomputing

Abstract

A massive amount of extensible markup language (XML) data from various areas is available on the Web. Answering structural queries against XML data is important because it is the core of information retrieval systems for XML data. Labeling schemes have been suggested for rapid query processing of massive XML data. Interval-based, prefix-based, and prime number labeling schemes exist. Of these, the prime number labeling scheme has the advantage that queries can be processed with arithmetic operations. Recently, the repetitive prime number labeling scheme was proposed; this scheme produces a smaller label size than conventional prime number labeling by using prime numbers repetitively. However, no parallel algorithm exists for the repetitive prime number labeling scheme, so the scheme is difficult to apply to massive XML data. This paper proposes a dynamic and parallel XML labeling algorithm that works with MapReduce, designed in particular for the repetitive prime number labeling scheme. Two optimization techniques are devised: label assignment order adjustment, which further reduces the label size, and upper tree compression, which reduces the memory requirements during the labeling process. Experiments over real-world XML data confirmed that the techniques are more effective than previous approaches.
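To make the "query processing by arithmetic operations" claim concrete, the following is a minimal sketch of conventional prime number labeling, not the repetitive scheme or the MapReduce algorithm proposed in the paper. It assumes each node receives a fresh prime and that a node's label is its parent's label multiplied by that prime; the tree, the function names (`primes`, `label_tree`, `is_ancestor`), and the choice to give the root the prime 2 are illustrative assumptions.

```python
from itertools import count


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for a small example)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def label_tree(tree, root):
    """Label nodes in document (preorder) order.

    Assumption for this sketch: label(node) = label(parent) * fresh prime,
    with the root's parent label taken as 1.
    """
    gen = primes()
    labels = {}
    stack = [(root, 1)]
    while stack:
        node, parent_label = stack.pop()
        labels[node] = parent_label * next(gen)
        # Push children in reverse so they are popped in document order.
        for child in reversed(tree.get(node, [])):
            stack.append((child, labels[node]))
    return labels


def is_ancestor(labels, a, d):
    """a is a proper ancestor of d iff label(a) divides label(d)."""
    return a != d and labels[d] % labels[a] == 0


# Hypothetical XML element tree: parent -> list of children.
tree = {"book": ["title", "chapter"], "chapter": ["section"]}
labels = label_tree(tree, "book")
print(labels)
print(is_ancestor(labels, "chapter", "section"))
print(is_ancestor(labels, "title", "section"))
```

Because every node contributes a distinct prime, the divisibility test holds exactly when one node lies on the root-to-node path of the other. The cost is that labels grow quickly as more primes are consumed, which is the label-size problem that reusing primes is meant to reduce.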

Acknowledgments

This work was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. R0101-16-0054, WiseKB: Big data based self-evolving knowledge base and reasoning platform), by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2014R1A1A1002236), and by the ETRI R&D Program (“Development of Big Data Platform for Dual Mode Batch-Query Analytics, 16ZS1410”) funded by the Government of Korea.

Author information

Corresponding author

Correspondence to Dong-Hyuk Im.

About this article

Cite this article

Ahn, J., Im, DH., Lee, T. et al. A dynamic and parallel approach for repetitive prime labeling of XML with MapReduce. J Supercomput 73, 810–836 (2017). https://doi.org/10.1007/s11227-016-1803-y

  • DOI: https://doi.org/10.1007/s11227-016-1803-y
