A dynamic and parallel approach for repetitive prime labeling of XML with MapReduce

Ahn, Jinhyun; Im, Dong-Hyuk; Lee, Taewhi; Kim, Hong-Gee

doi:10.1007/s11227-016-1803-y

A dynamic and parallel approach for repetitive prime labeling of XML with MapReduce

Published: 05 July 2016

Volume 73, pages 810–836, (2017)
Cite this article

The Journal of Supercomputing Aims and scope Submit manuscript

Jinhyun Ahn¹,
Dong-Hyuk Im²,
Taewhi Lee³ &
…
Hong-Gee Kim⁴

415 Accesses
3 Citations
Explore all metrics

Abstract

A massive amount of extensible markup language (XML) data from various areas is available on the Web. Answering structural queries against XML data is important, as it is the core of information retrieval systems for XML data. Labeling scheme has been suggested for rapid query processing of massive XML data. Interval-based, prefix-based, and prime number labeling scheme exist. Of these, the prime number labeling scheme has the advantage of query processing by arithmetic operations. Recently, the repetitive prime number labeling scheme was proposed; this scheme produces a smaller label size than conventional prime number labeling using prime numbers repetitively. However, a parallel algorithm for the repetitive prime number labeling scheme does not exist; therefore, this scheme is difficult to apply to massive XML data. In this paper, a dynamic and parallel approach of XML labeling algorithm that works with MapReduce is proposed for, particularly, the repetitive prime number labeling scheme. Two optimization techniques are devised: the label assignment order adjustment to further reduce the label size and the upper tree compressing technique to reduce the memory requirements during the labeling process. Experiments over real-world XML data confirmed that the techniques are effective than the previous works.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A MapReduce-Based Approach for Prefix-Based Labeling of Large XML Data

High-performance XML modeling of parallel queries based on MapReduce framework

Article 14 September 2016

A new effective method for labeling dynamic XML data

Article Open access 19 December 2018

References

Choi H, Lee KH, Lee YJ (2014) Parallel labeling of massive XML data with MapReduce. J Supercomput 67(2):408–437
Article Google Scholar
Christophides V, Karvounarakis G, Plexousakis D, Scholl M, Tourtounis S (2004) Optimizing taxonomic semantic web queries using labeling schemes. Web Semant Sci Serv Agents World Wide Web 1(2):207–228
Article Google Scholar
Clark J, DeRose S et al (2004) XML Path Language (XPath) Version 1.0, W3C Recommendation, 1999
Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
Article Google Scholar
Klaib A, Joan L (2014) Investigation into indexing XML data techniques. In: Proceedings. The International Conference on Internet Computing (ICOMP), 21–24 July 2014, Las Vegas, USA
Leonardi E, Bhowmick SS, Madria S (2005) Xandy: detecting changes on large unordered XML documents using relational databases. In: Zhou L, Ooi BC, Meng X (eds) Database systems for advanced applications. Springer, Berlin, pp 711–723
Li C, Ling TW (2005) QED: a novel quaternary encoding to completely avoid re-labeling in XML updates. In: CIKM. ACM
Lin RR, Chang YH, Chao KM (2013) A compact and efficient labeling scheme for XML documents. In: Meng W, Feng L, Bressan S, Winiwarter W, Song W(eds) Database systems for advanced applications. Springer, Berlin, pp 269–283
Lu J, Meng X, Ling TW (2011) Indexing and querying XML using extended Dewey labeling scheme. Data Knowl Eng 70(1):35–59
Article Google Scholar
Morozov S, Saiedian H, Wang H (2014) Reusable prime number labeling scheme for hierarchical data representation in relational databases. J Comput Inf Technol 22(1):31–43
Subramaniam S, Haw SC, Soon LK (2014) Relab: a subtree based labeling scheme for efficient XML query processing. In: IEEE 2nd International Symposium on Telecommunication Technologies (ISTT). IEEE, pp 121–125
Sun DH, Hwang SC (2014) A labeling methods for keyword search over large XML documents. J KIISE 41(9):699–706
Tatarinov I, Viglas SD, Beyer K, Shanmugasundaram J, Shekita E, Zhang C (2002) Storing and querying ordered XML using a relational database system. In: Proceedings of the 2002 ACM SIGMOD international conference on management of data, pp 204–215. ACM
Wang Y, DeWitt DJ, Cai JY (2003) X-diff: an effective change detection algorithm for XML documents. In: ICDE, 2003. IEEE, pp 519–530
Wu X, Lee ML, Hsu W (2004) A prime number labeling scheme for dynamic ordered XML trees. In: ICDE
Xu L, Bao Z, Ling TW (2007) A dynamic labeling scheme using vectors. In: Wagner R, Revell N, Pernul G (eds) Database and expert systems applications. Springer, Berlin, pp 130–140
Xu L, Ling TW, Wu H (2012) Labeling dynamic XML documents: an order-centric approach. IEEE Trans Knowl Data Eng 24(1):100–113
Article Google Scholar
Xu L, Ling TW, Wu H, Bao Z (2009) DDE: from dewey to a fully dynamic XML labeling scheme. In: SIGMOD. ACM

Download references

Acknowledgments

This work was supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. R0101-16-0054, WiseKB: Big data based self-evolving knowledge base and reasoning platform) and Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2014R1A1A1002236) and ETRI R&D Program (“Development of Big Data Platform for Dual Mode Batch-Query Analytics, 16ZS1410”) funded by the Government of Korea.

Author information

Authors and Affiliations

Biomedical Knowledge Engineering Laboratory and Dental Research Institute, Seoul National University, Seoul, Korea
Jinhyun Ahn
Department of Computer and Information Engineering, Hoseo University, Asan, Korea
Dong-Hyuk Im
BigData Intelligence Research Department, Electronics and Telecommunications Research Institute, Daejon, Korea
Taewhi Lee
Biomedical Knowledge Engineering Laboratory, Dental Research Institute and Institute of Human-Environment Interface Biology, Seoul National University, Seoul, Korea
Hong-Gee Kim

Authors

Jinhyun Ahn
View author publications
You can also search for this author in PubMed Google Scholar
Dong-Hyuk Im
View author publications
You can also search for this author in PubMed Google Scholar
Taewhi Lee
View author publications
You can also search for this author in PubMed Google Scholar
Hong-Gee Kim
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Dong-Hyuk Im.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Ahn, J., Im, DH., Lee, T. et al. A dynamic and parallel approach for repetitive prime labeling of XML with MapReduce. J Supercomput 73, 810–836 (2017). https://doi.org/10.1007/s11227-016-1803-y

Download citation

Published: 05 July 2016
Issue Date: February 2017
DOI: https://doi.org/10.1007/s11227-016-1803-y

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A dynamic and parallel approach for repetitive prime labeling of XML with MapReduce

Abstract

Access this article

Similar content being viewed by others

A MapReduce-Based Approach for Prefix-Based Labeling of Large XML Data

High-performance XML modeling of parallel queries based on MapReduce framework

A new effective method for labeling dynamic XML data

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

A dynamic and parallel approach for repetitive prime labeling of XML with MapReduce

Abstract

Access this article

Similar content being viewed by others

A MapReduce-Based Approach for Prefix-Based Labeling of Large XML Data

High-performance XML modeling of parallel queries based on MapReduce framework

A new effective method for labeling dynamic XML data

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation