
A MapReduce-Based Approach for Prefix-Based Labeling of Large XML Data

  • Conference paper
Semantic Technology (JIST 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10055)


Abstract

A massive amount of XML (Extensible Markup Language) data, which can be viewed as tree data, is available on the web. One of the fundamental building blocks of information retrieval from tree data is answering structural queries, and various labeling schemes have been suggested for processing such queries rapidly. We focus on the prefix-based labeling scheme, which labels each node with the concatenation of its parent’s label and its child order. This scheme has been adapted for RDF (Resource Description Framework) data management systems that index RDF data as trees by grouping subjects. Recently, a MapReduce-based algorithm for the prefix-based labeling scheme was suggested. We observe that this algorithm fails to keep label size minimized, which makes the prefix-based labeling scheme difficult to apply to massive real-world XML datasets. To address this issue, we propose a MapReduce-based algorithm for prefix-based labeling of XML data that reduces label size by adjusting the order of label assignments based on the structural information of the XML data. Experiments with real-world XML datasets show that the proposed approach is more effective than previous work.
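To make the labeling rule in the abstract concrete, the following is a minimal sequential sketch of prefix-based (Dewey-style) labeling, assuming a small in-memory document. It is not the MapReduce algorithm proposed in the paper and does not attempt its label-size reduction. The names `prefix_labels`, `is_ancestor`, and `is_parent`, the tuple representation of labels, and the sample document are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def prefix_labels(root):
    """Map each element's label to its tag.

    A label is a tuple of integers: the parent's label followed by the
    element's 1-based position among its siblings (its child order).
    The root is assumed to carry the label (1,).
    """
    labels = {}
    stack = [(root, (1,))]
    while stack:
        node, label = stack.pop()
        labels[label] = node.tag
        children = list(node)
        # Push children in reverse so they are popped in document order.
        for order in range(len(children), 0, -1):
            stack.append((children[order - 1], label + (order,)))
    return labels


def is_ancestor(a, b):
    """True if label a is a proper prefix of label b."""
    return len(a) < len(b) and b[:len(a)] == a


def is_parent(a, b):
    """True if label b extends label a by exactly one component."""
    return len(b) == len(a) + 1 and b[:-1] == a


# Hypothetical sample document, used only for illustration.
doc = ET.fromstring(
    "<lib><book><title/><author/></book><book><title/></book></lib>"
)
labels = prefix_labels(doc)
```

With labels of this form, ancestor-descendant and parent-child queries reduce to prefix comparisons on the labels alone, without traversing the tree. Label length grows with node depth and with the magnitude of the child-order components, which is why the order of label assignment affects label size.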


Notes

  1. http://linkeddata.org

  2. http://www.cs.washington.edu/research/xmldatasets/www/repository.html

Acknowledgement

This work was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. R0101-16-0054, WiseKB: Big data based self-evolving knowledge base and reasoning platform) and by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (NRF-2014R1A1A1002236).

Author information

Corresponding author

Correspondence to Hong-Gee Kim.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Ahn, J., Im, DH., Kim, HG. (2016). A MapReduce-Based Approach for Prefix-Based Labeling of Large XML Data. In: Li, YF., et al. Semantic Technology. JIST 2016. Lecture Notes in Computer Science, vol 10055. Springer, Cham. https://doi.org/10.1007/978-3-319-50112-3_7

  • DOI: https://doi.org/10.1007/978-3-319-50112-3_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50111-6

  • Online ISBN: 978-3-319-50112-3

  • eBook Packages: Computer Science, Computer Science (R0)
