
GRAMS3: An Efficient Framework for XML Structural Similarity Search

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6193)

Abstract

Structural similarity search is a fundamental technology for XML data management. However, existing methods do not scale well to large volumes of XML documents. The pq-gram is an efficient way of extracting substructures from tree-structured data for approximate structural similarity search. In this paper, we propose an effective framework, GRAMS3, for evaluating the structural similarity of XML data. First, the pq-grams of the XML documents are extracted; then we study the characteristics of the pq-grams of XML and generate a doc-gram vector for each XML tree using the TGF-IGF model; finally, we employ locality-sensitive hashing for efficient structural similarity search over XML documents. An empirical study using both synthetic and real datasets demonstrates that the framework is efficient.
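
The three stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the TGF-IGF model, so the weighting below assumes a TF-IDF-style analogue (gram frequency within a tree times a smoothed inverse frequency across trees), and the hashing step assumes random-hyperplane signatures for cosine similarity. The tree encoding `(label, [children])` and all function names are hypothetical.

```python
import math
import random
from collections import Counter

NULL = "*"  # dummy label used to pad stems and bases


def pq_grams(tree, p=2, q=3):
    """Return the pq-grams of a tree encoded as (label, [children])."""
    grams = []

    def visit(node, stem):
        label, children = node
        stem = stem[1:] + (label,)      # shift the anchor node into the stem
        base = (NULL,) * q
        if not children:
            grams.append(stem + base)   # a leaf yields one all-null base
            return
        for child in children:
            base = base[1:] + (child[0],)
            grams.append(stem + base)
            visit(child, stem)
        for _ in range(q - 1):          # slide the window past the last child
            base = base[1:] + (NULL,)
            grams.append(stem + base)

    visit(tree, (NULL,) * p)
    return grams


def doc_gram_vectors(docs, p=2, q=3):
    """Assumed TF-IDF-style weighting; the paper's TGF-IGF may differ."""
    profiles = [Counter(pq_grams(t, p, q)) for t in docs]
    n = len(profiles)
    df = Counter(g for prof in profiles for g in prof)  # trees containing g
    vectors = []
    for prof in profiles:
        total = sum(prof.values())
        vectors.append({g: (c / total) * math.log(1 + n / df[g])
                        for g, c in prof.items()})
    return vectors


def lsh_signatures(vectors, bits=16, seed=0):
    """Random-hyperplane signatures: one sign bit per hyperplane."""
    rng = random.Random(seed)
    vocab = sorted({g for v in vectors for g in v})
    index = {g: i for i, g in enumerate(vocab)}
    planes = [[rng.gauss(0.0, 1.0) for _ in vocab] for _ in range(bits)]
    return [tuple(int(sum(w * plane[index[g]] for g, w in v.items()) >= 0)
                  for plane in planes)
            for v in vectors]


def hamming(a, b):
    """Number of differing signature bits (smaller means more similar)."""
    return sum(x != y for x, y in zip(a, b))


# Two structurally identical toy documents and one different document.
doc_a = ("a", [("b", []), ("c", [])])
doc_b = ("a", [("b", []), ("c", [])])
doc_c = ("x", [("y", [("z", [])])])
sigs = lsh_signatures(doc_gram_vectors([doc_a, doc_b, doc_c]))
```

In this sketch, structurally identical trees receive identical signatures, so `hamming(sigs[0], sigs[1])` is zero, while dissimilar trees tend to differ in more bits. A search index would bucket documents by signature (or by bands of it) so that only candidates in matching buckets are compared.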




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yuan, P., Wang, X., Sha, C., Gao, M., Zhou, A. (2010). GRAMS3: An Efficient Framework for XML Structural Similarity Search. In: Yoshikawa, M., Meng, X., Yumoto, T., Ma, Q., Sun, L., Watanabe, C. (eds) Database Systems for Advanced Applications. DASFAA 2010. Lecture Notes in Computer Science, vol 6193. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14589-6_43


  • DOI: https://doi.org/10.1007/978-3-642-14589-6_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14588-9

  • Online ISBN: 978-3-642-14589-6

  • eBook Packages: Computer Science, Computer Science (R0)
