Generation of Synthetic XML for Evaluation of Hybrid XML Systems

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6193)

Abstract

Hybrid XML storage offers a large number of alternative shredding choices. To determine optimal shredding strategies automatically, it is crucial to understand how the structure of an XML data set affects performance. Since the structure can take many forms and the number of possible mappings is huge, it is important to gain insight into the relation between structure and performance for formats that are actually used. By taking real-world data sets and modifying their structure in steps, one can observe how performance and other measurable properties change. We describe how a data generator can be used to produce a synthetic data set based on an existing data set, using four different models. We compare the performance on the original data set with the performance on the different synthetic models.
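
The idea of deriving a synthetic data set from an existing one can be illustrated with a small sketch. The abstract does not specify the four models, so the code below is not any of them: it assumes one simple, hypothetical model in which each synthetic element copies the child-tag counts of a randomly chosen original element with the same tag. The names `collect_profiles`, `generate`, and `SAMPLE` are illustrative assumptions, and text content and attributes are ignored.

```python
import random
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def collect_profiles(root):
    """Map each tag to the list of child-tag multisets observed under it."""
    profiles = defaultdict(list)
    for elem in root.iter():
        profiles[elem.tag].append(Counter(child.tag for child in elem))
    return profiles


def generate(tag, profiles, rng, max_depth=10):
    """Build a synthetic element by sampling one observed child profile per node.

    Hypothetical model: structure only, no text or attributes. The depth
    limit guards against unbounded recursion on recursive schemas.
    """
    elem = ET.Element(tag)
    if max_depth == 0:
        return elem
    profile = rng.choice(profiles[tag])
    for child_tag, count in sorted(profile.items()):
        for _ in range(count):
            elem.append(generate(child_tag, profiles, rng, max_depth - 1))
    return elem


# A tiny stand-in for a real-world data set (an assumption for illustration).
SAMPLE = """<db>
  <entry><name/><ref/><ref/></entry>
  <entry><name/></entry>
</db>"""

original = ET.fromstring(SAMPLE)
profiles = collect_profiles(original)
synthetic = generate(original.tag, profiles, random.Random(0))
print(ET.tostring(synthetic, encoding="unicode"))
```

Because every node reuses a child profile seen in the original, the synthetic tree contains only parent-child tag pairs that occur in the source data, while the mix of element shapes can vary with the random seed. Loading both versions into the same storage system would then allow their performance to be compared, in the spirit of the comparison the abstract describes.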

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hall, D., Strömbäck, L. (2010). Generation of Synthetic XML for Evaluation of Hybrid XML Systems. In: Yoshikawa, M., Meng, X., Yumoto, T., Ma, Q., Sun, L., Watanabe, C. (eds) Database Systems for Advanced Applications. DASFAA 2010. Lecture Notes in Computer Science, vol 6193. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14589-6_20

  • DOI: https://doi.org/10.1007/978-3-642-14589-6_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14588-9

  • Online ISBN: 978-3-642-14589-6

  • eBook Packages: Computer Science, Computer Science (R0)