Skip to main content

Get a Sample for a Discount

Sampling-Based XML Data Pricing

  • Conference paper
Database and Expert Systems Applications (DEXA 2014)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8644))

Included in the following conference series:

Abstract

While price and data quality should define the major trade-off for consumers in data markets, prices are usually prescribed by vendors and data quality is not negotiable. In this paper we study a model where data quality can be traded for a discount. We focus on the case of XML documents and consider completeness as the quality dimension. In our setting, the data provider offers an XML document, and sets both the price of the document and a weight to each node of the document, depending on its potential worth. The data consumer proposes a price. If the proposed price is lower than that of the entire document, then the data consumer receives a sample, i.e., a random rooted subtree of the document whose selection depends on the discounted price and the weight of nodes. By requesting several samples, the data consumer can iteratively explore the data in the document. We show that the uniform random sampling of a rooted subtree with prescribed weight is unfortunately intractable. However, we are able to identify several practical cases that are tractable. The first case is uniform random sampling of a rooted subtree with prescribed size; the second case restricts to binary weights. For both these practical cases we present polynomial-time algorithms and explain how they can be integrated into an iterative exploratory sampling approach.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Cohen, S., Kimelfeld, B., Sagiv, Y.: Running tree automata on probabilistic XML. In: PODS, pp. 227–236 (2009)

    Google Scholar 

  2. Henzinger, M.R., Heydon, A., Mitzenmacher, M., Najork, M.: On near-uniform URL sampling. Computer Networks 33(1-6) (2000)

    Google Scholar 

  3. Hübler, C., Kriegel, H.-P., Borgwardt, K., Ghahramani, Z.: Metropolis algorithms for representative subgraph sampling. In: ICDM (2008)

    Google Scholar 

  4. Koutris, P., Upadhyaya, P., Balazinska, M., Howe, B., Suciu, D.: Query-based data pricing. In: PODS (2012)

    Google Scholar 

  5. Koutris, P., Upadhyaya, P., Balazinska, M., Howe, B., Suciu, D.: QueryMarket demonstration: Pricing for online data markets. PVLDB 5(12) (2012)

    Google Scholar 

  6. Koutris, P., Upadhyaya, P., Balazinska, M., Howe, B., Suciu, D.: Toward practical query pricing with QueryMarket. In: SIGMOD (2013)

    Google Scholar 

  7. Leskovec, J., Faloutsos, C.: Sampling from large graphs. In: SIGKDD (2006)

    Google Scholar 

  8. Li, C., Li, D.Y., Miklau, G., Suciu, D.: A theory of pricing private data. In: ICDT (2013)

    Google Scholar 

  9. Li, C., Miklau, G.: Pricing aggregate queries in a data marketplace. In: WebDB (2012)

    Google Scholar 

  10. Lin, B.-R., Kifer, D.: On arbitrage-free pricing for general data queries. PVLDB 7(9), 757–768 (2014)

    Google Scholar 

  11. Lu, X., Bressan, S.: Sampling connected induced subgraphs uniformly at random. In: Ailamaki, A., Bowers, S. (eds.) SSDBM 2012. LNCS, vol. 7338, pp. 195–212. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  12. Luo, C., Jiang, Z., Hou, W.-C., Yu, F., Zhu, Q.: A sampling approach for XML query selectivity estimation. In: EDBT (2009)

    Google Scholar 

  13. Maiya, A.S., Berger-Wolf, T.Y.: Sampling community structure. In: WWW (2010)

    Google Scholar 

  14. Muschalle, A., Stahl, F., Löser, A., Vossen, G.: Pricing approaches for data markets. In: Castellanos, M., Dayal, U., Rundensteiner, E.A. (eds.) BIRTE 2012. LNBIP, vol. 154, pp. 129–144. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  15. Pipino, L., Lee, Y.W., Wang, R.Y.: Data quality assessment. Commun. ACM 45(4) (2002)

    Google Scholar 

  16. Ribeiro, B.F., Towsley, D.F.: Estimating and sampling graphs with multidimensional random walks. In: Internet Measurement Conference (2010)

    Google Scholar 

  17. Tang, R., Wu, H., Bao, Z., Bressan, S., Valduriez, P.: The price is right. In: Decker, H., Lhotská, L., Link, S., Basl, J., Tjoa, A.M. (eds.) DEXA 2013, Part II. LNCS, vol. 8056, pp. 380–394. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  18. Wang, R.Y., Strong, D.M.: Beyond accuracy: What data quality means to data consumers. J. of Management Information Systems 12(4) (1996)

    Google Scholar 

  19. Wang, W., Jiang, H., Lu, H., Yu, J.X.: Containment join size estimation: Models and methods. In: SIGMOD (2003)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Tang, R., Amarilli, A., Senellart, P., Bressan, S. (2014). Get a Sample for a Discount. In: Decker, H., Lhotská, L., Link, S., Spies, M., Wagner, R.R. (eds) Database and Expert Systems Applications. DEXA 2014. Lecture Notes in Computer Science, vol 8644. Springer, Cham. https://doi.org/10.1007/978-3-319-10073-9_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-10073-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10072-2

  • Online ISBN: 978-3-319-10073-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics