On Sampling Representatives of Relational Schemas with a Functional Dependency

  • Conference paper
Foundations of Information and Knowledge Systems (FoIKS 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS)

Abstract

We contribute to the numerous efforts to provide tools for generating sample database instances, complement a recent approach to achieving a uniform probability distribution over all samples of a specified size, and add new insight into the impact of schema normalization for a relational schema with one functional dependency and size restrictions on the attribute domains. These achievements result from studying the problem of how to probabilistically generate a relation instance that is a representative of a class of equivalent or similar instances, respectively. An instance is equivalent to another instance if there are bijective domain mappings under which the former is mapped onto the latter. An instance is similar to another instance if they share the same combinatorial counting properties, which can be understood as a solution to a layered system of equalities and less-than relationships among (non-negative) integer variables and some (non-negative) integer constants. For a normalized schema, the two notions turn out to coincide. Based on this result, we conceptually design and formally verify a probabilistic generation procedure that provides a random representative of a randomly selected class, i.e., each class is represented with the same probability or, alternatively, with a probability reflecting the number of its members. We also discuss the performance of a prototype implementation and further optimizations. For a non-normalized schema, however, the two notions do not coincide. So we only present some basic features of these notions, including a relationship to set unification.
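The notion of equivalence described in the abstract can be illustrated by a minimal brute-force sketch. It is not the generation procedure of the paper; it only tests, for very small domains, whether one bijection per attribute domain maps one instance onto another. The function name `equivalent`, the parameter `domains`, and the two-attribute example data are assumptions made for illustration.

```python
from itertools import permutations, product


def equivalent(r1, r2, domains):
    """Return True if some choice of one bijection per attribute domain
    maps the relation instance r1 exactly onto r2 (brute force)."""
    r1, r2 = set(r1), set(r2)
    if len(r1) != len(r2):
        return False
    # For each attribute, list every bijection of its domain as a dict.
    per_attribute = [
        [dict(zip(dom, perm)) for perm in permutations(dom)]
        for dom in domains
    ]
    # Try every combination of per-attribute bijections.
    for maps in product(*per_attribute):
        image = {tuple(m[v] for m, v in zip(maps, t)) for t in r1}
        if image == r2:
            return True
    return False


# Hypothetical schema with attributes A (domain size 3) and B (domain size 2).
domains = [(0, 1, 2), (0, 1)]
r1 = {(0, 0), (1, 0), (2, 1)}
r2 = {(0, 1), (2, 1), (1, 0)}   # r1 with the values of A and B renamed
r3 = {(0, 0), (0, 1), (1, 0)}   # one A-value occurs twice

print(equivalent(r1, r2, domains))  # True
print(equivalent(r1, r3, domains))  # False
```

The search is exponential in the domain sizes, so it is only meant to make the definition concrete. In the example, r1 and r2 both satisfy the functional dependency A → B while r3 violates it, which is consistent with the fact that renaming values through bijections cannot change whether the dependency holds.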

Notes

  1. To keep the notation simple, we deliberately refrain from explicitly denoting the dependence of \( Ins \) (and related items) on the size n, as well as on the relational schema, most relevantly including the attributes and their cardinalities.

  2. For the sake of brevity, in the following we combine the definition of a “structure” with the definition of “satisfaction” of a structure by a relation instance; we will proceed similarly in Sect. 5.

  3. As in Definition 2, here and in the following we integrate the definition of a “structure” with the definition of “satisfaction”.

  4. However, since such a list might contain duplicates, the usual set notation would not be appropriate; instead we could use multi-sets.

Acknowledgements

We sincerely thank the anonymous reviewers for their careful evaluations and constructive remarks.

Author information

Corresponding author

Correspondence to Joachim Biskup.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Berens, M., Biskup, J. (2022). On Sampling Representatives of Relational Schemas with a Functional Dependency. In: Varzinczak, I. (eds) Foundations of Information and Knowledge Systems. FoIKS 2022. Lecture Notes in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-031-11321-5_1

  • DOI: https://doi.org/10.1007/978-3-031-11321-5_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-11320-8

  • Online ISBN: 978-3-031-11321-5

  • eBook Packages: Computer Science, Computer Science (R0)
