Abstract
We further contribute to numerous efforts to provide tools for generating sample database instances, complement a recent approach to achieving a uniform probability distribution over all samples of a specified size, and add new insight into the impact of schema normalization for a relational schema with one functional dependency and size restrictions on the attribute domains. These achievements result from studying the problem of how to probabilistically generate a relation instance that is a representative of a class of equivalent or similar instances, respectively. An instance is equivalent to another instance if there are bijective domain mappings under which the former is mapped onto the latter. An instance is similar to another instance if they share the same combinatorial counting properties, which can be understood as a solution to a layered system of equalities and less-than relationships among (non-negative) integer variables and some (non-negative) integer constants. For a normalized schema, the two notions turn out to coincide. Based on this result, we conceptually design and formally verify a probabilistic generation procedure that provides a random representative of a randomly selected class, i.e., each class is represented either with the same probability or, alternatively, with a probability reflecting the number of its members. We also discuss the performance of a prototype implementation and further optimizations. For a non-normalized schema, however, the two notions do not coincide, so we only present some basic features of these notions, including a relationship to set unification.
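The abstract describes the generation procedure only at a high level. The following Python sketch illustrates the two class-selection modes in the simplest normalized setting we can state with certainty. It is not the authors' procedure. All assumptions are ours: the schema is R(A, B) with single attributes A and B and the functional dependency A → B, so A is a key; instances are sets of n tuples; and the domains are encoded as 0, ..., |dom(A)| - 1 and 0, ..., |dom(B)| - 1. Under these assumptions, two instances are equivalent exactly when their multisets of B-fibre sizes (how many A-values map to each used B-value) coincide. The equivalence classes therefore correspond one-to-one to the integer partitions of n into at most |dom(B)| parts. The helper names (`partitions`, `class_size`, `class_key`, `sample_member`, `generate`) are hypothetical. Because the sketch enumerates all partitions, it only suits small n.

```python
import math
import random
from collections import Counter


def partitions(n, max_parts, max_part=None):
    """Yield the partitions of n as non-increasing tuples with at most max_parts parts."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    if max_parts == 0:
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, max_parts - 1, first):
            yield (first,) + rest


def class_size(lam, n_a, n_b):
    """Number of instances of R(A, B) with A -> B whose B-fibre sizes form lam."""
    n, k = sum(lam), len(lam)
    size = math.comb(n_a, n)              # which n A-values occur
    size *= math.factorial(n)
    for part in lam:
        size //= math.factorial(part)     # split those A-values into blocks of sizes lam
    labels = math.perm(n_b, k)            # give each block its own B-value
    for mult in Counter(lam).values():
        labels //= math.factorial(mult)   # blocks of equal size are interchangeable
    return size * labels


def class_key(instance):
    """Sorted B-fibre sizes; under our assumptions this identifies the equivalence class."""
    return tuple(sorted(Counter(b for _, b in instance).values(), reverse=True))


def sample_member(lam, n_a, n_b, rng):
    """Uniformly random instance whose B-fibre sizes are lam."""
    a_values = rng.sample(range(n_a), sum(lam))   # random ordered choice of A-values
    b_values = rng.sample(range(n_b), len(lam))   # distinct B-value per block
    instance, start = set(), 0
    for part, b in zip(lam, b_values):
        instance.update((a, b) for a in a_values[start:start + part])
        start += part
    return instance


def generate(n, n_a, n_b, weighted=False, seed=None):
    """Pick a class (uniformly, or weighted by its size), then a random member of it."""
    if not 0 <= n <= n_a or (n > 0 and n_b < 1):
        raise ValueError("need 0 <= n <= |dom(A)| and a non-empty dom(B)")
    rng = random.Random(seed)
    classes = list(partitions(n, n_b))
    if weighted:
        weights = [class_size(lam, n_a, n_b) for lam in classes]
        lam = rng.choices(classes, weights=weights, k=1)[0]
    else:
        lam = rng.choice(classes)
    return sample_member(lam, n_a, n_b, rng)


if __name__ == "__main__":
    inst = generate(5, n_a=10, n_b=3, weighted=True, seed=1)
    print(sorted(inst), class_key(inst))
```

With `weighted=False`, every class is equally likely. With `weighted=True`, a class is chosen with probability proportional to its size. Because `sample_member` returns a uniformly random member of the chosen class, every instance of size n is then equally likely. CPython's `random.choices` uses float arithmetic on the cumulative weights, so very large class sizes would need an exact integer sampler instead.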
Notes
1. To keep the notation simple, we deliberately refrain from explicitly denoting the dependence of \( Ins \), and of related items, on the size n and on the relational schema, most relevantly including the attributes and their cardinalities.
2. For the sake of brevity, in the following we combine the definition of a “structure” with the definition of “satisfaction” of a structure by a relation instance; we will proceed similarly in Sect. 5.
3. As in Definition 2, here and in the following we integrate the definition of a “structure” with the definition of “satisfaction”.
4. However, since such a list might contain duplicates, the usual set notation would not be appropriate; instead, we could use multisets.
Acknowledgements
We sincerely thank the anonymous reviewers for their careful evaluations and constructive remarks.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Berens, M., Biskup, J. (2022). On Sampling Representatives of Relational Schemas with a Functional Dependency. In: Varzinczak, I. (eds) Foundations of Information and Knowledge Systems. FoIKS 2022. Lecture Notes in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-031-11321-5_1
DOI: https://doi.org/10.1007/978-3-031-11321-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11320-8
Online ISBN: 978-3-031-11321-5
eBook Packages: Computer Science, Computer Science (R0)