On Sampling Representatives of Relational Schemas with a Functional Dependency

Berens, Maximilian; Biskup, Joachim

doi:10.1007/978-3-031-11321-5_1

Part of the book series: Lecture Notes in Computer Science (LNCS)

Included in the following conference series:

International Symposium on Foundations of Information and Knowledge Systems

Abstract

We further contribute to numerous efforts to provide tools for generating sample database instances, complement a recent approach to achieve a uniform probability distribution over all samples of a specified size, and add new insight into the impact of schema normalization for a relational schema with one functional dependency and size restrictions on the attribute domains. These achievements result from studying the problem of how to probabilistically generate a relation instance that is a representative of a class of equivalent or similar instances, respectively. An instance is equivalent to another instance if there are bijective domain mappings under which the former is mapped onto the latter. An instance is similar to another instance if they share the same combinatorial counting properties, which can be understood as a solution to a layered system of equalities and less-than relationships among (non-negative) integer variables and some (non-negative) integer constants. For a normalized schema, the two notions turn out to coincide. Based on this result, we conceptually design and formally verify a probabilistic generation procedure that provides a random representative of a randomly selected class, i.e., each class is represented with the same probability or, alternatively, with a probability reflecting the number of its members. We also discuss the performance of a prototype implementation and further optimizations. For a non-normalized schema, however, the two notions do not coincide. So we only present some basic features of these notions, including a relationship to set unification.
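
The notion of equivalence used in the abstract can be illustrated with a small brute-force check. The following Python sketch is not the authors' generation procedure. It assumes a relation instance is a set of tuples and that there is one bijection per attribute domain. The function names `satisfies_fd` and `equivalent` are hypothetical, and the search is only feasible for very small domains.

```python
from itertools import permutations, product


def satisfies_fd(instance, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds.

    lhs and rhs are lists of attribute positions (assumed encoding).
    """
    seen = {}
    for t in instance:
        key = tuple(t[i] for i in lhs)
        val = tuple(t[i] for i in rhs)
        # The first tuple with this key fixes the right-hand side value.
        if seen.setdefault(key, val) != val:
            return False
    return True


def equivalent(r, s, domains):
    """Brute-force test whether r can be mapped onto s.

    Tries every combination of one bijection per attribute domain
    (assumption: each attribute has its own finite domain in `domains`).
    """
    r, s = set(r), set(s)
    if len(r) != len(s):
        return False
    # All bijections of each domain onto itself, as dictionaries.
    per_attribute = [
        [dict(zip(dom, perm)) for perm in permutations(dom)]
        for dom in domains
    ]
    for mappings in product(*per_attribute):
        image = {tuple(m[v] for m, v in zip(mappings, t)) for t in r}
        if image == s:
            return True
    return False


# Schema with attributes A, B and functional dependency A -> B.
domains = [(0, 1, 2), (0, 1)]
r = {(0, 0), (1, 0), (2, 1)}
s = {(0, 1), (1, 0), (2, 1)}
t = {(0, 0), (1, 0), (2, 0)}
```

Here `r` and `s` both pair two A-values with one B-value and one A-value with the other, so renaming domain values maps one onto the other, whereas `t` uses a single B-value and therefore falls into a different class.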

Notes

1. To keep the notation simple, we deliberately refrain from explicitly denoting the dependence of \( Ins \), and of related items, on the size n and on the relational schema, most relevantly including the attributes and their cardinalities.
2. For the sake of brevity, in the following we combine the definition of a “structure” with the definition of “satisfaction” of a structure by a relation instance; we will proceed similarly in Sect. 5.
3. As in Definition 2, here and in the following we integrate the definition of a “structure” with the definition of “satisfaction”.
4. However, since such a list might contain duplicates, the usual set notation would not be appropriate; instead we could use multisets.

Acknowledgements

We sincerely thank the anonymous reviewers for their careful evaluations and constructive remarks.

Author information

Authors and Affiliations

Fakultät für Informatik, Technische Universität Dortmund, 44227, Dortmund, Germany
Maximilian Berens & Joachim Biskup

Corresponding author

Correspondence to Joachim Biskup.

Editor information

Editors and Affiliations

Université d'Artois and CNRS, Lens, France
Ivan Varzinczak

About this paper

Cite this paper

Berens, M., Biskup, J. (2022). On Sampling Representatives of Relational Schemas with a Functional Dependency. In: Varzinczak, I. (eds) Foundations of Information and Knowledge Systems. FoIKS 2022. Lecture Notes in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-031-11321-5_1

DOI: https://doi.org/10.1007/978-3-031-11321-5_1
Published: 10 July 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11320-8
Online ISBN: 978-3-031-11321-5
eBook Packages: Computer Science, Computer Science (R0)
