Composite Key Generation on a Shared-Nothing Architecture

Hoffmann, Marie; Alexandrov, Alexander; Andritsos, Periklis; Soto, Juan; Markl, Volker

doi:10.1007/978-3-319-15350-6_12


Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8904)

Included in the following conference series:

Technology Conference on Performance Evaluation and Benchmarking


Abstract

Generating synthetic data sets is integral to benchmarking, debugging, and simulating future scenarios. As data sets grow larger, realistic data characteristics become increasingly important for the success of new algorithms. Recently introduced software systems allow for synthetic data generation that is truly parallel. These systems use fast pseudorandom number generators and can handle complex schemas and uniqueness constraints on single attributes. Uniqueness is essential for forming keys, which identify single entries in a database instance. The uniqueness property is usually guaranteed by sampling from a uniform distribution and adjusting the sample size to the output size of the table such that there are no collisions. However, for real composite keys, where only the combination of the key attributes has the uniqueness property, a different strategy needs to be employed. In this paper, we present a novel approach to generating composite keys within a parallel data generation framework. We compute a joint probability distribution that incorporates the distributions of the key attributes and use the unique sequence positions of entries to address distinct values in the key domain.
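The idea of using unique sequence positions to address distinct values in the key domain can be illustrated with a minimal sketch. It is not the method of the paper: it ignores the joint probability distribution entirely and simply enumerates the key domain, and the names `position_to_key` and `generate_partition` as well as the round-robin assignment of positions to nodes are assumptions made for illustration. It only shows why distinct positions yield distinct composite keys without any coordination between nodes.

```python
def position_to_key(pos, domain_sizes):
    """Map a sequence position to a composite key via mixed-radix decomposition.

    domain_sizes[i] is the number of distinct values of key attribute i
    (hypothetical parameter). Distinct positions always give distinct tuples.
    """
    total = 1
    for size in domain_sizes:
        total *= size
    if not 0 <= pos < total:
        raise ValueError("position lies outside the key domain")

    digits = []
    # Peel off the least significant attribute first.
    for size in reversed(domain_sizes):
        pos, digit = divmod(pos, size)
        digits.append(digit)
    return tuple(reversed(digits))


def generate_partition(node, num_nodes, num_rows, domain_sizes):
    """Keys produced by one node; positions are assigned round-robin (assumption)."""
    return [position_to_key(p, domain_sizes)
            for p in range(node, num_rows, num_nodes)]


# Three nodes generate 12 rows over a 3 x 4 key domain without communicating.
all_keys = [key
            for node in range(3)
            for key in generate_partition(node, 3, 12, (3, 4))]
```

Because every node derives its keys from disjoint sets of positions, the union of all partitions contains no duplicate key combination, although individual attribute values repeat.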


Notes

1. http://www.tpc.org/tpch/.
2. denoted by \(\oplus\).
3. Under the condition that the pad is used only once and not known to the adversary.


Acknowledgements

We thank the anonymous reviewers for their input that helped to improve the quality of the paper. Furthermore, the first author would like to thank Christian Lessig for his valuable assistance in editing.

Author information

Authors and Affiliations

DIMA, Technische Universität Berlin, Einsteinufer 17, 10587, Berlin, Germany
Marie Hoffmann, Alexander Alexandrov, Juan Soto & Volker Markl
Institut des Systèmes d’Information, Université de Lausanne, Bâtiment Internef, 1015, Lausanne, Switzerland
Periklis Andritsos


Corresponding author

Correspondence to Marie Hoffmann.

Editor information

Editors and Affiliations

Cisco Systems, Inc., San Jose, California, USA
Raghunath Nambiar
Oracle Corporation, Redwood Shores, California, USA
Meikel Poess

A Composite Keys

Listing 1.1 shows four SQL statements for creating simple schemas. For the sake of simplicity, the statements only declare key columns. Table Simple has one column, protein, which is declared as primary key and is therefore necessarily unique, i.e., protein makes up a simple key. Table Compound has two attributes, each making up a simple key in its own right, since both are declared as unique. Tables Composite1 and Composite2 are examples of composite key declarations. Composite1 has only one key attribute, which makes up a simple key. Table Composite2 has two key attributes, and uniqueness holds only for their combination. Possible instances of all four relations are shown below.
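Listing 1.1 and the example instances are not reproduced on this page. As a stand-in, the following sketch declares four tables along the lines described above and checks which inserts each key declaration accepts. Only the table names and the column protein come from the text; the columns gene and organism, the sample values, the helper `try_insert`, and the exact form of the Composite1 declaration are assumptions. SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simple key: a single primary key column.
    CREATE TABLE Simple     (protein TEXT PRIMARY KEY);
    -- Compound: each column is unique on its own (gene is a hypothetical column).
    CREATE TABLE Compound   (protein TEXT UNIQUE, gene TEXT UNIQUE);
    -- Composite key syntax over one attribute (assumed form of the declaration).
    CREATE TABLE Composite1 (protein TEXT, PRIMARY KEY (protein));
    -- Composite key: only the pair must be unique (organism is hypothetical).
    CREATE TABLE Composite2 (protein TEXT, organism TEXT,
                             PRIMARY KEY (protein, organism));
""")


def try_insert(table, values):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    placeholders = ", ".join("?" for _ in values)
    try:
        conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", values)
        return True
    except sqlite3.IntegrityError:
        return False


outcomes = {
    "simple_first":        try_insert("Simple", ("p53",)),
    "simple_repeat":       try_insert("Simple", ("p53",)),
    "compound_first":      try_insert("Compound", ("p53", "TP53")),
    "compound_same_prot":  try_insert("Compound", ("p53", "BRCA1")),
    "composite1_first":    try_insert("Composite1", ("p53",)),
    "composite1_repeat":   try_insert("Composite1", ("p53",)),
    "composite2_first":    try_insert("Composite2", ("p53", "human")),
    "composite2_new_pair": try_insert("Composite2", ("p53", "mouse")),
    "composite2_repeat":   try_insert("Composite2", ("p53", "human")),
}
```

In Composite2 the value p53 may appear in several rows as long as the pair (protein, organism) differs, whereas Simple, Compound, and Composite1 reject any repeated protein value.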


Cite this paper

Hoffmann, M., Alexandrov, A., Andritsos, P., Soto, J., Markl, V. (2015). Composite Key Generation on a Shared-Nothing Architecture. In: Nambiar, R., Poess, M. (eds) Performance Characterization and Benchmarking. Traditional to Big Data. TPCTC 2014. Lecture Notes in Computer Science, vol 8904. Springer, Cham. https://doi.org/10.1007/978-3-319-15350-6_12


DOI: https://doi.org/10.1007/978-3-319-15350-6_12
Published: 05 February 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15349-0
Online ISBN: 978-3-319-15350-6
eBook Packages: Computer Science (R0)

