Big Data Normalization for Massively Parallel Processing Databases

  • Conference paper
Advances in Conceptual Modeling (ER 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9382)

Abstract

High-performance querying and ad-hoc querying are commonly viewed as mutually exclusive goals in massively parallel processing databases. At one extreme, a database can be set up to provide the results of a single known query so that the use of available resources is maximized and response time is minimized, but at the cost of all other queries being executed suboptimally. At the other extreme, when no query is known in advance, the database must provide the information without such optimization, normally resulting in inefficient execution of all queries. This paper introduces a novel technique, highly normalized Big Data using Anchor modeling, that provides a very efficient way to store information and utilize resources, thereby providing ad-hoc querying with high performance for the first time in massively parallel processing databases. A case study of how this approach has been used for a Data Warehouse at Avito over two years is also presented, with estimates for and results of real data experiments carried out in HP Vertica, an MPP RDBMS.

Notes

  1. \(N=\langle \text{pessimistic RAM estimation} \rangle / \langle \text{available RAM} \rangle\), rounded up.
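The arithmetic in this note can be sketched as a rounded-up ratio. The page does not say what \(N\) counts or which units the two RAM figures use, so the function name `rounded_up_ratio`, its parameter names, and the sample values below are all hypothetical; this is only an illustration of the formula, not the authors' implementation.

```python
def rounded_up_ratio(pessimistic_ram_estimate: int, available_ram: int) -> int:
    """Return N = pessimistic RAM estimate / available RAM, rounded up.

    Both arguments are assumed to be non-negative integers in the same
    unit (for example bytes); the unit is an assumption, not from the note.
    """
    if available_ram <= 0:
        raise ValueError("available_ram must be positive")
    if pessimistic_ram_estimate < 0:
        raise ValueError("pessimistic_ram_estimate must be non-negative")
    # Integer ceiling division: avoids float rounding on large byte counts.
    return -(-pessimistic_ram_estimate // available_ram)


if __name__ == "__main__":
    # Hypothetical figures: an estimate of 100 units against 30 available.
    print(rounded_up_ratio(100, 30))
```

Integer ceiling division is used instead of `math.ceil(a / b)` so that very large values are not affected by floating-point precision.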

Author information

Corresponding author

Correspondence to Lars Rönnbäck.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Golov, N., Rönnbäck, L. (2015). Big Data Normalization for Massively Parallel Processing Databases. In: Jeusfeld, M., Karlapalem, K. (eds) Advances in Conceptual Modeling. ER 2015. Lecture Notes in Computer Science, vol 9382. Springer, Cham. https://doi.org/10.1007/978-3-319-25747-1_16

  • DOI: https://doi.org/10.1007/978-3-319-25747-1_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25746-4

  • Online ISBN: 978-3-319-25747-1

  • eBook Packages: Computer Science, Computer Science (R0)
