Abstract
High-performance querying and ad-hoc querying are commonly viewed as mutually exclusive goals in massively parallel processing (MPP) databases. At one extreme, a database can be set up to provide the results of a single, known query so that the use of available resources is maximized and response time is minimized, but at the cost of all other queries being executed suboptimally. At the other extreme, when no query is known in advance, the database must provide the information without such optimization, normally resulting in inefficient execution of all queries. This paper introduces a novel technique, highly normalized Big Data using Anchor modeling, that provides a very efficient way to store information and utilize resources, thereby providing ad-hoc querying with high performance for the first time in massively parallel processing databases. A case study of how this approach has been used for a Data Warehouse at Avito over two years' time is also presented, together with estimates for, and results of, real data experiments carried out in HP Vertica, an MPP RDBMS.
Notes
1. \(N = \left\lceil \frac{\text{pessimistic RAM estimation}}{\text{available RAM}} \right\rceil\), i.e., the ratio rounded up to the nearest integer.
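The formula in note 1 is a plain ceiling division. The following is a minimal Python sketch of it; the function name `compute_n` is hypothetical, and the sketch assumes both quantities are non-negative integers expressed in the same unit (for example bytes or GiB). The note itself does not say how N is used beyond this definition.

```python
def compute_n(pessimistic_ram_estimate: int, available_ram: int) -> int:
    """Hypothetical helper: N = ceil(pessimistic RAM estimation / available RAM).

    Assumes both arguments are integers in the same unit (e.g. bytes).
    """
    if available_ram <= 0:
        raise ValueError("available_ram must be positive")
    if pessimistic_ram_estimate < 0:
        raise ValueError("pessimistic_ram_estimate must be non-negative")
    # Integer ceiling division: -(-a // b) rounds up without converting to float,
    # so a small remainder on a very large byte count is never rounded away.
    return -(-pessimistic_ram_estimate // available_ram)


# Example: an estimate of 250 GiB against 64 GiB of available RAM.
print(compute_n(250, 64))
```

Integer arithmetic is used rather than `math.ceil(a / b)` because float division loses precision for very large values: for instance, `(10**18 + 1) / 10**9` evaluates to exactly `1e9`, so the remainder would be dropped and N would come out one too small.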
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Golov, N., Rönnbäck, L. (2015). Big Data Normalization for Massively Parallel Processing Databases. In: Jeusfeld, M., Karlapalem, K. (eds) Advances in Conceptual Modeling. ER 2015. Lecture Notes in Computer Science, vol 9382. Springer, Cham. https://doi.org/10.1007/978-3-319-25747-1_16
DOI: https://doi.org/10.1007/978-3-319-25747-1_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25746-4
Online ISBN: 978-3-319-25747-1
eBook Packages: Computer Science, Computer Science (R0)