ABSTRACT
The eCommerce world faces rapidly growing data volumes and ever larger user communities. This paper presents an architecture that enables highly performant, highly scalable queries for a large community of external customers. The architecture exploits a distinctive pattern of external customer activity: in a big data store serving tens or hundreds of millions of users, each user's data is only a small fraction of the whole, yet the community as a whole demands an extremely high volume of concurrent analytical queries with sub-second response times. In the system, a key-value store is used to maximize read concurrency, a custom compression algorithm is developed to minimize data transfer, and a custom query engine is developed to perform aggregation on the fly. Scalability and other potential applications are discussed at the end.
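The division of labor described above can be sketched in Python. This is a minimal, hypothetical illustration, not the paper's implementation. A plain dict stands in for the distributed key-value store. zlib stands in for the custom compression algorithm, which the abstract does not describe. The class and method names (`UserPartitionedStore`, `put`, `aggregate`) are invented. Each user's records live under a single key, so a query touches only that user's small blob, and totals are computed at read time rather than precomputed.

```python
import zlib
from collections import defaultdict


class UserPartitionedStore:
    """Hypothetical sketch: one key per user, value = that user's compressed records."""

    def __init__(self):
        # A plain dict stands in for a distributed key-value store (assumption).
        self._kv = {}

    def put(self, user_id, records):
        # Serialize records as comma-delimited lines (assumes fields contain no commas),
        # then compress with zlib as a stand-in for the paper's custom compression.
        payload = "\n".join(",".join(str(f) for f in r) for r in records)
        self._kv[user_id] = zlib.compress(payload.encode("utf-8"))

    def aggregate(self, user_id, group_field, value_field):
        # Read only this user's value, decompress it, and sum value_field
        # grouped by group_field on the fly.
        blob = self._kv.get(user_id)
        if blob is None:
            return {}
        totals = defaultdict(float)
        for line in zlib.decompress(blob).decode("utf-8").splitlines():
            fields = line.split(",")
            totals[fields[group_field]] += float(fields[value_field])
        return dict(totals)


store = UserPartitionedStore()
store.put("u1", [("2018-01", "books", 12.5), ("2018-01", "toys", 3.0), ("2018-02", "books", 7.5)])
print(store.aggregate("u1", 1, 2))  # {'books': 20.0, 'toys': 3.0}
```

Because each query reads and decompresses a single user's value, concurrent queries from different users touch disjoint keys. That is the property the architecture relies on to scale read concurrency.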
Index Terms
- Distributed Data Aggregation at Scale for Large Community of Users