ABSTRACT
The Zipfian distribution is used extensively to generate workloads that test, tune, and benchmark data stores. This paper presents a decentralized implementation of this technique, named D-Zipfian, that uses N parallel generators to issue requests. A request is a reference to a data item from a fixed population of data items. The challenge is for each generator to reference a disjoint set of data items. Moreover, the generators should finish at approximately the same time, each performing work proportional to its processing capability. Intuitively, D-Zipfian assigns a total probability of 1/N to each of the N generators and requires each generator to reference its data items with a scaled probability. With heterogeneous generators, the total probability assigned to each generator is proportional to its processing capability. We demonstrate the effectiveness of D-Zipfian using empirical measurements of the chi-square statistic.
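The scheme summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's algorithm. The abstract does not say how data items are mapped to generators, so `assign_items` uses a hypothetical greedy rule: each item, most popular first, goes to the generator that is furthest below its target share of the total probability. The exponent `theta`, all function names, and the single-process simulation of the N generators are likewise assumptions. Each generator samples only its own items, dividing each item's Zipfian probability by the generator's total probability so the local probabilities sum to one. A Pearson chi-square statistic then compares the combined request counts against the expected Zipfian frequencies.

```python
import random

def zipf_probabilities(num_items, theta):
    """Zipfian probabilities p_i proportional to 1 / i**theta, ranks i = 1..num_items."""
    weights = [1.0 / (rank ** theta) for rank in range(1, num_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def assign_items(probs, capabilities):
    """HYPOTHETICAL greedy assignment (not from the paper): each item, most popular
    first, goes to the generator whose assigned probability mass is furthest below
    its target share. Target shares are proportional to processing capability
    (equal capabilities give the homogeneous 1/N case)."""
    cap_total = sum(capabilities)
    targets = [c / cap_total for c in capabilities]
    masses = [0.0] * len(capabilities)
    items = [[] for _ in capabilities]
    for item, p in enumerate(probs):  # probs is ordered by rank, largest first
        g = max(range(len(targets)), key=lambda k: targets[k] - masses[k])
        items[g].append(item)
        masses[g] += p
    return items, masses

def generate(probs, items, masses, total_requests, rng):
    """Simulate the generators: each references only its own (disjoint) items,
    using probabilities scaled by 1 / (its total mass)."""
    counts = [0] * len(probs)
    for g, owned in enumerate(items):
        if not owned:
            continue
        # Work proportional to the generator's actual assigned probability mass.
        n_requests = round(total_requests * masses[g])
        local = [probs[i] / masses[g] for i in owned]
        for item in rng.choices(owned, weights=local, k=n_requests):
            counts[item] += 1
    return counts

def chi_square(counts, probs):
    """Pearson chi-square of observed counts against expected Zipfian frequencies."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))

if __name__ == "__main__":
    rng = random.Random(42)
    probs = zipf_probabilities(1000, 0.99)
    # Three generators; the third is assumed to be twice as capable.
    items, masses = assign_items(probs, [1, 1, 2])
    counts = generate(probs, items, masses, 100_000, rng)
    print([round(m, 3) for m in masses])
    print(chi_square(counts, probs))
```

In this sketch each generator's request count is based on its actual assigned mass rather than the nominal target, so the combined request stream stays Zipfian even when the greedy split cannot hit the target exactly. The cost is that work, and therefore finish time, is only approximately proportional to capability.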
D-Zipfian: a decentralized implementation of Zipfian