DOI: 10.1145/2396761.2398438
research-article

Robust distributed indexing for locality-skewed workloads

Published: 29 October 2012

ABSTRACT

Multidimensional indexing is crucial for enabling fast search over large-scale data. Owing to the unprecedented scale of data, extending such indexing technology to distributed environments has recently gained attention. The goal of existing efforts in distributed indexing has been the localization of queries to data residing at a small number of nodes (i.e., locality-preserving indexing) to minimize communication cost. However, considering that workloads often correlate with data locality, such indexing often generates hotspots. Location-based queries are typically skewed to disaster areas during certain periods of time; for example, during Hurricane Irene, search traffic increased by more than 2000%. To alleviate such hotspots, we propose workload-balancing as an optimization goal. A cost model analytically supporting the need for load balancing is first developed, then a distributed index that evenly distributes the workload is presented. Our empirical study suggests that hotspots degrading search performance can be effectively alleviated. Specifically, when deployed to Amazon EC2, our proposed scheme showed a maximum speed-up of 127.7%. Even in hostile settings where workload is not at all correlated with the search criteria, the proposed scheme's performance is comparable to existing approaches optimized for such settings.
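The hotspot effect described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's index or cost model; it is a toy setup under stated assumptions: a 16x16 grid of cells, 8 nodes, point queries of which 90% fall in a 4x4 hotspot, a Z-order (Morton) code as the locality-preserving key, and a simple modulo placement standing in for a load-spreading scheme. All function names are hypothetical.

```python
import random
from collections import Counter

def morton(x, y, bits=4):
    """Interleave the bits of x and y into a Z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

def range_node(x, y, nodes, bits=4):
    """Locality-preserving placement: each node owns a contiguous code range."""
    cells = 1 << (2 * bits)
    return morton(x, y, bits) * nodes // cells

def scattered_node(x, y, nodes, bits=4):
    """Load-spreading placement (assumed stand-in): adjacent codes go to different nodes."""
    return morton(x, y, bits) % nodes

def imbalance(place, queries, nodes):
    """Busiest node's load divided by the mean load (1.0 means perfectly even)."""
    load = Counter(place(x, y, nodes) for x, y in queries)
    return max(load.values()) / (len(queries) / nodes)

def skewed_queries(n, seed=0):
    """Point queries: 90% land in a 4x4 hotspot of the 16x16 grid, the rest anywhere."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        if rng.random() < 0.9:
            out.append((rng.randrange(4), rng.randrange(4)))
        else:
            out.append((rng.randrange(16), rng.randrange(16)))
    return out

queries = skewed_queries(10_000)
range_ratio = imbalance(range_node, queries, nodes=8)
scattered_ratio = imbalance(scattered_node, queries, nodes=8)
print(f"range placement imbalance:     {range_ratio:.2f}")
print(f"scattered placement imbalance: {scattered_ratio:.2f}")
```

In this toy setting the whole hotspot maps to a single node under range placement, so that node serves roughly nine-tenths of the queries, while the modulo placement spreads them close to evenly. The trade-off is that a range query covering the hotspot would then touch every node, which is the communication cost that locality-preserving schemes aim to avoid.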


Published in

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
October 2012
2840 pages
ISBN: 9781450311564
DOI: 10.1145/2396761

Copyright © 2012 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
