DOI: 10.1145/1807167.1807186
research-article

PR-join: a non-blocking join achieving higher early result rate with statistical guarantees

Published: 06 June 2010

ABSTRACT

Online aggregation is a promising solution to achieving fast early responses for interactive ad-hoc queries that compute aggregates on a large amount of data. Essential to the success of online aggregation is a good non-blocking join algorithm that enables both (i) high early result rates with statistical guarantees and (ii) fast end-to-end query times. We analyze existing non-blocking join algorithms and find that they all provide sub-optimal early result rates, and those with fast end-to-end times achieve them only by further sacrificing their early result rates.

We propose a new non-blocking join algorithm, Partitioned expanding Ripple Join (PR-Join), which achieves considerably higher early result rates than previous non-blocking joins, while also delivering fast end-to-end query times. PR-Join performs separate, ripple-like join operations on individual hash partitions, where the width of a ripple expands multiplicatively over time. This contrasts with the non-partitioned, fixed-width ripples of Block Ripple Join. Assuming, as in previous non-blocking join studies, that the input relations are in random order, PR-Join ensures representative early results that are amenable to statistical guarantees. We show both analytically and with real-machine experiments that PR-Join achieves over an order of magnitude higher early result rates than previous non-blocking joins. We also discuss the benefits of using a flash-based SSD for temporary storage, showing that PR-Join can then achieve close to optimal end-to-end performance. Finally, we consider the joining of finite data streams that arrive over time, and find that PR-Join achieves similar or higher result rates than RPJ, the state-of-the-art algorithm specialized for that domain.
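
The core idea described above (a ripple-style join run separately on each hash partition, with the ripple width growing multiplicatively) can be illustrated with a small in-memory sketch. This is not the authors' implementation: it ignores memory limits, disk spilling, and the statistical estimators, and it assumes rows are tuples whose first field is the join key. The names `partition`, `expanding_ripple`, `pr_join_sketch`, `start`, and `growth` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def partition(rows, n_parts):
    """Hash-partition rows on their join key (assumed to be field 0)."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[0]) % n_parts].append(row)
    return parts


def expanding_ripple(r, s, start=1, growth=2):
    """Ripple-style equi-join of one partition pair.

    Each step reads the next `width` rows from both inputs and joins them
    against everything seen so far; `width` is then multiplied by `growth`.
    Every matching pair is yielded exactly once.
    """
    r_seen = defaultdict(list)  # join key -> R rows read so far
    s_seen = defaultdict(list)  # join key -> S rows read so far
    r_pos = s_pos = 0
    width = start
    while r_pos < len(r) or s_pos < len(s):
        new_r = r[r_pos:r_pos + width]
        new_s = s[s_pos:s_pos + width]
        r_pos += len(new_r)
        s_pos += len(new_s)
        # New R rows meet the S rows read in earlier steps.
        for x in new_r:
            for y in s_seen.get(x[0], ()):
                yield (x, y)
        for x in new_r:
            r_seen[x[0]].append(x)
        # New S rows meet all R rows read so far, including this step's.
        for y in new_s:
            for x in r_seen.get(y[0], ()):
                yield (x, y)
            s_seen[y[0]].append(y)
        width *= growth  # the ripple expands multiplicatively


def pr_join_sketch(r, s, n_parts=4, start=1, growth=2):
    """Run an expanding ripple on each pair of matching hash partitions."""
    r_parts = partition(r, n_parts)
    s_parts = partition(s, n_parts)
    for rp, sp in zip(r_parts, s_parts):
        yield from expanding_ripple(rp, sp, start, growth)


if __name__ == "__main__":
    r = [(i % 5, "r%d" % i) for i in range(20)]
    s = [(i % 7, "s%d" % i) for i in range(15)]
    for pair in pr_join_sketch(r, s):
        print(pair)
```

Because results are yielded as soon as each step completes, a consumer could begin updating a running aggregate before either input has been fully read. Whether those early results are representative depends on the random-order assumption stated in the abstract, which this sketch does not check.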

Published in

SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
June 2010
1286 pages
ISBN: 9781450300322
DOI: 10.1145/1807167

Copyright © 2010 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 785 of 4,003 submissions, 20%
