Abstract

Contemporary long-term storage devices feature powerful embedded processors and sizeable memory buffers. Active Storage Devices (ASDs) are hard disks that use these significant resources not only to manage disk operation but also to execute custom application code on large amounts of data. While prior research has shown that ASDs perform exceedingly well with filter-type algorithms, the evaluation of binary relational operators has been limited. In this paper, we analyze and evaluate the inter-operator parallelism of GRACE-based join algorithms that run atop ASDs. We derive accurate cost expressions for existing algorithms and expose their performance bottlenecks; building on these findings, we propose Active Hash Join, a new algorithm that exploits all system resources. Through experimentation, we confirm that existing algorithms are best suited to systems with either a small or a large number of ASDs. In contrast, we find that the “adaptive” nature of Active Hash Join yields enhanced parallelism in all cases, especially when the aggregate ASD resources are comparable to those of the main CPU and main memory.
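
For readers unfamiliar with the GRACE family of joins, the following is a minimal sketch of the generic textbook GRACE hash join: both relations are first partitioned with the same hash function, and each pair of matching buckets is then joined independently. It is not the Active Hash Join proposed in the paper and says nothing about how work is divided between ASDs and the host. The function name, parameter names, and the use of in-memory lists in place of on-disk buckets are assumptions made purely for illustration.

```python
from collections import defaultdict


def grace_hash_join(r, s, key_r, key_s, num_partitions=4):
    """Textbook GRACE hash join over two lists of tuples.

    key_r and key_s are the positions of the join attribute in the
    tuples of r and s. In-memory lists stand in for on-disk buckets
    (an illustrative simplification).
    """
    # Phase 1: partition both relations with the same hash function,
    # so matching tuples always land in buckets with the same index.
    r_parts = [[] for _ in range(num_partitions)]
    s_parts = [[] for _ in range(num_partitions)]
    for row in r:
        r_parts[hash(row[key_r]) % num_partitions].append(row)
    for row in s:
        s_parts[hash(row[key_s]) % num_partitions].append(row)

    # Phase 2: join each bucket pair independently by building a hash
    # table on the R bucket and probing it with the S bucket.
    result = []
    for r_bucket, s_bucket in zip(r_parts, s_parts):
        table = defaultdict(list)
        for row in r_bucket:
            table[row[key_r]].append(row)
        for row in s_bucket:
            for match in table.get(row[key_s], []):
                result.append(match + row)
    return result


r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
joined = sorted(grace_hash_join(r, s, key_r=0, key_s=0))
```

Because the bucket pairs in the second phase are independent of one another, they are a natural unit for parallel execution, which is the property that join algorithms of this family build on.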



Author information

Corresponding author

Correspondence to Alex Delis.

Additional information

Recommended by:

Ahmed Elmagarmid

Work partially supported by the University of Athens Research Foundation.

About this article

Cite this article

Stoumpos, V., Delis, A. GRACE-based joins on active storage devices. Distrib Parallel Databases 20, 199–224 (2006). https://doi.org/10.1007/s10619-006-0238-5

  • DOI: https://doi.org/10.1007/s10619-006-0238-5
