On GPU’s viability as a middleware accelerator

Cluster Computing

Abstract

Today, Graphics Processing Units (GPUs) are a largely underexploited resource on existing desktops and a possible cost-effective enhancement to high-performance systems. To date, most applications that exploit GPUs are specialized scientific applications. Little attention has been paid to harnessing these highly parallel devices to support more generic functionality at the operating system or middleware level. This study starts from the hypothesis that generic middleware-level techniques that improve distributed system reliability or performance (such as content addressing, erasure coding, or data similarity detection) can be significantly accelerated using GPU support.

As a first step towards validating this hypothesis, we design StoreGPU, a library that accelerates a number of hashing-based middleware primitives popular in distributed storage system implementations. Our evaluation shows that StoreGPU enables up to twenty-five-fold performance gains on synthetic benchmarks as well as on a high-level application: online similarity detection between large data files.
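To make the similarity-detection workload concrete, the following is a minimal CPU-only sketch in Python. It is not StoreGPU's API or implementation: the function names, the fixed 4 KiB block size, the choice of SHA-1 from the standard `hashlib` module, and the Jaccard-style similarity metric are all assumptions chosen for illustration. The per-block hashing loop stands in for the kind of hashing-heavy step that the hypothesis above proposes offloading to a GPU.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def similarity(a: bytes, b: bytes, block_size: int = BLOCK_SIZE) -> float:
    """Fraction of distinct block hashes shared by a and b (Jaccard index)."""
    ha = set(block_hashes(a, block_size))
    hb = set(block_hashes(b, block_size))
    if not ha and not hb:
        return 1.0  # two empty inputs are treated as identical
    return len(ha & hb) / len(ha | hb)


if __name__ == "__main__":
    # Two three-block inputs that differ only in the middle block.
    old = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
    new = b"A" * 4096 + b"X" * 4096 + b"C" * 4096
    print(similarity(old, new))  # prints 0.5 (2 shared hashes out of 4 distinct)
```

Each block is hashed independently of the others, which is why this style of computation is a plausible candidate for a highly parallel device.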

Author information

Correspondence to Samer Al-Kiswany.

About this article

Cite this article

Al-Kiswany, S., Gharaibeh, A., Santos-Neto, E. et al. On GPU’s viability as a middleware accelerator. Cluster Comput 12, 123–140 (2009). https://doi.org/10.1007/s10586-009-0076-0

  • DOI: https://doi.org/10.1007/s10586-009-0076-0
