Abstract
Today, Graphics Processing Units (GPUs) are a largely underexploited resource on existing desktops and a possible cost-effective enhancement to high-performance systems. To date, most applications that exploit GPUs are specialized scientific applications. Little attention has been paid to harnessing these highly parallel devices to support more generic functionality at the operating system or middleware level. This study starts from the hypothesis that generic middleware-level techniques that improve distributed system reliability or performance (such as content addressing, erasure coding, or data similarity detection) can be significantly accelerated using GPU support.
We take a first step towards validating this hypothesis by designing StoreGPU, a library that accelerates a number of hashing-based middleware primitives popular in distributed storage system implementations. Our evaluation shows that StoreGPU enables up to twenty-five-fold performance gains on synthetic benchmarks as well as on a high-level application: online similarity detection between large data files.
Cite this article
Al-Kiswany, S., Gharaibeh, A., Santos-Neto, E. et al. On GPU’s viability as a middleware accelerator. Cluster Comput 12, 123–140 (2009). https://doi.org/10.1007/s10586-009-0076-0
DOI: https://doi.org/10.1007/s10586-009-0076-0