Abstract
Streaming languages were originally designed for streaming architectures, but recent work has shown that the stream programming model is also useful for exploiting parallelism on general purpose processors (GPPs). Current research on mapping stream code onto GPPs focuses on load balancing and on generating threads based on hardware features. We address problems of stream data locality and stream data parallelism on GPPs, and suggest that automatically generating vectorized code for streaming operations is a potential solution. We use the Brook stream language as our syntax base and augment it to generate vector intrinsics targeting the x86 architecture. The resulting compiler uses both existing and new strategies to transform high-level streaming kernel code into vector instructions without requiring additional annotations. We compare our system’s results with existing strategies for mapping stream code onto GPPs. In our performance evaluation, speedups range from a few percent to over 2x, and we discuss how effective vector code is relative to its scalar equivalents in specific application domains.
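To make the idea of lowering a streaming kernel to x86 vector intrinsics concrete, the following is a minimal sketch only, not the output of the compiler described here. It assumes a hypothetical element-wise kernel computing `out = a * x + y`; the function names `saxpy_scalar` and `saxpy_sse`, the choice of 128-bit SSE, and the use of unaligned loads are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_loadu_ps, ... */

/* Scalar form of a hypothetical element-wise stream kernel:
   each output element depends only on the matching input elements. */
static void saxpy_scalar(float a, const float *x, const float *y,
                         float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a * x[i] + y[i];
}

/* Hand-written vector form of the same kernel (illustrative only).
   Four floats are processed per iteration; a scalar loop handles
   the remaining 0 to 3 elements. */
static void saxpy_sse(float a, const float *x, const float *y,
                      float *out, size_t n)
{
    assert(n == 0 || (x && y && out));

    const __m128 va = _mm_set1_ps(a); /* broadcast a into all 4 lanes */
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i); /* unaligned loads: no     */
        __m128 vy = _mm_loadu_ps(y + i); /* alignment is assumed    */
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
    }

    for (; i < n; ++i) /* scalar remainder */
        out[i] = a * x[i] + y[i];
}
```

Because every stream element is independent in a kernel of this shape, the vector loop needs no dependence analysis beyond handling the remainder elements. This is one reason stream kernels are plausible candidates for annotation-free vectorization.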
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Manley, R., Gregg, D. (2010). Mapping Streaming Languages to General Purpose Processors through Vectorization. In: Gao, G.R., Pollock, L.L., Cavazos, J., Li, X. (eds) Languages and Compilers for Parallel Computing. LCPC 2009. Lecture Notes in Computer Science, vol 5898. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13374-9_7
DOI: https://doi.org/10.1007/978-3-642-13374-9_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13373-2
Online ISBN: 978-3-642-13374-9
eBook Packages: Computer Science, Computer Science (R0)