A SIMD-efficient 14 instruction shader program for high-throughput microtriangle rasterization

Roca, Jordi; Moya, Victor; Gonzalez, Carlos; Escandell, Vicente; Murciego, Albert; Fernandez, Agustin; Espasa, Roger

doi:10.1007/s00371-010-0492-4

Original Article
Published: 14 April 2010

The Visual Computer, Volume 26, pages 707–719 (2010)

Abstract

This paper shows that the barrier of 1 triangle/clock rasterization rate for microtriangles in modern GPU architectures can be broken efficiently. The fixed throughput of the special-purpose culling and triangle setup stages of the classic pipeline limits how well the GPU scales when rasterizing many triangles in parallel that each cover very few pixels. In contrast, the growing shader core counts and GFLOPs of modern GPUs suggest parallelizing this computation entirely across multiple shader threads, making use of the powerful wide-ALU instructions. We present an efficient SIMD-like rasterization code targeted at very small triangles that scales well with the number of shader cores and outperforms traditional edge-equation-based algorithms. We have extended the ATTILA GPU shader ISA (del Barrio et al. in IEEE International Symposium on Performance Analysis of Systems and Software, pp. 231–241, 2006) with two fixed-point instructions to meet the rasterization precision requirement. This paper also introduces a novel subpixel bounding box size optimization that adjusts the bounds much more finely, which is critical for small triangles, and doubles the efficiency of the 2×2-pixel stamp test. The proposed shader rasterization program can run on top of the original pixel shader program, so that selected fragments are rasterized, attribute-interpolated and pixel-shaded in the same pass. Our results show that the technique outperforms a classic rasterizer at 8 or more shader cores, with speedups as high as 4× for 16 shader cores.
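The abstract names three ingredients: edge-equation coverage tests, fixed-point arithmetic, and a subpixel bounding box that reduces the number of 2×2-pixel stamps tested. The sketch below illustrates how these ideas can fit together. It is not the authors' 14-instruction shader program, which is not reproduced on this page. The 4-bit subpixel precision, the function names, and the omission of a tie-breaking fill rule are all assumptions made for illustration.

```python
# Illustrative sketch only; all names and constants here are assumptions.
SUBPIXEL_BITS = 4            # assumed precision: 1/16 of a pixel
ONE = 1 << SUBPIXEL_BITS     # one pixel in fixed-point units
HALF = ONE // 2              # offset of a pixel center inside its pixel


def edge(a, b, p):
    """Edge function: twice the signed area of (a, b, p), in integers."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def pixel_bbox(tri):
    """Bounds snapped outward to whole pixels: (x0, y0, x1, y1), inclusive."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    return (min(xs) // ONE, min(ys) // ONE, max(xs) // ONE, max(ys) // ONE)


def subpixel_bbox(tri):
    """Tighter bounds: only pixels whose center lies inside the vertex extents.

    The result may be empty (x0 > x1 or y0 > y1) when no center is enclosed.
    """
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    lo = lambda m: -((HALF - m) // ONE)   # ceil((m - HALF) / ONE)
    hi = lambda m: (m - HALF) // ONE      # floor((m - HALF) / ONE)
    return (lo(min(xs)), lo(min(ys)), hi(max(xs)), hi(max(ys)))


def stamp_count(bbox):
    """Number of even-aligned 2x2-pixel stamps overlapping the box."""
    x0, y0, x1, y1 = bbox
    if x0 > x1 or y0 > y1:
        return 0
    return ((x1 >> 1) - (x0 >> 1) + 1) * ((y1 >> 1) - (y0 >> 1) + 1)


def rasterize(tri, bbox):
    """Pixels in bbox whose centers pass all three edge tests.

    Simplification: samples exactly on an edge count as covered, so no
    top-left style fill rule is applied.
    """
    a, b, c = tri
    area = edge(a, b, c)
    if area == 0:
        return []                 # degenerate triangle
    if area < 0:
        b, c = c, b               # make the winding consistent
    x0, y0, x1, y1 = bbox
    covered = []
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            p = (x * ONE + HALF, y * ONE + HALF)
            if edge(a, b, p) >= 0 and edge(b, c, p) >= 0 and edge(c, a, p) >= 0:
                covered.append((x, y))
    return covered


# Vertices are given directly in 1/16-pixel units.
sliver = ((26, 26), (38, 27), (29, 38))   # straddles pixel borders, hits no center
small = ((26, 34), (45, 36), (40, 46))    # covers the center of pixel (2, 2)
for tri in (sliver, small):
    loose, tight = pixel_bbox(tri), subpixel_bbox(tri)
    print(stamp_count(loose), stamp_count(tight), rasterize(tri, tight))
```

In this sketch the first triangle is culled before any stamp is tested, and the second needs one stamp instead of two, while both boxes yield the same covered pixels. The integer edge function stands in for the fixed-point instructions mentioned in the abstract; how the paper's two ISA extensions actually behave is not specified here.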

References

Beyond3D Graphic Hardware and Technical Forums. http://www.beyond3d.com/resources (2010)
Abrash, M.: Rasterization on Larrabee. http://software.intel.com/en-us/articles/rasterization-on-larrabee/ (2009)
Akenine-Möller, T., Haines, E., Hoffman, N.: Real-Time Rendering, 3rd edn. Peters, Natick (2008)
ARB: ARB fragment program specification v 1.0. http://oss.sgi.com/projects/ogl-sample/registry/ARB/fragment_program.tx (2002)
del Barrio, V., Gonzalez, C., Roca, J., Fernandez, A.: ATTILA: a cycle-level execution-driven simulator for modern GPU architectures. In: IEEE International Symposium on Performance Analysis of Systems and Software, pp. 231–241 (2006)
Cook, H.L., Carpenter, L., Catmull, E.: The Reyes Image Rendering Architecture, pp. 28–35 (1988)
Dudash, B.: Tesselation of displaced subdivision surfaces in DX11. GPU-BBQ 2008. http://www.nvidia.in/object/gpubbq-2008-subdiv.html (2008)
Eldridge, M., Igehy, H., Hanrahan, P.: Pomegranate: a fully scalable graphics architecture. In: SIGGRAPH ’00: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pp. 443–454. ACM/Addison-Wesley, New York (2000)
Ewins, J.P., Waller, M.D., White, M., Lister, P.F.: MIP-Map level selection for texture mapping. IEEE Trans. Vis. Comput. Graph. 4(4), 317–329 (1998)
Fatahalian, K., Luong, E., Boulos, S., Akeley, K., Mark, W.R., Hanrahan, P.: Data-parallel rasterization of micropolygons with defocus and motion blur. In: HPG ’09: Proceedings of the Conference on High Performance Graphics, pp. 59–68. ACM, New York (2009)
Greene, N., Kass, M., Miller, G.: Hierarchical Z-buffer visibility. In: SIGGRAPH ’93: Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques, pp. 231–238. ACM, New York (1993)
Hennessy, J., Patterson, D.: Computer Architecture—A Quantitative Approach. Morgan Kaufmann, San Mateo (2003)
Kun, Z., Qiming, H.: RenderAnts: interactive REYES rendering on GPUs. ACM Trans. Graph. (2009)
Licea-Kane, B.: GLSL: Center or centroid? (or when shaders attack!). http://www.opengl.org/pipeline/article/vol003_6/ (2007)
Low, K.L.: Perspective-Correct Interpolation (2002)
McCool, M.D., Wales, C., Moule, K.: Incremental and hierarchical Hilbert order edge equation polygon rasterization. In: HWWS ’01: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, pp. 65–72. ACM, New York (2001)
McCormack, J., McNamara, R.: Tiled polygon traversal using half-plane edge functions. In: HWWS ’00: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, pp. 15–21. ACM, New York (2000)
Mitchell, J., Sander, P.: Applications of explicit Early-Z culling. In: Real-Time Shading Course, SIGGRAPH 2004 (2004)
Olano, M., Greer, T.: Triangle scan conversion using 2D homogeneous coordinates. In: HWWS ’97: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, pp. 89–95. ACM, New York (1997)
Pineda, J.: A parallel algorithm for polygon rasterization. In: SIGGRAPH ’88: Proceedings of the 15th Annual Conference on Computer Graphics and Interactive Techniques, pp. 17–20. ACM, New York (1988)
Zhang, H., Hoff, K.E. III: Fast backface culling using normal masks. In: SI3D ’97: Proceedings of the 1997 Symposium on Interactive 3D Graphics, pp. 103–106. ACM, New York (1997)

Author information

Authors and Affiliations

Computer Architecture Department (UPC), Barcelona, Spain
Jordi Roca, Victor Moya, Carlos Gonzalez, Vicente Escandell, Albert Murciego & Agustin Fernandez
Intel Barcelona, Barcelona, Spain
Roger Espasa

Corresponding author

Correspondence to Jordi Roca.

About this article

Cite this article

Roca, J., Moya, V., Gonzalez, C. et al. A SIMD-efficient 14 instruction shader program for high-throughput microtriangle rasterization. Vis Comput 26, 707–719 (2010). https://doi.org/10.1007/s00371-010-0492-4

Issue Date: June 2010
DOI: https://doi.org/10.1007/s00371-010-0492-4
