A SIMD-efficient 14 instruction shader program for high-throughput microtriangle rasterization

  • Original Article
The Visual Computer

Abstract

This paper shows that the 1 triangle/clock rasterization rate barrier for microtriangles can be broken efficiently on modern GPU architectures. The fixed throughput of the special-purpose culling and triangle setup stages of the classic pipeline limits how well a GPU scales when rasterizing, in parallel, many triangles that each cover very few pixels. In contrast, the growing shader core counts and GFLOPs of modern GPUs suggest parallelizing this computation entirely across multiple shader threads, making use of the powerful wide-ALU instructions. We present an efficient SIMD-like rasterization code targeted at very small triangles that scales well with the number of shader cores and outperforms traditional edge-equation-based algorithms. We have extended the ATTILA GPU shader ISA (del Barrio et al. in IEEE International Symposium on Performance Analysis of Systems and Software, pp. 231–241, 2006) with two fixed-point instructions to meet the rasterization precision requirement. This paper also introduces a novel subpixel bounding box size optimization that adjusts the bounds much more finely, which is critical for small triangles, and doubles the 2×2-pixel stamp test efficiency. The proposed shader rasterization program can run on top of the original pixel shader program, so that selected fragments are rasterized, attribute-interpolated, and pixel-shaded in the same pass. Our results show that our technique yields better performance than a classic rasterizer at 8 or more shader cores, with speedups as high as 4× for 16 shader cores.
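For orientation, the traditional edge-equation test that the abstract uses as its baseline (see Pineda, reference 20) can be sketched in a few lines. This is not the paper's 14-instruction shader program, which the abstract does not list. It is a minimal illustration that assumes 4 fractional bits of fixed-point precision, samples at pixel centers, and ignores tie-breaking fill rules. The names `SUBPIXEL_BITS`, `to_fixed`, `edge`, and `rasterize` are hypothetical. The bounding box is computed in subpixel units and then snapped to pixel centers, which shows one plausible reason why tight bounds matter for small triangles: a microtriangle that misses every pixel center is rejected before any edge test runs.

```python
# Illustrative sketch only: classic edge-function rasterization in fixed point.
# Assumption: 4 fractional bits, i.e. 16 subpixel steps per pixel.
SUBPIXEL_BITS = 4
ONE = 1 << SUBPIXEL_BITS   # one pixel in fixed-point units
HALF = ONE >> 1            # offset of a pixel center inside its pixel


def to_fixed(v):
    """Convert a floating-point screen coordinate to fixed point."""
    return int(round(v * ONE))


def edge(ax, ay, bx, by, px, py):
    """Edge function: twice the signed area of triangle (a, b, p)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(tri):
    """Return the (x, y) pixels whose centers lie inside the triangle."""
    (x0, y0), (x1, y1), (x2, y2) = [(to_fixed(x), to_fixed(y)) for x, y in tri]
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return []                      # degenerate triangle
    if area < 0:                       # normalize winding so inside means >= 0
        x1, y1, x2, y2 = x2, y2, x1, y1

    # Bounding box in subpixel units, snapped to the pixel centers it contains.
    # Pixel i has its center at i * ONE + HALF.
    min_px = -((HALF - min(x0, x1, x2)) // ONE)   # ceil((min - HALF) / ONE)
    max_px = (max(x0, x1, x2) - HALF) // ONE      # floor((max - HALF) / ONE)
    min_py = -((HALF - min(y0, y1, y2)) // ONE)
    max_py = (max(y0, y1, y2) - HALF) // ONE

    covered = []
    for py in range(min_py, max_py + 1):
        for px in range(min_px, max_px + 1):
            cx, cy = px * ONE + HALF, py * ONE + HALF
            if (edge(x0, y0, x1, y1, cx, cy) >= 0
                    and edge(x1, y1, x2, y2, cx, cy) >= 0
                    and edge(x2, y2, x0, y0, cx, cy) >= 0):
                covered.append((px, py))
    return covered
```

In this sketch a triangle lying strictly between pixel centers produces an empty bounding box, so the inner loops never execute. A production rasterizer would also apply a fill rule for samples that fall exactly on an edge.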

References

  1. Beyond3D Graphic Hardware and Technical Forums. http://www.beyond3d.com/resources (2010)

  2. Abrash, M.: Rasterization on Larrabee. http://software.intel.com/en-us/articles/rasterization-on-larrabee/ (2009)

  3. Akenine-Möller, T., Haines, E., Hoffman, N.: Real-Time Rendering, 3rd edn. Peters, Natick (2008)

  4. ARB: ARB fragment program specification v 1.0. http://oss.sgi.com/projects/ogl-sample/registry/ARB/fragment_program.tx (2002)

  5. del Barrio, V., Gonzalez, C., Roca, J., Fernandez, A.: ATTILA: a cycle-level execution-driven simulator for modern GPU architectures. In: IEEE International Symposium on Performance Analysis of Systems and Software, pp. 231–241 (2006)

  6. Cook, R.L., Carpenter, L., Catmull, E.: The Reyes Image Rendering Architecture, pp. 28–35 (1988)

  7. Dudash, B.: Tessellation of displaced subdivision surfaces in DX11. GPU-BBQ 2008. http://www.nvidia.in/object/gpubbq-2008-subdiv.html (2008)

  8. Eldridge, M., Igehy, H., Hanrahan, P.: Pomegranate: a fully scalable graphics architecture. In: SIGGRAPH ’00: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pp. 443–454. ACM/Addison-Wesley, New York (2000)

  9. Ewins, J.P., Waller, M.D., White, M., Lister, P.F.: MIP-Map level selection for texture mapping. IEEE Trans. Vis. Comput. Graph. 4(4), 317–329 (1998)

  10. Fatahalian, K., Luong, E., Boulos, S., Akeley, K., Mark, W.R., Hanrahan, P.: Data-parallel rasterization of micropolygons with defocus and motion blur. In: HPG ’09: Proceedings of the Conference on High Performance Graphics, pp. 59–68. ACM, New York (2009)

  11. Greene, N., Kass, M., Miller, G.: Hierarchical Z-buffer visibility. In: SIGGRAPH ’93: Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques, pp. 231–238. ACM, New York (1993)

  12. Hennessy, J., Patterson, D.: Computer Architecture—A Quantitative Approach. Morgan Kaufmann, San Mateo (2003)

  13. Kun, Z., Qiming, H.: RenderAnts: interactive REYES rendering on GPUs. ACM Trans. Graph. (2009)

  14. Licea-Kane, B.: GLSL: Center or centroid? (or when shaders attack!). http://www.opengl.org/pipeline/article/vol003_6/ (2007)

  15. Low, K.L.: Perspective-Correct Interpolation (2002)

  16. McCool, M.D., Wales, C., Moule, K.: Incremental and hierarchical Hilbert order edge equation polygon rasterization. In: HWWS ’01: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, pp. 65–72. ACM, New York (2001)

  17. McCormack, J., McNamara, R.: Tiled polygon traversal using half-plane edge functions. In: HWWS ’00: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, pp. 15–21. ACM, New York (2000)

  18. Mitchell, J., Sander, P.: Applications of explicit Early-Z culling. In: Real-Time Shading Course, SIGGRAPH 2004 (2004)

  19. Olano, M., Greer, T.: Triangle scan conversion using 2D homogeneous coordinates. In: HWWS ’97: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, pp. 89–95. ACM, New York (1997)

  20. Pineda, J.: A parallel algorithm for polygon rasterization. In: SIGGRAPH ’88: Proceedings of the 15th Annual Conference on Computer Graphics and Interactive Techniques, pp. 17–20. ACM, New York (1988)

  21. Zhang, H., Hoff, K.E. III: Fast backface culling using normal masks. In: SI3D ’97: Proceedings of the 1997 Symposium on Interactive 3D Graphics, pp. 103–106. ACM, New York (1997)

Author information

Correspondence to Jordi Roca.

About this article

Cite this article

Roca, J., Moya, V., Gonzalez, C. et al. A SIMD-efficient 14 instruction shader program for high-throughput microtriangle rasterization. Vis Comput 26, 707–719 (2010). https://doi.org/10.1007/s00371-010-0492-4

  • DOI: https://doi.org/10.1007/s00371-010-0492-4
