Abstract
This paper shows that it is possible to efficiently break the barrier of 1 triangle/clock rasterization rate for microtriangles in modern GPU architectures. The fixed throughput of the special-purpose culling and triangle setup stages of the classic pipeline limits how well the GPU scales when rasterizing many triangles in parallel that each cover very few pixels. In contrast, the growing shader core counts and GFLOPs of modern GPUs suggest parallelizing this computation entirely across multiple shader threads, making use of the powerful wide-ALU instructions. We present an efficient SIMD-like rasterization code targeted at very small triangles that scales well with the number of shader cores and outperforms traditional edge-equation-based algorithms. We have extended the ATTILA GPU shader ISA (del Barrio et al. in IEEE International Symposium on Performance Analysis of Systems and Software, pp. 231–241, 2006) with two fixed-point instructions to meet the rasterization precision requirement. This paper also introduces a novel subpixel Bounding Box size optimization that adjusts the bounds much more finely, which is critical for small triangles, and doubles the 2×2-pixel stamp test efficiency. The proposed shader rasterization program can run on top of the original pixel shader program, so that selected fragments are rasterized, attribute interpolated and pixel shaded in the same pass. Our results show that our technique yields better performance than a classic rasterizer at 8 or more shader cores, with speedups as high as 4× for 16 shader cores.
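For orientation, the following is a minimal sketch of the classic edge-function coverage test that the abstract uses as its baseline, combined with a bounding box snapped to pixel centers on a fixed-point subpixel grid. It is not the 14-instruction shader program of the paper. The 4-bit subpixel precision, the function names (`to_fixed`, `edge`, `pixel_bounds`, `rasterize`) and the omission of a top-left fill rule are all assumptions made for illustration.

```python
SUBPIXEL_BITS = 4          # assumed: 4 fractional bits (1/16 pixel), not taken from the paper
ONE = 1 << SUBPIXEL_BITS   # one pixel in fixed-point units
HALF = ONE >> 1            # offset of a pixel center inside its pixel


def to_fixed(v):
    """Snap a floating-point coordinate to the subpixel grid."""
    return int(round(v * ONE))


def edge(ax, ay, bx, by, px, py):
    """Integer edge function: non-negative when P is on or left of A->B."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def pixel_bounds(lo, hi):
    """Inclusive range of pixel indices whose centers lie in [lo, hi].

    Rounding to centers (not pixel edges) lets tiny triangles that miss
    every sample produce an empty range (first > last).
    """
    first = -((HALF - lo) // ONE)   # ceil((lo - HALF) / ONE)
    last = (hi - HALF) // ONE       # floor((hi - HALF) / ONE)
    return first, last


def rasterize(tri):
    """Return the (x, y) pixels whose centers the triangle covers.

    Hypothetical helper: edges are inclusive, no fill rule is applied.
    """
    (x0, y0), (x1, y1), (x2, y2) = [(to_fixed(x), to_fixed(y)) for x, y in tri]
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:                   # degenerate triangle
        return []
    if area < 0:                    # enforce counter-clockwise winding
        x1, y1, x2, y2 = x2, y2, x1, y1
    xmin, xmax = pixel_bounds(min(x0, x1, x2), max(x0, x1, x2))
    ymin, ymax = pixel_bounds(min(y0, y1, y2), max(y0, y1, y2))
    covered = []
    for py in range(ymin, ymax + 1):
        for px in range(xmin, xmax + 1):
            cx, cy = px * ONE + HALF, py * ONE + HALF
            if (edge(x1, y1, x2, y2, cx, cy) >= 0
                    and edge(x2, y2, x0, y0, cx, cy) >= 0
                    and edge(x0, y0, x1, y1, cx, cy) >= 0):
                covered.append((px, py))
    return covered
```

With these assumptions, `rasterize([(0.2, 0.2), (1.8, 0.3), (0.4, 1.9)])` tests four pixel centers and keeps three, while a triangle such as `[(0.1, 0.1), (0.4, 0.1), (0.1, 0.4)]` is rejected by the bounds alone, without evaluating any edge function.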
References
Beyond3D Graphic Hardware and Technical Forums. http://www.beyond3d.com/resources (2010)
Abrash, M.: Rasterization on Larrabee. http://software.intel.com/en-us/articles/rasterization-on-larrabee/ (2009)
Akenine-Möller, T., Haines, E., Hoffman, N.: Real-Time Rendering, 3rd edn. Peters, Natick (2008)
ARB: ARB fragment program specification v 1.0. http://oss.sgi.com/projects/ogl-sample/registry/ARB/fragment_program.tx (2002)
del Barrio, V., Gonzalez, C., Roca, J., Fernandez, A.: ATTILA: a cycle-level execution-driven simulator for modern GPU architectures. In: IEEE International Symposium on Performance Analysis of Systems and Software, pp. 231–241 (2006)
Cook, H.L., Carpenter, L., Catmull, E.: The Reyes Image Rendering Architecture, pp. 28–35 (1988)
Dudash, B.: Tesselation of displaced subdivision surfaces in DX11. GPU-BBQ 2008. http://www.nvidia.in/object/gpubbq-2008-subdiv.html (2008)
Eldridge, M., Igehy, H., Hanrahan, P.: Pomegranate: a fully scalable graphics architecture. In: SIGGRAPH ’00: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pp. 443–454. ACM/Addison-Wesley, New York (2000)
Ewins, J.P., Waller, M.D., White, M., Lister, P.F.: MIP-Map level selection for texture mapping. IEEE Trans. Vis. Comput. Graph. 4(4), 317–329 (1998)
Fatahalian, K., Luong, E., Boulos, S., Akeley, K., Mark, W.R., Hanrahan, P.: Data-parallel rasterization of micropolygons with defocus and motion blur. In: HPG ’09: Proceedings of the Conference on High Performance Graphics, pp. 59–68. ACM, New York (2009)
Greene, N., Kass, M., Miller, G.: Hierarchical Z-buffer visibility. In: SIGGRAPH ’93: Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques, pp. 231–238. ACM, New York (1993)
Hennessy, J., Patterson, D.: Computer Architecture—A Quantitative Approach. Morgan Kaufmann, San Mateo (2003)
Kun, Z., Qiming, H.: RenderAnts: interactive REYES rendering on GPUs. ACM Trans. Graph. (2009)
Licea-Kane, B.: GLSL: Center or centroid? (or when shaders attack!). http://www.opengl.org/pipeline/article/vol003_6/ (2007)
Low, K.L.: Perspective-Correct Interpolation (2002)
McCool, M.D., Wales, C., Moule, K.: Incremental and hierarchical Hilbert order edge equation polygon rasterization. In: HWWS ’01: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, pp. 65–72. ACM, New York (2001)
McCormack, J., McNamara, R.: Tiled polygon traversal using half-plane edge functions. In: HWWS ’00: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, pp. 15–21. ACM, New York (2000)
Mitchell, J., Sander, P.: Applications of explicit Early-Z culling. In: Real-Time Shading Course, SIGGRAPH 2004 (2004)
Olano, M., Greer, T.: Triangle scan conversion using 2D homogeneous coordinates. In: HWWS ’97: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, pp. 89–95. ACM, New York (1997)
Pineda, J.: A parallel algorithm for polygon rasterization. In: SIGGRAPH ’88: Proceedings of the 15th Annual Conference on Computer Graphics and Interactive Techniques, pp. 17–20. ACM, New York (1988)
Zhang, H., Hoff, K.E. III: Fast backface culling using normal masks. In: SI3D ’97: Proceedings of the 1997 Symposium on Interactive 3D Graphics, pp. 103–106. ACM, New York (1997)
Cite this article
Roca, J., Moya, V., Gonzalez, C. et al. A SIMD-efficient 14 instruction shader program for high-throughput microtriangle rasterization. Vis Comput 26, 707–719 (2010). https://doi.org/10.1007/s00371-010-0492-4
DOI: https://doi.org/10.1007/s00371-010-0492-4