Abstract
Culling techniques have always been a central part of computer graphics, but graphics hardware still lacks efficient and flexible support for culling. To improve the situation, we introduce the programmable culling unit (PCU), which is as flexible as the fragment program unit and capable of quickly culling entire blocks of fragments. The PCU is also easy for developers to use, because culling programs can be derived automatically from fragment programs that contain a discard instruction. Our PCU can be integrated into an existing fragment program unit with a modest hardware overhead of only about 10%. Using the PCU, we have observed shader speedups between 1.4× and 2.1× for relevant scenes.
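To make the idea of a culling program derived from a fragment program concrete, here is a minimal sketch. It assumes interval arithmetic as the conservative bounding technique; the abstract does not say which technique the PCU uses. The shader, the `Interval` class, and the names `fragment_program` and `cull_program` are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] bounding a value over a block of fragments."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # The product range is spanned by the four endpoint products.
        products = (self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi)
        return Interval(min(products), max(products))


def fragment_program(u, v):
    """Hypothetical per-fragment shader; returns None when the fragment is discarded."""
    alpha = u * u + v * v - 1.0
    if alpha < 0.0:  # stands in for the discard instruction
        return None
    return alpha


def cull_program(u, v):
    """Hypothetical cull program: the same arithmetic, evaluated on intervals.

    Returns True only if every fragment in the block is guaranteed to be
    discarded, so the whole block can be skipped. The test is conservative:
    a False result means "cannot prove it", not "some fragment survives".
    """
    alpha = u * u + v * v - Interval(1.0, 1.0)
    return alpha.hi < 0.0  # even the largest possible alpha is discarded


# A block well inside the unit circle: every fragment would be discarded.
print(cull_program(Interval(0.1, 0.3), Interval(0.2, 0.4)))
# A block straddling the circle: fragments must be shaded individually.
print(cull_program(Interval(0.5, 1.0), Interval(0.5, 1.0)))
```

In this sketch the cull program repeats the shader's arithmetic with interval operands and replaces the discard test with a check on the upper bound, which is one way such a program could be derived mechanically from the fragment program.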
PCU: the programmable culling unit
SIGGRAPH '07: ACM SIGGRAPH 2007 papers