ExtraNet: Real-time Extrapolated Rendering for Low-latency Temporal Supersampling

Abstract
Both frame rate and latency are crucial to the performance of real-time rendering applications such as video games. Spatial supersampling methods, such as Deep Learning Super Sampling (DLSS), have proven successful at decreasing the rendering time of each frame by rendering at a lower resolution. However, temporal supersampling methods that directly produce additional frames on the fly are still not practically available, mainly due to both their computational cost and the latency introduced by interpolating frames from the future. In this paper, we present ExtraNet, an efficient neural network that predicts accurate shading results on an extrapolated frame, minimizing both the performance overhead and the latency. With the help of the rendered auxiliary geometry buffers of the extrapolated frame and temporally reliable motion vectors, we train ExtraNet to perform two tasks simultaneously: irradiance in-painting for regions that cannot find historical correspondences, and accurate ghosting-free shading prediction for regions where temporal information is available. We present a robust hole-marking strategy to automate the classification of these two tasks, as well as data generation from a series of high-quality production-ready scenes. Finally, we use lightweight gated convolutions to enable fast inference. As a result, ExtraNet produces plausibly extrapolated frames without easily noticeable artifacts, delivering a 1.5× to near-2× increase in frame rate with minimized latency in practice.
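The gated convolutions mentioned above follow the free-form inpainting idea of learning a soft per-pixel mask alongside the feature response, so that hole regions (pixels with no historical correspondence) and valid regions are treated differently. As a rough illustration of the mechanism only — the layer name and channel counts below are hypothetical, not the paper's actual architecture — one such layer might be sketched in PyTorch as:

```python
import torch
import torch.nn as nn


class GatedConv2d(nn.Module):
    """Sketch of a gated convolution: two parallel convolutions, one
    producing features and one producing a learned soft gate in (0, 1).
    The gate lets the network suppress responses in hole regions, which
    is why this layer type suits irradiance in-painting."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3, padding: int = 1):
        super().__init__()
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Elementwise product: feature response modulated by a soft mask.
        return torch.relu(self.feature(x)) * torch.sigmoid(self.gate(x))


# Hypothetical usage: an 8-channel input, e.g. warped irradiance plus
# a few auxiliary G-buffer channels, mapped to 16 feature channels.
x = torch.randn(1, 8, 64, 64)
layer = GatedConv2d(8, 16)
y = layer(x)
print(y.shape)  # torch.Size([1, 16, 64, 64])
```

Compared with a plain convolution, the extra gate branch roughly doubles the layer's cost, which is why the paper emphasizes keeping these gated convolutions lightweight for real-time inference.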