Graphics rendering applications increasingly leverage neural networks to improve frame rates while maintaining image quality in tasks such as denoising, supersampling, and frame extrapolation. The temporal coherence inherent in these tasks presents an opportunity to reuse intermediate results from previous frames and avoid redundant computation. Recent work has shown that caching intermediate features for reuse in subsequent inferences is an effective way to reduce latency in diffusion models. We build on this idea with ReFrame and explore different caching policies to optimize the trade-off between quality and performance for rendering workloads. By reducing the latency of the neural network inferences, we can allocate more resources to high-quality rendering, such as ray tracing, improving both the frame rate and the final image quality. ReFrame can be applied to a variety of encoder-decoder style networks commonly found in rendering pipelines. Experimental results show that we achieve a 1.4× speedup on average with negligible quality loss in three real-time rendering tasks. We outperform DeltaCNN [1], another method that exploits frame similarity, in these tasks and can further improve inference time when the two are combined.
We exploit temporal redundancy in neural networks for real-time rendering workloads by caching intermediate layer features and reusing them in subsequent frames to avoid unnecessary computation, similar to DeepCache [2]. We can cache features for U-Net and U-Net++ architectures, as well as for encoder-decoder style networks that include feature concatenations; a minimal sketch of this mechanism follows the figures below.
Caching features for U-Net and U-Net++ architectures.
Caching features for encoder-decoder feature concatenation.
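To make the caching mechanism concrete, here is a minimal PyTorch sketch of an encoder-decoder with a skip connection, where the deep stages are skipped whenever the cached features can be reused. The layer names, channel counts, and overall structure are illustrative assumptions, not the networks evaluated in the paper.

```python
import torch
import torch.nn as nn

class ReFrameUNet(nn.Module):
    """Minimal U-Net-style sketch with ReFrame-style feature caching.

    On a refresh frame the full network runs and the deep decoder
    features are stored; on other frames the inner encoder-decoder
    stages are skipped and the cached features are concatenated with
    freshly computed shallow features (all names are hypothetical).
    """

    def __init__(self, ch=32):
        super().__init__()
        self.enc_shallow = nn.Conv2d(3, ch, 3, padding=1)
        self.enc_deep = nn.Sequential(
            nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch * 2, ch * 2, 3, padding=1), nn.ReLU(),
        )
        self.dec_deep = nn.ConvTranspose2d(ch * 2, ch, 2, stride=2)
        # The decoder head consumes the skip connection (shallow
        # features) concatenated with the upsampled deep features.
        self.dec_shallow = nn.Conv2d(ch * 2, 3, 3, padding=1)
        self.cache = None  # cached deep decoder features

    def forward(self, x, refresh=True):
        skip = torch.relu(self.enc_shallow(x))   # always recomputed
        if refresh or self.cache is None:
            deep = self.dec_deep(self.enc_deep(skip))
            self.cache = deep.detach()            # store for later reuse
        else:
            deep = self.cache                     # reuse: skip inner stages
        return self.dec_shallow(torch.cat([skip, deep], dim=1))
```

Because the skip connection is always recomputed, the concatenation pattern above also covers encoder-decoder networks with feature concatenations, as in the second figure.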
We dynamically refresh the cache contents when they become stale by comparing the inputs of the current frame against the previous inputs used to generate the cached features. This approach requires more computation than the fixed refresh schedule proposed by DeepCache [2], but it adapts well to the unpredictable nature of real-time rendering and prevents sudden quality drops. This adaptivity reflects one of several key differences between diffusion models and real-time rendering, summarized in the table below; a sketch of the refresh check follows the table.
Both settings often apply a U-Net or encoder-decoder architecture, rely on repeated forward passes, and exhibit high temporal redundancy between subsequent forward-pass inferences. They differ in the following ways:

| Diffusion | Rendering |
|---|---|
| The total number and behavior of forward passes are predetermined. | The total number and behavior of forward passes are not known in advance and depend on real-time user inputs. |
| Errors from one forward pass can be corrected by other forward passes and do not affect the final output. | Errors from each forward pass are directly visible to the user and accumulate. |
| Inference time is best-effort but quality is important. | Quality is best-effort but inference time is fixed. |
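As an illustration of the input-difference refresh check described above, here is a hedged PyTorch sketch driving the `ReFrameUNet` from earlier. The mean-absolute-difference metric, the threshold values, and the synthetic frame stream are assumptions for illustration, not the paper's exact formulation; lower sensitivity (a looser threshold, as in Delta_L below) yields more reuse.

```python
import torch

class RefreshPolicy:
    """Staleness check for cached features (illustrative sketch).

    Refreshes when the current frame's inputs have drifted too far
    from the inputs that produced the cached features.
    """

    def __init__(self, threshold: float = 0.05):
        self.threshold = threshold  # higher = less sensitive, more reuse
        self.ref_input = None       # inputs behind the current cache

    def should_refresh(self, x: torch.Tensor) -> bool:
        # Always refresh on the first frame or after a resolution change.
        if self.ref_input is None or self.ref_input.shape != x.shape:
            return True
        delta = (x - self.ref_input).abs().mean().item()
        return delta > self.threshold

    def mark_refreshed(self, x: torch.Tensor) -> None:
        self.ref_input = x.detach().clone()


# Per-frame driver combining the policy with the cached network above.
net = ReFrameUNet()
policy = RefreshPolicy(threshold=0.05)  # looser, Delta_L-like setting
base = torch.rand(1, 3, 64, 64)         # stand-in for a rendered frame
with torch.no_grad():
    for t in range(8):
        frame = (base + 0.01 * t).clamp(0, 1)  # slowly changing scene
        refresh = policy.should_refresh(frame)
        out = net(frame, refresh=refresh)
        if refresh:
            policy.mark_refreshed(frame)
```

In this toy stream the network refreshes on the first frame, reuses the cached deep features while the scene drifts slowly, and refreshes again once the accumulated input difference crosses the threshold.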
With the low-sensitivity setting (Delta_L), ReFrame applies to 72% of inferences on average, resulting in a 40% reduction in FLOPs and a 1.6× speedup in inference latency.
| Policy | Workload | Scene | Skipped Frames ↑ | Eliminated Enc-Dec FLOPs ↑ | Speedup ↑ |
|---|---|---|---|---|---|
| Delta_H | FE | Sun Temple | 50% | 27% | 1.42 |
| Delta_H | FE | Cyber Punk | 30% | 16% | 1.10 |
| Delta_H | FE | Asian Village | 35% | 19% | 1.24 |
| Delta_H | SS | Sun Temple | 40% | 29% | 1.30 |
| Delta_H | IC | Garden Chair | 13% | 6% | 1.05 |
| Delta_L | FE | Sun Temple | 80% | 43% | 1.72 |
| Delta_L | FE | Cyber Punk | 60% | 32% | 1.49 |
| Delta_L | FE | Asian Village | 60% | 32% | 1.55 |
| Delta_L | SS | Sun Temple | 80% | 57% | 1.85 |
| Delta_L | IC | Garden Chair | 79% | 34% | 1.20 |

FE, SS, and IC denote the frame extrapolation, supersampling, and image composition workloads; Delta_H and Delta_L denote the high- and low-sensitivity refresh policies.
Frame extrapolation example.
Supersampling example.
Image composition example.
@inproceedings{liu2025reframe,
  title={ReFrame: Layer Caching for Accelerated Inference in Real-Time Rendering},
  author={Lufei Liu and Tor M. Aamodt},
  booktitle={Proceedings of the International Conference on Machine Learning (ICML)},
  year={2025},
  organization={PMLR},
}
[1] Parger, Mathias, Chengcheng Tang, Christopher D. Twigg, Cem Keskin, Robert Wang, and Markus Steinberger. "DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[2] Ma, Xinyin, Gongfan Fang, and Xinchao Wang. "DeepCache: Accelerating Diffusion Models for Free." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.