Abstract

Graphics rendering applications increasingly leverage neural networks in tasks such as denoising, supersampling, and frame extrapolation to improve frame rates while maintaining image quality. The temporal coherence inherent in these tasks presents an opportunity to reuse intermediate results from previous frames and avoid redundant computation. Recent work has shown that caching intermediate features for reuse in subsequent inferences is an effective way to reduce latency in diffusion models. We extend this idea in ReFrame and explore different caching policies to optimize the trade-off between quality and performance for rendering workloads. By reducing the latency of the neural network inferences, we can allocate more resources to high-quality rendering, such as ray tracing, improving both frame rate and final image quality. ReFrame can be applied to a variety of encoder-decoder style networks commonly found in rendering pipelines. Experimental results show an average 1.4× speedup with negligible quality loss across three real-time rendering tasks. We outperform DeltaCNN [1], another method that exploits frame similarity, on these tasks and can further improve inference time when combined with it.

ReFrame

We exploit temporal redundancy in neural networks for real-time rendering workloads by caching intermediate layer features to reuse in subsequent frames and avoid unnecessary computations, similar to DeepCache [2]. We can cache features for U-Net and U-Net++ architectures as well as encoder-decoder style networks that include feature concatenations.
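As a rough illustration, the toy PyTorch module below sketches how such caching could look for a U-Net-style network: the deep encoder/decoder path is cached and reused, so only the shallow layers run when the cache is considered fresh. The class, layer names, channel sizes, and control flow are illustrative assumptions, not the actual ReFrame implementation.

```python
import torch
import torch.nn as nn

class CachedUNet(nn.Module):
    """Toy U-Net sketch of feature caching (layer names and sizes are illustrative)."""

    def __init__(self, ch=32):
        super().__init__()
        self.enc1 = nn.Conv2d(3, ch, 3, padding=1)                  # shallow encoder block (always runs)
        self.enc2 = nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1)   # deep encoder block (skippable)
        self.dec2 = nn.ConvTranspose2d(ch * 2, ch, 2, stride=2)     # deep decoder block (skippable)
        self.dec1 = nn.Conv2d(ch * 2, 3, 3, padding=1)              # shallow decoder block (always runs)
        self.cached_deep = None                                      # cached deep-path features

    def forward(self, x, reuse_cache=False):
        e1 = self.enc1(x)                              # shallow features are recomputed every frame
        if reuse_cache and self.cached_deep is not None:
            d2 = self.cached_deep                      # reuse deep features from a previous frame
        else:
            d2 = self.dec2(self.enc2(e1))              # run the full deep path
            self.cached_deep = d2.detach()             # refresh the cache
        return self.dec1(torch.cat([e1, d2], dim=1))   # skip connection concatenates shallow + deep


net = CachedUNet()
frame = torch.randn(1, 3, 64, 64)
out_full = net(frame)                     # full inference, populates the cache
out_fast = net(frame, reuse_cache=True)   # only enc1/dec1 run; deep features come from the cache
```

The same idea carries over to encoder-decoder networks with feature concatenations: any feature map that feeds a concatenation can be cached and reused instead of being recomputed.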

Figure: Caching features for U-Net and U-Net++ architectures (ReFrame for U-Net and U-Net++).

Figure: Caching features for encoder-decoder feature concatenation (ReFrame for concatenations).

We dynamically refresh the cached features when they become stale by comparing the inputs of the current frame against the previous inputs used to generate the cache. This approach requires more computation than the fixed refresh schedule proposed by DeepCache [2], but it adapts well to the unpredictable nature of real-time rendering and prevents sudden quality drops.
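Below is a minimal sketch of such a staleness check. The difference metric and the threshold values are assumptions for illustration; Delta_H and Delta_L stand in for the high- and low-sensitivity policies reported in the results table, not the paper's exact formulation.

```python
import torch

# Illustrative staleness check: compare the current frame's network inputs against
# the inputs that produced the cached features.
DELTA_H = 0.02   # high sensitivity: small changes already trigger a refresh (assumed value)
DELTA_L = 0.05   # low sensitivity: tolerate larger changes before refreshing (assumed value)

def needs_refresh(curr_input, cached_input, threshold=DELTA_L):
    if cached_input is None:
        return True                                    # nothing cached yet: run the full network
    diff = (curr_input - cached_input).abs().mean()    # mean absolute change between inputs
    scale = cached_input.abs().mean() + 1e-8           # normalize by the cached input's magnitude
    return (diff / scale).item() > threshold
```

A frame loop could then call net(frame, reuse_cache=not needs_refresh(frame, prev_input)) with the toy module above, recording prev_input = frame whenever the cache is refreshed.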

This adaptivity reflects several key differences between diffusion models and real-time rendering. Both settings typically use a U-Net / encoder-decoder architecture, rely on repeated forward passes, and exhibit high temporal redundancy between subsequent forward-pass inferences. They differ as follows:

| Diffusion | Rendering |
|---|---|
| Total number and behavior of forward passes are predetermined. | Total number and behavior of forward passes are unpredictable and depend on real-time user inputs. |
| Errors from one forward pass can be corrected by later passes and do not affect the final output. | Errors from each forward pass are directly visible to the user and accumulate. |
| Inference time is best-effort but quality is important. | Quality is best-effort but inference time is fixed. |

Results

ReFrame applies to 72% of inferences on average with a low sensitivity setting, resulting in a 40% reduction in FLOPs and a 1.6× speedup in inference latency.

| Policy | Workload | Scene | Skipped Frames ↑ | Eliminated Enc-Dec FLOPs ↑ | Speedup ↑ |
|---|---|---|---|---|---|
| Delta_H | FE | Sun Temple | 50% | 27% | 1.42 |
| | FE | Cyber Punk | 30% | 16% | 1.10 |
| | FE | Asian Village | 35% | 19% | 1.24 |
| | SS | Sun Temple | 40% | 29% | 1.30 |
| | IC | Garden Chair | 13% | 6% | 1.05 |
| Delta_L | FE | Sun Temple | 80% | 43% | 1.72 |
| | FE | Cyber Punk | 60% | 32% | 1.49 |
| | FE | Asian Village | 60% | 32% | 1.55 |
| | SS | Sun Temple | 80% | 57% | 1.85 |
| | IC | Garden Chair | 79% | 34% | 1.20 |

FE: Frame extrapolation, SS: Supersampling, IC: Image composition

Figure: Frame extrapolation example (ExtraNet, Asian Village).

Figure: Supersampling example (Sun Temple).

Figure: Image composition example (implicit depth, Garden Chair).

Bibtex

@inproceedings{liu2025reframe,
    title={ReFrame: Layer Caching for Accelerated Inference in Real-Time Rendering},
    author={Lufei Liu and Tor M. Aamodt},
    booktitle={Proceedings of International Conference on Machine Learning (ICML)},
    year={2025},
    organization={PMLR},
}
[1] Parger, Mathias, Chengcheng Tang, Christopher D. Twigg, Cem Keskin, Robert Wang, and Markus Steinberger. "DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[2] Ma, Xinyin, Gongfan Fang, and Xinchao Wang. "DeepCache: Accelerating Diffusion Models for Free." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.