Abstract

Graphics rendering applications increasingly leverage neural networks in tasks such as denoising, supersampling, and frame extrapolation to improve frame rates while maintaining image quality. The temporal coherence inherent in these tasks presents an opportunity to reuse intermediate results from previous frames and avoid redundant computation. Recent work has shown that caching intermediate features for reuse in subsequent inferences is an effective way to reduce latency in diffusion models. We extend this idea in ReFrame and explore different caching policies to optimize the trade-off between quality and performance for rendering workloads. By reducing the latency of the neural network inferences, we can allocate more resources to high-quality rendering, such as ray tracing, improving both frame rate and final image quality. ReFrame can be applied to a variety of encoder-decoder style networks commonly found in rendering pipelines. Experimental results show an average 1.4× speedup with negligible quality loss across three real-time rendering tasks. We outperform DeltaCNN [1], another method that exploits frame similarity, on these tasks and can further improve inference time when combined with it.

ReFrame

We exploit temporal redundancy in neural networks for real-time rendering workloads by caching intermediate layer features to reuse in subsequent frames and avoid unnecessary computations, similar to DeepCache [2]. We can cache features for U-Net and U-Net++ architectures as well as encoder-decoder style networks that include feature concatenations.
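As a rough illustration, the toy PyTorch module below sketches how such caching could look for a U-Net-style network: the deep encoder/decoder path is cached and reused, so only the shallow layers run when the cache is considered fresh. The class, layer names, channel sizes, and control flow are illustrative assumptions, not the actual ReFrame implementation.

```python
import torch
import torch.nn as nn

class CachedUNet(nn.Module):
    """Toy U-Net sketch of feature caching (layer names and sizes are illustrative)."""

    def __init__(self, ch=32):
        super().__init__()
        self.enc1 = nn.Conv2d(3, ch, 3, padding=1)                  # shallow encoder block (always runs)
        self.enc2 = nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1)   # deep encoder block (skippable)
        self.dec2 = nn.ConvTranspose2d(ch * 2, ch, 2, stride=2)     # deep decoder block (skippable)
        self.dec1 = nn.Conv2d(ch * 2, 3, 3, padding=1)              # shallow decoder block (always runs)
        self.cached_deep = None                                      # cached deep-path features

    def forward(self, x, reuse_cache=False):
        e1 = self.enc1(x)                              # shallow features are recomputed every frame
        if reuse_cache and self.cached_deep is not None:
            d2 = self.cached_deep                      # reuse deep features from a previous frame
        else:
            d2 = self.dec2(self.enc2(e1))              # run the full deep path
            self.cached_deep = d2.detach()             # refresh the cache
        return self.dec1(torch.cat([e1, d2], dim=1))   # skip connection concatenates shallow + deep


net = CachedUNet()
frame = torch.randn(1, 3, 64, 64)
out_full = net(frame)                     # full inference, populates the cache
out_fast = net(frame, reuse_cache=True)   # only enc1/dec1 run; deep features come from the cache
```

The same idea carries over to encoder-decoder networks with feature concatenations: any feature map that feeds a concatenation can be cached and reused instead of being recomputed.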

Figure: Caching features for U-Net and U-Net++ architectures (ReFrame for U-Net and U-Net++).

Figure: Caching features for encoder-decoder feature concatenation (ReFrame for concatenations).

We dynamically refresh the cached features when they become stale by comparing the inputs of the current frame against the previous inputs used to generate the cache. This approach requires more computation than the fixed refresh schedule proposed by DeepCache [2], but it adapts well to the unpredictable nature of real-time rendering and prevents sudden quality drops.
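Below is a minimal sketch of such a staleness check. The difference metric and the threshold values are assumptions for illustration; Delta_H and Delta_L stand in for the high- and low-sensitivity policies reported in the results table, not the paper's exact formulation.

```python
import torch

# Illustrative staleness check: compare the current frame's network inputs against
# the inputs that produced the cached features.
DELTA_H = 0.02   # high sensitivity: small changes already trigger a refresh (assumed value)
DELTA_L = 0.05   # low sensitivity: tolerate larger changes before refreshing (assumed value)

def needs_refresh(curr_input, cached_input, threshold=DELTA_L):
    if cached_input is None:
        return True                                    # nothing cached yet: run the full network
    diff = (curr_input - cached_input).abs().mean()    # mean absolute change between inputs
    scale = cached_input.abs().mean() + 1e-8           # normalize by the cached input's magnitude
    return (diff / scale).item() > threshold
```

A frame loop could then call net(frame, reuse_cache=not needs_refresh(frame, prev_input)) with the toy module above, recording prev_input = frame whenever the cache is refreshed.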

This adaptivity reflects several key differences between diffusion models and real-time rendering. Both settings typically use a U-Net / encoder-decoder architecture, rely on repeated forward passes, and exhibit high temporal redundancy between subsequent forward-pass inferences. They differ as follows:

| Diffusion | Rendering |
|---|---|
| Total number and behavior of forward passes are predetermined. | Total number and behavior of forward passes are unpredictable and depend on real-time user inputs. |
| Errors from one forward pass can be corrected by later passes and do not affect the final output. | Errors from each forward pass are directly visible to the user and accumulate. |
| Inference time is best-effort but quality is important. | Quality is best-effort but inference time is fixed. |

Results

ReFrame applies to 72% of inferences on average with a low sensitivity setting, resulting in a 40% reduction in FLOPs and a 1.6× speedup in inference latency.

| Policy | Workload | Scene | Skipped Frames ↑ | Eliminated Enc-Dec FLOPs ↑ | Speedup ↑ |
|---|---|---|---|---|---|
| Delta_H | FE | Sun Temple | 50% | 27% | 1.42 |
| | FE | Cyber Punk | 30% | 16% | 1.10 |
| | FE | Asian Village | 35% | 19% | 1.24 |
| | SS | Sun Temple | 40% | 29% | 1.30 |
| | IC | Garden Chair | 13% | 6% | 1.05 |
| Delta_L | FE | Sun Temple | 80% | 43% | 1.72 |
| | FE | Cyber Punk | 60% | 32% | 1.49 |
| | FE | Asian Village | 60% | 32% | 1.55 |
| | SS | Sun Temple | 80% | 57% | 1.85 |
| | IC | Garden Chair | 79% | 34% | 1.20 |

FE: Frame extrapolation, SS: Supersampling, IC: Image composition

Figure: Frame extrapolation example (ExtraNet, Asian Village).

Figure: Supersampling example (Sun Temple).

Figure: Image composition example (implicit depth, Garden Chair).

Bibtex

@inproceedings{liu2025reframe,
    title={ReFrame: Layer Caching for Accelerated Inference in Real-Time Rendering},
    author={Lufei Liu and Tor M. Aamodt},
    booktitle={Proceedings of International Conference on Machine Learning (ICML)},
    year={2025},
    organization={PMLR},
}
[1] Parger, Mathias, Chengcheng Tang, Christopher D. Twigg, Cem Keskin, Robert Wang, and Markus Steinberger. "DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[2] Ma, Xinyin, Gongfan Fang, and Xinchao Wang. "DeepCache: Accelerating Diffusion Models for Free." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.