Game engines are O(n) in scene complexity. Diffusion models are O(1) in scene complexity: the cost per frame is the same whether you’re rendering an empty room or a million polygons. What if you made the engine differentiable and optimized the diffusion model against it directly, rather than against sampled frames? Has anyone tried it?
5 months ago