🧵 Accepted at @iclr-conf.bsky.social!
Target networks stabilize bootstrapping in RL 🛡️
But induce slow-moving targets 🐢
Online networks adapt fast ⚡
But can diverge with function approximation 💥
𝗠𝗜𝗡𝗧𝗢 🌿 uses the online network 𝗼𝗻𝗹𝘆 𝗶𝗳 𝗶𝘁 𝗰𝗮𝗻, yielding faster 𝘢𝘯𝘥 more stable RL.
Here's how 👇