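In a fairly recent paper, we introduced conditional Wasserstein distances. They generalize the chain rule of KL, a property that largely explains why KL works so well for generative modelling.
The chain rule says that the KL between two joint distributions equals the KL between their marginals over the observation plus the expected KL between the posteriors. So if the marginals agree, approximating the posterior amounts to minimizing the KL between the joints.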
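To make the chain rule concrete, here is a minimal numerical sketch on a toy discrete example. The joint tables `p_joint` and `q_joint` and all helper names are hypothetical and chosen only for illustration; they are not from the paper. The code checks that the joint KL equals the marginal KL plus the expected posterior KL, and that the marginal term vanishes when both joints share the same marginal over the observation y.

```python
import math

def kl(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def marginal_y(joint):
    """Marginal over y for a joint stored as joint[y][x]."""
    return [sum(row) for row in joint]

def conditional_x_given_y(joint):
    """Posterior p(x | y) for each y, obtained by normalizing each row."""
    return [[v / sum(row) for v in row] for row in joint]

def flatten(joint):
    return [v for row in joint for v in row]

# Hypothetical toy joints over (y, x); the numbers are illustrative only.
# Both share the marginal (0.4, 0.6) over the observation y.
p_joint = [[0.10, 0.30], [0.45, 0.15]]
q_joint = [[0.20, 0.20], [0.30, 0.30]]

p_y, q_y = marginal_y(p_joint), marginal_y(q_joint)
p_post, q_post = conditional_x_given_y(p_joint), conditional_x_given_y(q_joint)

joint_kl = kl(flatten(p_joint), flatten(q_joint))
marginal_kl = kl(p_y, q_y)
# Expected KL between posteriors, with the expectation taken over y ~ p(y).
expected_posterior_kl = sum(
    p_y[i] * kl(p_post[i], q_post[i]) for i in range(len(p_y))
)

# Chain rule: KL(joint) = KL(marginal) + E_y[KL(posterior)].
print(joint_kl, marginal_kl + expected_posterior_kl)
```

Because the marginal term is (numerically) zero here, the joint KL and the expected posterior KL coincide, which is the situation the post describes.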