Rohan Tangri
Oct 14, 2021

--

So you'd generally want to use this kind of loss in a supervised setting when dealing with classification. There, the "true" distribution is a categorical distribution with a probability of 1 on the correct class and 0 everywhere else. The problem is that the reverse KL divergence isn't really defined for this kind of target: going back to the formula for KL divergence, you'll see that you end up with terms dividing by 0! In reinforcement learning, by contrast, we want the mode-seeking behaviour of reverse KL, since we only really care about a single mode/action; we don't care about trying to model the second- or third-best action to take. Hope that makes sense 😊
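To make the divide-by-zero point concrete, here's a minimal sketch (the 3-class distributions are made up for illustration) comparing forward KL, KL(p‖q), against reverse KL, KL(q‖p), when the "true" distribution p is one-hot:

```python
import math

def kl(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), with 0 * log(0 / q) taken as 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # lim_{x -> 0} x * log(x / q) = 0, so zero-mass terms vanish
        if qi == 0.0:
            return math.inf  # a nonzero p_i over q_i = 0: the ratio blows up
        total += pi * math.log(pi / qi)
    return total

# Hypothetical example: one-hot "true" label vs a model's softmax output.
p_true = [1.0, 0.0, 0.0]
q_model = [0.7, 0.2, 0.1]

forward = kl(p_true, q_model)  # well defined: only the class with p = 1 contributes
reverse = kl(q_model, p_true)  # infinite: q puts mass where p is exactly 0
```

Here `forward` is just `-log(0.7)` (the usual cross-entropy-style loss), while `reverse` is infinite because the model assigns probability to classes where the one-hot target is exactly 0, which is precisely the division by zero described above.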

Written by Rohan Tangri

AI PhD @ Imperial College London
