Are there any problems when using the Epsilon-Greedy method to find the Optimal Policy?

1 Answer

sharadyadav1986 · Answer 1 · 2023-05-05T18:08:38+0000

The ϵ-greedy policy chooses the greedy action (i.e. the action with the highest estimated value) with probability 1−ϵ and a random action with probability ϵ, where ϵ ∈ [0,1]. The problem with ϵ-greedy is that when it explores (i.e. with probability ϵ), it samples actions uniformly, treating all of them as equally promising, even though some actions (even excluding the current best one) have better value estimates than others.
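A minimal sketch of ϵ-greedy selection may make the issue concrete. The function name `epsilon_greedy`, the list `q_values` of estimated action values, and the `rng` parameter are hypothetical names chosen for illustration; they are not from the answer above.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index given a list of estimated action values.

    Illustrative sketch only: q_values, epsilon and rng are assumed names.
    """
    if rng.random() < epsilon:
        # Explore: every action is equally likely, no matter how good or
        # bad its current value estimate is.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Note that in the exploration branch an action with an estimate of -100.0 is sampled exactly as often as one with an estimate of 4.9, which is the weakness described above.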

One solution is to use a softmax action selection rule, which varies the action probabilities as a graded function of estimated value. The greedy action is still given the highest selection probability, but all the others are ranked and weighted according to their value estimates. The most common softmax method uses a Gibbs, or Boltzmann, distribution.
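As a rough sketch of Boltzmann selection, assume each action a is chosen with probability exp(Q(a)/τ) / Σ_b exp(Q(b)/τ), where τ > 0 is a temperature parameter. The temperature and the names `softmax_probabilities`, `softmax_action` and `q_values` are assumptions introduced for this example, not something the answer specifies.

```python
import math
import random


def softmax_probabilities(q_values, temperature=1.0):
    """Boltzmann (Gibbs) probabilities over actions; names are illustrative."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_action(q_values, temperature=1.0, rng=random):
    """Sample one action index according to the Boltzmann probabilities."""
    probs = softmax_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

Under this sketch, a high temperature makes the actions nearly equiprobable, while a low temperature concentrates almost all probability on the greedy action.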
