The ϵ-greedy policy chooses the greedy action (i.e. the action with the highest estimated value) with probability 1−ϵ, where ϵ ∈ [0,1], and a random action with probability ϵ. The problem with ϵ-greedy is that when it explores (i.e. with probability ϵ), it picks among the actions uniformly, treating them all as equally good, even though some actions (even excluding the currently best one) have much higher estimated values than others.
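A minimal Python sketch of ϵ-greedy selection makes the uniform-exploration problem concrete. The function names `epsilon_greedy` and `epsilon_greedy_probs` and the list `q_values` of value estimates are hypothetical, chosen for illustration. The sketch assumes the common convention in which the random branch samples from all actions, including the greedy one.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index chosen by epsilon-greedy (hypothetical helper).

    With probability epsilon, sample uniformly from ALL actions (assumed
    convention: the greedy action can also be drawn here); otherwise
    return the action with the highest estimated value.
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest estimate (ties go to the first index).
    return max(range(len(q_values)), key=lambda a: q_values[a])

def epsilon_greedy_probs(q_values, epsilon):
    """Selection probability of each action under the policy above."""
    n = len(q_values)
    best = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n      # uniform share from the exploration branch
    probs[best] += 1.0 - epsilon   # extra mass for the greedy action
    return probs
```

For example, with `q_values = [1.0, 0.9, -5.0]` and `epsilon = 0.3`, `epsilon_greedy_probs` gives action 1 (estimate 0.9) exactly the same probability as action 2 (estimate −5.0), even though action 1 is nearly as good as the greedy action.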
One solution to this is to use softmax action selection rules, which vary the action probabilities as a graded function of the estimated values. The greedy action is still given the highest selection probability, but all the other actions are ranked and weighted according to their value estimates. The most common softmax method uses a Gibbs, or Boltzmann, distribution.
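A minimal Python sketch of Boltzmann (softmax) selection, assuming the usual form in which action a is chosen with probability exp(Q(a)/τ) / Σ_b exp(Q(b)/τ), where Q(a) is the value estimate and τ > 0 is a temperature parameter. The names `softmax_probs`, `softmax_action`, `q_values` and `tau` are hypothetical, chosen for illustration.

```python
import math
import random

def softmax_probs(q_values, tau):
    """Boltzmann probabilities: exp(Q(a)/tau) / sum_b exp(Q(b)/tau)."""
    if tau <= 0:
        raise ValueError("temperature tau must be positive")
    # Subtracting the max before exponentiating avoids overflow and
    # leaves the normalized probabilities unchanged.
    m = max(q_values)
    weights = [math.exp((q - m) / tau) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def softmax_action(q_values, tau, rng=random):
    """Sample one action index according to softmax_probs."""
    probs = softmax_probs(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

With the same estimates `[1.0, 0.9, -5.0]`, action 1 now receives far more probability than action 2, unlike under ϵ-greedy. The temperature controls the trade-off: as τ grows the probabilities approach uniform, and as τ shrinks toward 0 the policy approaches pure greedy selection.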