Can you think of an example of an Epsilon-Greedy Policy in real life?

1 Answer

An epsilon-greedy policy lets the agent choose, according to a threshold epsilon, between the action with the highest estimated Q-value (the greedy action) and a random action, which may turn out to have an even higher value.

For example, say there are many routes from work to home and we have explored only two of them so far. To get home, we can pick whichever of those two routes is quicker; in reinforcement learning terms, this is the route with the highest Q-value, our estimate of how good an action is. However, there are still many routes we have not tried, and one of them might be better than our current best route. The question is whether we should try new routes (exploration) or always use our current best route (exploitation).

In this context, we introduce a policy called the epsilon-greedy policy:

With probability epsilon, we pick a route home at random (exploration).

With probability 1-epsilon, we choose the action with the maximum Q-value, that is, the route that gets us home most quickly as far as we know (exploitation).

Now, before selecting an action, we draw a random number r uniformly from the range [0,1]. If r is at least epsilon, we take the best-known route, the one that gets us home most quickly; if r < epsilon, we select a random action and explore other routes. If we follow these rules, we have implemented an epsilon-greedy policy.
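The selection rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not a definitive implementation: the function name `epsilon_greedy`, the route names, and the numbers in `q_values` are all made up for this example. It assumes each Q-value is the negative travel time in minutes (so a higher value means a faster route), that unexplored routes start from a pessimistic guess, and that a random action is drawn uniformly from all routes, including the current best one.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a route name from q_values (dict: route -> estimated Q-value)."""
    r = rng.random()  # random number r, uniform in [0, 1)
    if r < epsilon:
        # Exploration: pick any route uniformly at random
        return rng.choice(list(q_values))
    # Exploitation: pick the route with the highest estimated Q-value
    return max(q_values, key=q_values.get)

# Hypothetical estimates: negative travel time in minutes (higher is better).
# "highway" and "main_street" are the two explored routes; "river_road" has
# not been tried yet, so its value is just an assumed pessimistic guess.
q_values = {"highway": -25.0, "main_street": -32.0, "river_road": -40.0}

if __name__ == "__main__":
    for day in range(5):
        print(day, epsilon_greedy(q_values, epsilon=0.1))
```

With epsilon = 0 the function always returns the best-known route, and with epsilon = 1 it always picks at random. In a full learning loop, the Q-value of the chosen route would also be updated after each trip so that a newly discovered fast route can become the greedy choice; that update step is left out here.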
