What is the difference between Off-Policy and On-Policy Learning?

1 Answer

To understand the difference between On-Policy Learning and Off-Policy Learning, let us first define two terms.

Target Policy: The policy that the agent is trying to learn about, i.e., the agent is learning the value function for this policy.

Behavior Policy: The policy that the agent actually uses to select actions, i.e., the agent follows this policy to interact with the environment.
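As an illustration, a common pairing (assumed here, not stated in the answer) is a greedy target policy and an ε-greedy behavior policy defined over the same action-value estimates. A minimal Python sketch; `greedy_policy` and `epsilon_greedy_policy` are hypothetical helper names, not from any library:

```python
import random

def greedy_policy(q_values):
    """Example target policy: pick the action with the highest estimated value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

def epsilon_greedy_policy(q_values, epsilon, rng=random):
    """Example behavior policy: with probability epsilon pick a random action
    (exploration), otherwise act like the greedy target policy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_policy(q_values)

q = [0.1, 0.5, 0.2]  # hypothetical action-value estimates for one state
print(greedy_policy(q))                       # 1
print(epsilon_greedy_policy(q, epsilon=0.0))  # 1, since epsilon=0 never explores
```

If an agent learns about `greedy_policy` while acting with `epsilon_greedy_policy`, the target and behavior policies differ; if it learns about the same ε-greedy policy it acts with, they coincide.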

On-Policy Learning:

We evaluate and improve the same policy that the agent is already using to select actions. In short, [Target Policy == Behavior Policy]. Some examples of On-Policy algorithms are Policy Iteration, on-policy Monte Carlo control, SARSA, etc.
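A minimal sketch of the tabular SARSA update, assuming a Q-table stored in a `defaultdict` keyed by `(state, action)` and an ε-greedy policy; the helper names and the single transition below are hypothetical. The thing to notice is that the TD target uses `a_next`, the action the same ε-greedy policy actually picked, so the policy being evaluated is the one generating behavior:

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, n_actions, epsilon, rng):
    """Behavior policy: random action with probability epsilon, else greedy w.r.t. Q."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[(state, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, done=False):
    """On-policy TD update: bootstraps from Q(s', a') where a' was chosen
    by the same epsilon-greedy policy that is being evaluated and improved."""
    next_value = 0.0 if done else Q[(s_next, a_next)]
    td_target = r + gamma * next_value
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])

# Usage with one hypothetical transition: state 0, action 1, reward 1.0, next state 2.
Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
rng = random.Random(0)
a_next = epsilon_greedy(Q, 2, n_actions=2, epsilon=0.1, rng=rng)
sarsa_update(Q, s=0, a=1, r=1.0, s_next=2, a_next=a_next, alpha=0.5, gamma=0.9)
print(Q[(0, 1)])  # 0.5
```

In a full control loop the agent would then actually execute `a_next`, which is what ties the update to the behavior policy.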

Off-Policy Learning:

We evaluate and improve a policy that is different from the policy used for action selection. In short, [Target Policy != Behavior Policy]. Some examples of Off-Policy learning algorithms are Q-learning and Expected SARSA (which can be run either on-policy or off-policy).
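For contrast, a minimal sketch of the tabular Q-learning and Expected SARSA updates under the same assumptions (a `defaultdict` Q-table keyed by `(state, action)`, hypothetical helper names and values). Q-learning bootstraps from the greedy action in the next state no matter what the behavior policy does next. Expected SARSA averages over a target policy: it is on-policy when that target matches the behavior policy, and it reduces to Q-learning's target when the target policy is greedy:

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, n_actions, alpha, gamma, done=False):
    """Off-policy TD update: bootstraps from the greedy (target-policy) value,
    not from the action the behavior policy will take next."""
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in range(n_actions))
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def epsilon_greedy_probs(Q, state, n_actions, epsilon):
    """Action probabilities of an epsilon-greedy policy (ties go to the lowest index)."""
    greedy = max(range(n_actions), key=lambda b: Q[(state, b)])
    probs = [epsilon / n_actions] * n_actions
    probs[greedy] += 1.0 - epsilon
    return probs

def expected_sarsa_update(Q, s, a, r, s_next, n_actions, alpha, gamma,
                          target_epsilon, done=False):
    """TD target is the expected next-state value under an epsilon-greedy target policy.
    target_epsilon equal to the behavior policy's epsilon -> on-policy;
    target_epsilon = 0 (greedy target) -> same target as Q-learning."""
    if done:
        expected_next = 0.0
    else:
        probs = epsilon_greedy_probs(Q, s_next, n_actions, target_epsilon)
        expected_next = sum(p * Q[(s_next, b)] for b, p in enumerate(probs))
    Q[(s, a)] += alpha * (r + gamma * expected_next - Q[(s, a)])

Q = defaultdict(float)
Q[(1, 0)], Q[(1, 1)] = 2.0, 0.0  # hypothetical next-state action values

q_learning_update(Q, s=0, a=0, r=0.0, s_next=1, n_actions=2, alpha=1.0, gamma=1.0)
print(Q[(0, 0)])  # 2.0 (max over next actions)

expected_sarsa_update(Q, s=0, a=1, r=0.0, s_next=1, n_actions=2,
                      alpha=1.0, gamma=1.0, target_epsilon=0.5)
print(Q[(0, 1)])  # 1.5 (0.75 * 2.0 + 0.25 * 0.0)
```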
...