What is the difference between Off-Policy and On-Policy Learning?

Question

What is the difference between Off-Policy and On-Policy Learning?

1 Answer

sharadyadav1986 · Answer 1 · 2023-05-05T18:07:02+0000

To understand the difference between On-Policy Learning and Off-Policy Learning, let us first define two terms.

Target Policy: The policy that the agent is trying to learn, i.e., the agent is learning the value function for this policy.

Behavior Policy: The policy that the agent uses for action selection, i.e., the agent follows this policy to interact with the environment.

On-Policy Learning:

We evaluate and improve the same policy that the agent is already using to select actions. In short, [Target Policy == Behavior Policy]. Examples of On-Policy algorithms are On-Policy Monte Carlo control and Sarsa. Policy Iteration and Value Iteration are often listed here as well, but strictly speaking they are dynamic programming methods that plan with a known model of the environment instead of learning from sampled experience, so no separate behavior policy is involved.
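To make the on-policy idea concrete, here is a minimal sketch of a single tabular Sarsa update. The table layout (a list of rows, one per state), the helper names `epsilon_greedy` and `sarsa_update`, and the step size and discount values are all assumptions chosen for illustration, not part of the original answer. The point to notice is that the next action `a_next` is picked by the same epsilon-greedy policy that is being learned, and the update bootstraps from that exact action.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy TD update: bootstrap from the action the agent will actually take."""
    td_target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q[s][a]


# Hypothetical 2-state, 2-action table with made-up values.
Q = [[0.0, 0.0], [1.0, 2.0]]
rng = random.Random(0)

# The same policy both acts and is evaluated (target == behavior).
a_next = epsilon_greedy(Q[1], epsilon=0.1, rng=rng)
sarsa_update(Q, s=0, a=0, r=1.0, s_next=1, a_next=a_next)
```

If exploration happens to pick the worse action in `s_next`, that lower value flows into the update, which is why Sarsa learns the value of the exploring policy itself.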

Off-Policy Learning:

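We evaluate and improve a policy that is different from the policy used for action selection. In short, [Target Policy != Behavior Policy]. For example, Q-learning learns the value of the greedy policy while the agent typically explores with an epsilon-greedy policy. Examples of Off-Policy algorithms are Q-learning and Expected Sarsa (which can be used either on-policy or off-policy).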
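As a rough illustration of the off-policy case, the sketch below shows a single tabular Q-learning update and a single Expected Sarsa update. The function names, the list-of-rows table, the `target_probs` argument, and the step size and discount values are assumptions made for this example. Neither update needs to know which action the behavior policy takes next: Q-learning bootstraps from the greedy action, and Expected Sarsa bootstraps from an average over whatever target policy is passed in.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Off-policy TD update: bootstrap from the greedy (target-policy) action."""
    td_target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q[s][a]


def expected_sarsa_update(Q, s, a, r, s_next, target_probs, alpha=0.5, gamma=0.9):
    """Bootstrap from the expected value of s_next under the target policy.

    target_probs[i] is the target policy's probability of action i in s_next.
    """
    expected_q = sum(p * q for p, q in zip(target_probs, Q[s_next]))
    td_target = r + gamma * expected_q
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q[s][a]


# Hypothetical 2-state, 2-action table with made-up values.
Q = [[0.0, 0.0], [1.0, 2.0]]
q_value = q_learning_update(Q, s=0, a=0, r=1.0, s_next=1)

# With a greedy target policy, Expected Sarsa gives the same result as Q-learning.
Q = [[0.0, 0.0], [1.0, 2.0]]
es_value = expected_sarsa_update(Q, s=0, a=0, r=1.0, s_next=1, target_probs=[0.0, 1.0])
```

Passing the behavior policy's own action probabilities as `target_probs` turns Expected Sarsa into an on-policy method, which is the sense in which it can act both ways.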
