in Reinforcement Learning
Name some advantages of using Temporal difference vs Monte Carlo methods for Reinforcement Learning

1 Answer


Temporal Difference (TD) learning combines ideas from both Monte Carlo (MC) and Dynamic Programming (DP). Like Monte Carlo methods, TD methods can learn directly from experience without a model of the environment; on the other hand, TD learning has some inherent advantages over Monte Carlo methods.

In MC methods:

We must wait until the end of the episode before the return is known.

We have high variance but no bias: the sampled returns are unbiased estimates of the true value.

We don't exploit the Markov property.
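The "wait until the end of the episode" point can be made concrete with a small sketch. The environment below is a hypothetical example (a 1D random walk with states 0..6, terminating at either end with reward 1 on the right); the names `run_episode` and `mc_update` are illustrative, not from any particular library:

```python
import random

# Hypothetical 1D random walk: states 0..6, start at 3.
# The episode terminates at state 0 (reward 0) or state 6 (reward 1).
def run_episode():
    s, trajectory = 3, []
    while 0 < s < 6:
        trajectory.append(s)
        s += random.choice([-1, 1])
    return trajectory, (1.0 if s == 6 else 0.0)

def mc_update(V, trajectory, reward, alpha=0.1):
    # Every-visit MC: the return G is only known after the episode ends,
    # so no value estimate can be updated until then.
    G = reward  # undiscounted; all reward arrives at termination
    for s in trajectory:
        V[s] += alpha * (G - V[s])

V = [0.0] * 7
random.seed(0)
for _ in range(5000):
    traj, r = run_episode()
    mc_update(V, traj, r)
# For this walk the true value of state s is s/6.
```

Note that `mc_update` can only run once the whole trajectory is collected, which is exactly the limitation listed above.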

In Temporal Difference learning:

We can learn online after every step and do not need to wait until the end of the episode.

We have lower variance but some bias, because each update bootstraps from our current (possibly inaccurate) value estimates.

We exploit the Markov property.

The Markov property states that the future is independent of the past given the present; TD methods exploit this by bootstrapping from the current state's successor, so they are best suited to environments that satisfy this assumption.

...