The RL agent works on the principle of reward maximization. This is why the agent must be trained to take, at each step, the action that yields the maximum cumulative reward.
The cumulative reward at a particular time t, given the actions taken, can be written as:

G(t) = R(t+1) + R(t+2) + R(t+3) + …

The above equation is an idealized representation of rewards. In practice, things rarely work out this neatly when summing up the cumulative rewards.
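As a quick sketch of the sum above, here is the undiscounted cumulative reward computed in Python. The reward values are made up purely for illustration:

```python
# Hypothetical rewards received at successive timesteps.
rewards = [1.0, 0.5, 2.0, 1.5]

# The undiscounted return is simply the sum of all future rewards.
cumulative_reward = sum(rewards)
print(cumulative_reward)  # 5.0
```

Every reward counts equally here, no matter how far in the future it arrives; that is exactly the assumption the discounting section below relaxes.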
Let me explain this with a small game. In the figure you can see a fox, some meat and a tiger.
Our RL agent is the fox and his end goal is to eat the maximum amount of meat before being eaten by the tiger.
Since this fox is a clever fellow, he eats the meat that is closer to him rather than the meat near the tiger: the closer he gets to the tiger, the higher his chances of getting killed.
As a result, the rewards near the tiger, even if they are bigger meat chunks, will be discounted. This accounts for the uncertainty that the tiger might kill the fox before those rewards are collected.
The next thing to understand is how the discounting of rewards works.
To do this, we define a discount rate called gamma (γ), whose value lies between 0 and 1. The smaller the gamma, the larger the discount, and vice versa. The discounted cumulative reward then becomes:

G(t) = R(t+1) + γR(t+2) + γ²R(t+3) + …

so each reward is weighted by γ raised to the power of how many steps away it is.
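The discounted sum described above can be sketched in a few lines of Python. The reward list is hypothetical, chosen only to show how different values of gamma change the result:

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma**k, where k is its delay."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# Hypothetical rewards received at successive timesteps.
rewards = [1.0, 0.5, 2.0, 1.5]

print(discounted_return(rewards, 1.0))  # no discounting: plain sum
print(discounted_return(rewards, 0.9))  # mild discount: later rewards shrink a little
print(discounted_return(rewards, 0.1))  # heavy discount: mostly the first reward counts
```

With gamma close to 1 the agent values distant rewards almost as much as immediate ones; with gamma close to 0 it becomes short-sighted, which is exactly how the fox should treat the meat lying near the tiger.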