Gamma and Lambda are hyperparameters in generalized temporal difference (TD) algorithms such as TD(Lambda), and their values can significantly affect how fast and how reliably the algorithm learns. Here are some general guidelines for selecting them:
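Gamma (discount factor): Gamma determines how much future rewards count relative to immediate ones. A reward k steps ahead is weighted by Gamma^k, so a value of 1 makes future rewards as important as immediate rewards (which is only safe for tasks that are guaranteed to terminate), while a value of 0 means that only the immediate reward matters. As a rule of thumb, the agent effectively looks about 1 / (1 - Gamma) steps ahead: roughly 10 steps at 0.9 and 100 steps at 0.99. In most cases, a value between 0.9 and 0.99 is a reasonable starting point.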
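To make the effect of Gamma concrete, here is a minimal sketch that computes a discounted return and the rule-of-thumb horizon 1 / (1 - Gamma). The function names and the example reward sequence are made up for illustration and are not taken from any particular library.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k] over a finite reward sequence."""
    total = 0.0
    # Work backwards so each step is: reward now + gamma * (return from next step).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def effective_horizon(gamma):
    """Rule-of-thumb look-ahead length 1 / (1 - gamma); infinite when gamma is 1."""
    if gamma >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - gamma)


# Hypothetical episode: a single reward of 1.0 arriving after 20 steps.
delayed_reward = [0.0] * 20 + [1.0]

for gamma in (0.5, 0.9, 0.99):
    print(gamma, effective_horizon(gamma), discounted_return(delayed_reward, gamma))
```

With a small Gamma the delayed reward contributes almost nothing to the return, which is why tasks with long delays between action and reward usually need Gamma close to 1.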
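Lambda (eligibility trace decay rate): Lambda determines how far back in time credit for a TD error is passed to previously visited states. At every step the eligibility trace is multiplied by Gamma times Lambda. A value of 0 wipes out the trace after each step, so only the most recently visited state is updated; this is ordinary one-step TD, also called TD(0). A value of 1 means the trace decays only through Gamma (and not at all if Gamma is also 1), which makes the updates behave like Monte Carlo returns. Values in between trade the bias of bootstrapping against the variance of Monte Carlo estimates. In most cases, a value between 0.5 and 0.9 is a reasonable starting point.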
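The role of the trace is easier to see in code. The following is a minimal sketch of tabular TD(Lambda) with accumulating traces, assuming states are integer indices into a list of value estimates; the function name, the transition format and the tiny two-step episode are illustrative assumptions, not a reference implementation.

```python
def td_lambda_episode(values, transitions, alpha, gamma, lam):
    """Run tabular TD(lambda) with accumulating traces over one episode.

    values:      initial value estimate per state (list of floats)
    transitions: list of (state, reward, next_state, done) tuples
    Returns a new list of value estimates; the input list is not modified.
    """
    values = list(values)
    trace = [0.0] * len(values)
    for state, reward, next_state, done in transitions:
        # Terminal transitions do not bootstrap from the next state.
        bootstrap = 0.0 if done else gamma * values[next_state]
        td_error = reward + bootstrap - values[state]
        # Decay every trace by gamma * lambda, then bump the current state.
        trace = [gamma * lam * e for e in trace]
        trace[state] += 1.0
        # Every state is updated in proportion to its current trace.
        values = [v + alpha * td_error * e for v, e in zip(values, trace)]
    return values


# Hypothetical episode: state 0 -> state 1 -> terminal state 2, reward 1.0 at the end.
episode = [(0, 0.0, 1, False), (1, 1.0, 2, True)]

for lam in (0.0, 0.5, 1.0):
    print(lam, td_lambda_episode([0.0, 0.0, 0.0], episode, alpha=0.5, gamma=1.0, lam=lam))
```

In this toy episode, Lambda = 0 changes only the value of state 1, the state visited just before the reward. With Lambda = 1 and Gamma = 1, state 0 receives the same update as state 1, and intermediate Lambda values give state 0 a proportionally smaller share.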
There is no one-size-fits-all answer for the choice of Gamma and Lambda, and the best values depend on the specific problem. A practical approach is to start with a small range of values for each parameter and run a grid search, evaluating the algorithm's performance for each combination and keeping the one that works best. Keep in mind that Gamma and Lambda interact with the learning rate and the other hyperparameters: a larger Lambda, for example, often needs a smaller learning rate to stay stable, so it is usually better to tune them together than one at a time.
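A grid search over the two parameters can be sketched as follows. The `evaluate` callable is a placeholder you would supply yourself (for example, train the agent with the given Gamma and Lambda and return its average reward); the toy scoring function below exists only so the sketch runs on its own and does not represent real results.

```python
from itertools import product


def grid_search(evaluate, gammas, lambdas):
    """Try every (gamma, lambda) pair and return (best_score, gamma, lambda).

    evaluate(gamma, lam) must return a number where higher means better.
    """
    best = None
    for gamma, lam in product(gammas, lambdas):
        score = evaluate(gamma, lam)
        if best is None or score > best[0]:
            best = (score, gamma, lam)
    return best


def toy_evaluate(gamma, lam):
    """Stand-in for a real training run: an arbitrary score peaking at (0.95, 0.7)."""
    return -((gamma - 0.95) ** 2 + (lam - 0.7) ** 2)


best_score, best_gamma, best_lambda = grid_search(
    toy_evaluate,
    gammas=[0.9, 0.95, 0.99],
    lambdas=[0.5, 0.7, 0.9],
)
print(best_gamma, best_lambda)
```

In a real experiment, TD learning results vary from run to run, so it helps to have `evaluate` average the score over several random seeds. The learning rate can be added as a third axis of the grid in the same way.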