Q-learning is a reinforcement learning algorithm in which an agent tries to learn the optimal policy from its past experiences with the environment. These past experiences form a sequence of state-action-reward transitions:
the agent was in state (s0), performed action (a0), received reward (r1), and as a result moved to state (s1).
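
To make the idea of an experience concrete, here is a minimal sketch in Python of how one such transition could be stored and used for a tabular Q-learning update. The names `Experience` and `q_update`, the second action `a1`, the second transition in `history`, and the values of the learning rate `alpha` and discount factor `gamma` are all assumptions chosen for illustration; they do not come from the text above.

```python
from collections import defaultdict
from typing import NamedTuple


class Experience(NamedTuple):
    """One transition: state s, action a, reward r, next state s'."""
    state: str
    action: str
    reward: float
    next_state: str


def q_update(q, exp, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update for a single experience.

    alpha (learning rate) and gamma (discount factor) are illustrative values.
    """
    # Value of the best action available in the next state.
    best_next = max(q[(exp.next_state, a)] for a in actions)
    # Target: the observed reward plus the discounted future value.
    target = exp.reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(exp.state, exp.action)] += alpha * (target - q[(exp.state, exp.action)])
    return q[(exp.state, exp.action)]


# Hypothetical action set and experience history.
actions = ["a0", "a1"]
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
history = [
    Experience("s0", "a0", 1.0, "s1"),  # the (s0, a0, r1, s1) transition
    Experience("s1", "a1", 0.0, "s0"),  # an assumed follow-up transition
]

for exp in history:
    q_update(q, exp, actions)
```

Each pass through the loop consumes one stored experience, so the agent's estimates improve only from transitions it has actually observed. A dictionary keyed by `(state, action)` is used here because it keeps the sketch short; a 2-D array would work equally well when states and actions are indexed by integers.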