1) Q-learning is a popular reinforcement learning algorithm based on the Bellman equation. The agent tries to learn a policy, that is, which action to take in each state (situation) so as to maximize the cumulative reward it collects. It learns this optimal policy from past experience: the states, actions, and rewards it observes while interacting with its environment.
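2) The "Q" stands for quality: Q(s, a) estimates how good it is to take action a in state s, measured as the expected cumulative reward of taking a and acting optimally afterwards. The agent learns these Q-values, and in each state it picks the action with the highest Q-value, which is how it maximizes its reward.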
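A minimal sketch of tabular Q-learning, for illustration only. After each step it applies the standard update Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)), where the target r + gamma * max_a' Q(s', a') comes from the Bellman equation. The 5-state corridor environment, the helper names (`step`, `q_learning`, `greedy_policy`), and the hyperparameters (learning rate `alpha`, discount `gamma`, exploration rate `epsilon`) are all hypothetical choices made for this example.

```python
import random

# Hypothetical toy environment: a corridor of states 0..4.
# State 4 is terminal; reaching it gives reward 1, every other step gives 0.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Q-table: one row per state, one value per action, all starting at 0.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: try a random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick the highest-Q action, breaking ties randomly.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bellman target: reward plus discounted best Q-value of the next
            # state (no future value after reaching the terminal state).
            target = reward if done else reward + gamma * max(q[next_state])
            # Move Q(s, a) a fraction alpha of the way toward the target.
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """For each state, the action with the highest learned Q-value."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]


q = q_learning()
print(greedy_policy(q)[:GOAL])  # learned action in each non-terminal state
```

The agent is never told that moving right is correct. It discovers this from experience, because the reward at the goal propagates backwards through the Bellman target, so Q(s, right) grows larger the closer s is to the goal.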