Question 1

What is reinforcement learning in simple terms?

Accepted Answer

Reinforcement learning trains an agent to make decisions by letting it act in an environment and rewarding good outcomes. Through trial and error the agent learns a strategy that maximises the total reward it collects over time.
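
To make the act-then-get-rewarded loop concrete, here is a minimal sketch in plain Python. The `Corridor` environment, its reward values, and the `run_episode` helper are all hypothetical names invented for illustration; they are not taken from any particular library.

```python
import random


class Corridor:
    """Hypothetical toy environment: walk from cell 0 to the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Start every episode at the left end of the corridor.
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to the corridor.
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        # Assumed reward scheme: small cost per step, bonus for reaching the goal.
        reward = 1.0 if done else -0.1
        return self.position, reward, done


def run_episode(env, choose_action, max_steps=100):
    """Let the agent act until the episode ends and return the total reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = choose_action(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


def random_agent(state):
    # Pure trial and error: ignore the state and pick a direction at random.
    return random.choice([-1, 1])


def always_right(state):
    # The strategy a learning agent would ideally converge to in this corridor.
    return 1
```

Calling `run_episode(Corridor(), always_right)` gives a total of roughly 0.7, while `random_agent` usually scores lower because it wastes steps. Learning, in this picture, is whatever procedure moves the agent from the second kind of behaviour towards the first.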

Question 2

What is the difference between reinforcement and supervised learning?

Accepted Answer

In supervised learning the model is told the correct answer for each example, whereas a reinforcement learning agent only receives a reward signal that says how good an outcome was, often much later. The agent must work out for itself which of its earlier actions led to that reward.
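
One common way to hand a late reward back to the earlier actions is to compute a discounted "return-to-go" for every step. The sketch below assumes a discount factor `gamma`, which the answer above does not mention; the function name `returns_to_go` is likewise made up for this example.

```python
def returns_to_go(rewards, gamma=0.9):
    """For each time step, sum the rewards from that step onward.

    Later rewards are scaled down by gamma per step of delay (an assumed
    discount factor), so actions closer to the reward receive more credit.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For example, `returns_to_go([0.0, 0.0, 1.0], gamma=0.5)` yields `[0.25, 0.5, 1.0]`: only the last step was rewarded directly, yet the two earlier steps still receive a smaller share of the credit. A supervised learner would instead be given a target label at every one of those steps.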

Question 3

What is a policy in reinforcement learning?

Accepted Answer

A policy is the agent's strategy: a rule that maps each state to the action it should take or, for a stochastic policy, to a probability distribution over actions. The goal of training is to find a policy that maximises the expected cumulative reward across an episode.
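
For small problems, a deterministic policy can be written as a plain lookup table. The sketch below assumes a hypothetical four-state chain where state 3 is the goal; the `step` dynamics, the rewards, and both example policies are invented for illustration.

```python
# Hypothetical 4-state chain: states 0..3, where state 3 is the goal.
ACTIONS = {"left": -1, "right": +1}


def step(state, action):
    """Assumed dynamics: move one cell, clamp to 0..3, reward 1.0 on reaching the goal."""
    next_state = max(0, min(3, state + ACTIONS[action]))
    done = next_state == 3
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def episode_return(policy, start=0, max_steps=10):
    """Follow the policy from `start` and add up the rewards it collects."""
    state, total = start, 0.0
    for _ in range(max_steps):
        action = policy[state]  # the policy is just a state -> action lookup
        state, reward, done = step(state, action)
        total += reward
        if done:
            break
    return total


# Two candidate policies, each mapping every non-goal state to an action.
always_right = {0: "right", 1: "right", 2: "right"}
always_left = {0: "left", 1: "left", 2: "left"}
```

Here `episode_return(always_right)` is 1.0 and `episode_return(always_left)` is 0.0, so the first policy is the better one under this reward. Training amounts to searching for the table (or, in larger problems, the function) with the highest such return.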

Question 4

What is the exploration-exploitation tradeoff?

Accepted Answer

Exploitation means choosing the action currently believed to be best, while exploration means trying other actions to discover whether something better exists. An agent that only exploits can get stuck on a mediocre action because it never tests the alternatives, so good RL balances trying new things against using what it already knows.
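
A simple and widely used way to strike this balance is epsilon-greedy action selection: with a small probability the agent explores at random, otherwise it exploits its current best estimate. The sketch below applies it to a multi-armed bandit; the function name, the Gaussian reward noise, and the default parameter values are assumptions chosen for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=2000, seed=0):
    """Run epsilon-greedy on a bandit whose arms pay noisy rewards.

    `true_means` are hidden from the agent; it only sees sampled rewards.
    Returns the agent's reward estimates and how often it pulled each arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick any arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated reward so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Assumed reward model: the arm's true mean plus unit Gaussian noise.
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental update of the running average for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts
```

With something like `epsilon_greedy_bandit([0.0, 0.5, 2.0])`, the pull counts should concentrate on the last arm once a few exploratory pulls have revealed that it pays best. Setting `epsilon=0.0` removes exploration entirely, which is exactly the pure-exploitation case that risks getting stuck.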

What Is Reinforcement Learning? How AI Learns From Rewards

What reinforcement learning is

The core framework

Policies and value functions

Exploration vs exploitation

Real-world examples