Implementing Q-Learning for Rock Paper Scissors
This article explains how to train a Reinforcement Learning agent using Q-learning to play the classic game Rock Paper Scissors.
Instead of manually programming strategies, the agent learns through trial and error by observing rewards from its actions.
Table of Contents
- Introduction to Reinforcement Learning
- Game Mechanics
- Reward Matrix Design
- Understanding Q-Learning
- Python Implementation
- Training the Agent
- CLI Training Output
- Understanding the Q-Table
- Interactive Demo
- Key Insights
- Related Articles
Introduction to Reinforcement Learning
Reinforcement Learning (RL) is a machine learning paradigm where an agent learns by interacting with an environment and receiving rewards or penalties.
Instead of learning from labeled datasets, the agent learns through experience.
- Agent takes an action
- Environment returns a reward
- Agent updates its knowledge
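The three-step loop above can be sketched in a few lines of Python. This is a minimal illustration, not the article's RPS agent: the environment here is a hypothetical one-state bandit that rewards action 2 and penalizes the others, and the agent tracks a running value estimate per action.

```python
import random

def environment_step(action):
    # Hypothetical environment: action 2 earns +1, anything else earns -1
    return 1 if action == 2 else -1

values = [0.0, 0.0, 0.0]  # agent's knowledge: estimated value of each action
alpha = 0.1               # learning rate

for _ in range(1000):
    action = random.randint(0, 2)      # agent takes an action
    reward = environment_step(action)  # environment returns a reward
    # agent updates its knowledge toward the observed reward
    values[action] += alpha * (reward - values[action])

print(values)  # the estimate for action 2 approaches +1, the others approach -1
```

After enough iterations the value estimates mirror the environment's rewards, which is the core idea the rest of the article builds on.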
Why Reinforcement Learning Matters
Reinforcement Learning powers many modern technologies such as:
- Game-playing AI systems
- Autonomous robotics
- Recommendation engines
- Financial trading algorithms
Game Mechanics
The Rock Paper Scissors game has three possible actions:
- Rock
- Paper
- Scissors
Every matchup between two actions has a deterministic outcome.
| Action | Beats |
|---|---|
| Rock | Scissors |
| Paper | Rock |
| Scissors | Paper |
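The table above maps directly to a small helper function. This is a sketch; the names `BEATS` and `outcome` are illustrative, not from the article's implementation.

```python
# Which move each move defeats, taken from the table above
BEATS = {"Rock": "Scissors", "Paper": "Rock", "Scissors": "Paper"}

def outcome(player, opponent):
    """Return 'win', 'loss', or 'tie' from the player's perspective."""
    if player == opponent:
        return "tie"
    return "win" if BEATS[player] == opponent else "loss"

print(outcome("Rock", "Scissors"))  # win
print(outcome("Paper", "Scissors"))  # loss
```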
Reward Matrix Design
To train a reinforcement learning agent, we convert game outcomes into numerical rewards.
| Outcome | Reward |
|---|---|
| Win | +1 |
| Loss | -1 |
| Tie | 0 |
These rewards guide the learning algorithm toward optimal strategies.
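Encoded as a matrix indexed by (agent's move, opponent's move) with 0 = Rock, 1 = Paper, 2 = Scissors, the reward table looks like this:

```python
# Rows: agent's move, columns: opponent's move (0=Rock, 1=Paper, 2=Scissors)
reward_matrix = [
    [ 0, -1,  1],  # Rock: ties Rock, loses to Paper, beats Scissors
    [ 1,  0, -1],  # Paper: beats Rock, ties Paper, loses to Scissors
    [-1,  1,  0],  # Scissors: loses to Rock, beats Paper, ties Scissors
]

print(reward_matrix[0][2])  # Rock vs Scissors -> 1 (win)
```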
Understanding Q-Learning
Q-learning is a reinforcement learning algorithm that learns the value of taking an action in a specific state.
The algorithm maintains a table called the Q-table.
The Q-table stores expected rewards for each state-action pair.
Q-Learning Formula
Q(s,a) = Q(s,a) + α [R + γ max_a'(Q(s',a')) - Q(s,a)]
- s = current state
- a = action taken
- s' = next state
- a' = candidate next action
- α = learning rate
- γ = discount factor
- R = immediate reward
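A single application of the update rule can be checked by hand. The sketch below uses the hyperparameters defined later in the article (α = 0.1, γ = 0.9) on an all-zero Q-table; the particular state, action, and next state are arbitrary examples.

```python
import numpy as np

Q = np.zeros((3, 3))
alpha, gamma = 0.1, 0.9

s, a = 0, 1     # example: state 0, action 1
reward = 1      # the agent won this round
s_next = 2      # hypothetical next state

# Q(s,a) <- Q(s,a) + alpha * (R + gamma * max_a' Q(s',a') - Q(s,a))
Q[s, a] += alpha * (reward + gamma * np.max(Q[s_next]) - Q[s, a])

print(Q[s, a])  # 0 + 0.1 * (1 + 0.9*0 - 0) = 0.1
```

Because the table starts at zero, the future-reward term contributes nothing at first; as non-zero entries appear, the `gamma * max(...)` term propagates value backward through the table.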
Intuition Behind Q-Learning
The algorithm updates knowledge using:
- Immediate reward
- Best possible future reward
Over many iterations the values converge toward optimal behavior.
Python Implementation
Initialize Q-table
```python
import numpy as np
import random

actions = ["Rock", "Paper", "Scissors"]

# Q-table: rows are states, columns are actions, all values start at zero
Q = np.zeros((3, 3))

alpha = 0.1    # learning rate
gamma = 0.9    # discount factor
epsilon = 0.1  # exploration rate

# Rows: agent's move, columns: opponent's move (0=Rock, 1=Paper, 2=Scissors)
reward_matrix = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
```
The Q-table starts with zeros, meaning the agent initially has no knowledge.
Training the Agent
```python
for episode in range(10000):
    state = random.randint(0, 2)  # a random state each episode

    # Epsilon-greedy selection: explore with probability epsilon
    if random.random() < epsilon:
        action = random.randint(0, 2)
    else:
        action = np.argmax(Q[state])

    opponent = random.randint(0, 2)  # opponent plays uniformly at random
    reward = reward_matrix[action][opponent]

    # Q-learning update; here the chosen action doubles as the next state
    Q[state][action] = Q[state][action] + alpha * (
        reward + gamma * np.max(Q[action]) - Q[state][action]
    )
```
During training the agent sometimes explores random actions to discover better strategies.
CLI Output Example
```
$ python rps_qlearning.py
Training started...
Episode 1000 complete
Episode 5000 complete
Episode 10000 complete
Final Q Table:
[[ 0.12  0.88 -0.44]
 [-0.32  0.21  0.92]
 [ 0.71 -0.51  0.08]]
Optimal Strategy Learned:
Rock -> Paper
Paper -> Scissors
Scissors -> Rock
```
Understanding the Q-Table
Each row of the Q-table corresponds to a state and each column to an action; an entry is the expected long-term reward of taking that action in that state.
| State | Rock | Paper | Scissors |
|---|---|---|---|
| Rock | 0.12 | 0.88 | -0.44 |
| Paper | -0.32 | 0.21 | 0.92 |
| Scissors | 0.71 | -0.51 | 0.08 |
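The learned policy can be read off the table with an argmax per row. This sketch uses the example values shown above:

```python
import numpy as np

actions = ["Rock", "Paper", "Scissors"]
Q = np.array([
    [ 0.12,  0.88, -0.44],
    [-0.32,  0.21,  0.92],
    [ 0.71, -0.51,  0.08],
])

# For each state (row), the best response is the column with the highest Q-value
for state, row in zip(actions, Q):
    print(f"{state} -> {actions[np.argmax(row)]}")
```

This reproduces the strategy reported in the training output: Rock -> Paper, Paper -> Scissors, Scissors -> Rock.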
Interactive Demo
Play against a simple agent:
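The original page embeds an interactive widget that does not survive in text form. A minimal command-line stand-in might look like the following sketch, where the agent simply counters the player's previous move using the learned mapping; the function names are illustrative.

```python
import random

actions = ["Rock", "Paper", "Scissors"]
counter = {"Rock": "Paper", "Paper": "Scissors", "Scissors": "Rock"}

def agent_move(last_player_move):
    """Counter the player's previous move; open with a random move."""
    if last_player_move is None:
        return random.choice(actions)
    return counter[last_player_move]

def play():
    last = None
    while True:
        choice = input("Rock/Paper/Scissors (or 'quit'): ").title()
        if choice == "Quit":
            break
        if choice not in actions:
            continue
        print(f"Agent plays {agent_move(last)}")
        last = choice

# play()  # uncomment to play interactively
```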
Key Insights
- Reinforcement Learning learns through rewards
- Q-learning uses a table of expected action rewards
- Exploration allows discovery of better strategies
- Rock Paper Scissors demonstrates RL concepts clearly
- Q-tables help interpret the learning process
Related Articles
- Natural Language Generation with Reinforcement Learning
- Scalar Rewards in Reinforcement Learning
Author: Subham