Wednesday, October 16, 2024

Q-Learning Implementation for Rock, Paper, Scissors with Custom Rewards and Strategy Analysis

Implementing Q-Learning for Rock Paper Scissors

This article explains how to train a Reinforcement Learning agent using Q-learning to play the classic game Rock Paper Scissors.

Instead of manually programming strategies, the agent learns through trial and error by observing rewards from its actions.

Introduction to Reinforcement Learning

Reinforcement Learning (RL) is a machine learning paradigm where an agent learns by interacting with an environment and receiving rewards or penalties.

Instead of learning from labeled datasets, the agent learns through experience.

Agent takes an action
Environment returns a reward
Agent updates its knowledge

Why Reinforcement Learning Matters

Reinforcement Learning is used in a range of applications, such as:

Game-playing AI systems
Autonomous robotics
Recommendation engines
Financial trading algorithms

Game Mechanics

The Rock Paper Scissors game contains three actions:

Rock
Paper
Scissors

Each pairing of actions has a deterministic outcome: one action beats the other, or both players chose the same action and the round is a tie.

Action	Beats
Rock	Scissors
Paper	Rock
Scissors	Paper
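The table above can be encoded directly as a lookup. The following is a minimal sketch; the names BEATS and outcome are hypothetical, chosen for illustration, and do not appear in the article's own code.

```python
# Illustrative encoding of the "Action / Beats" table.
# BEATS and outcome are hypothetical names chosen for this sketch.
BEATS = {
    "Rock": "Scissors",
    "Paper": "Rock",
    "Scissors": "Paper",
}


def outcome(player: str, opponent: str) -> str:
    """Return "win", "loss" or "tie" from the player's point of view."""
    if player == opponent:
        return "tie"
    return "win" if BEATS[player] == opponent else "loss"


print(outcome("Rock", "Scissors"))  # win
print(outcome("Rock", "Paper"))     # loss
print(outcome("Paper", "Paper"))    # tie
```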

Reward Matrix Design

To train a reinforcement learning agent, we convert game outcomes into numerical rewards.

Outcome	Reward
Win	+1
Loss	-1
Tie	0

The agent tries to maximize the reward it collects, so these values define what counts as good play and steer learning toward winning actions.
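As a sketch of how the win/loss/tie rewards turn into a full matrix, the snippet below derives one from the game rules. It assumes the index convention 0 = Rock, 1 = Paper, 2 = Scissors; the function name reward and the constants are illustrative only. The result matches the reward_matrix used in the Python implementation section.

```python
# Sketch: derive the reward matrix from the game rules.
# Assumed index convention: 0 = Rock, 1 = Paper, 2 = Scissors.
WIN, LOSS, TIE = 1, -1, 0


def reward(agent: int, opponent: int) -> int:
    """Reward for the agent; each action beats the one just before it (mod 3)."""
    diff = (agent - opponent) % 3
    if diff == 0:
        return TIE
    return WIN if diff == 1 else LOSS


# Rows are the agent's action, columns are the opponent's action.
reward_table = [[reward(a, o) for o in range(3)] for a in range(3)]
for row in reward_table:
    print(row)
# [0, -1, 1]
# [1, 0, -1]
# [-1, 1, 0]
```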

Understanding Q-Learning

Q-learning is a reinforcement learning algorithm that learns the value of taking an action in a specific state.

The algorithm maintains a table, called the Q-table, that stores for each state-action pair an estimate of the expected cumulative reward from taking that action in that state.

Q-Learning Formula


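Q(s,a) ← Q(s,a) + α [R + γ max_a' Q(s',a') - Q(s,a)]

s = current state
a = action taken in s
α = learning rate
γ = discount factor
R = reward received after taking a in s
s' = next state
a' = an action available in s' (the max is taken over all of them)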
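To make the formula concrete, here is a single update worked out in code. The helper q_update is a hypothetical name for this sketch; the default values of alpha and gamma are the ones used later in the article, and the input numbers are made up for illustration.

```python
def q_update(q_sa, reward, max_next_q, alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s,a) toward R + gamma * max Q(s',a')."""
    td_target = reward + gamma * max_next_q
    return q_sa + alpha * (td_target - q_sa)


# Fresh table (all zeros) and the agent wins a round: R = +1
print(round(q_update(0.0, 1, 0.0), 4))   # 0.1

# Later: Q(s,a) = 0.5, best next value 0.8, the agent loses: R = -1
# target = -1 + 0.9 * 0.8 = -0.28, so Q moves from 0.5 to 0.5 + 0.1 * (-0.78)
print(round(q_update(0.5, -1, 0.8), 4))  # 0.422
```

The learning rate controls how far each update moves the old estimate toward the new target; with α = 0.1 the estimate shifts by one tenth of the gap.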

Intuition Behind Q-Learning

The algorithm updates knowledge using:

Immediate reward
Best possible future reward

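Over many iterations, and as long as every state-action pair keeps being tried, the estimates move toward the true action values, and acting greedily on them approaches optimal behavior.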
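The gradual convergence can be seen in a deliberately simplified setting. The sketch below assumes the reward is always +1 and that there is no future value (the max term is 0), so the estimate should approach 1; these assumptions are for illustration only and are not the article's training setup.

```python
# Sketch: repeat the same update many times and watch the estimate settle.
# Simplifying assumptions: reward is always +1 and max Q(s',a') = 0,
# so the update target is always 1.
alpha = 0.1
q = 0.0
history = []
for step in range(1, 51):
    q = q + alpha * (1.0 - q)
    if step in (1, 10, 50):
        history.append((step, round(q, 3)))

print(history)  # [(1, 0.1), (10, 0.651), (50, 0.995)]
```

Each update closes one tenth of the remaining gap, so after n updates the estimate is 1 - 0.9^n.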

Python Implementation

Initialize Q-table


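import random

import numpy as np

actions = ["Rock", "Paper", "Scissors"]

# Q[state][action]: one row per state, one column per action
Q = np.zeros((3, 3))

alpha = 0.1    # learning rate
gamma = 0.9    # discount factor
epsilon = 0.1  # exploration probability

# reward_matrix[agent_action][opponent_action], from the agent's point of view
reward_matrix = [
    [0, -1, 1],   # Rock vs Rock, Paper, Scissors
    [1, 0, -1],   # Paper vs Rock, Paper, Scissors
    [-1, 1, 0],   # Scissors vs Rock, Paper, Scissors
]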

The Q-table starts with zeros, meaning the agent initially has no knowledge.

Training the Agent


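for episode in range(10000):
    # State: the opponent's move this round, which the agent gets to see.
    # (Assumption made here so that the reward depends on the state.)
    state = random.randint(0, 2)

    # Epsilon-greedy action selection
    if random.random() < epsilon:
        action = random.randint(0, 2)       # explore
    else:
        action = int(np.argmax(Q[state]))   # exploit

    opponent = state
    reward = reward_matrix[action][opponent]

    # The next round starts from a new, randomly drawn opponent move
    next_state = random.randint(0, 2)

    # Q-learning update, using alpha and gamma defined above
    Q[state][action] += alpha * (
        reward + gamma * np.max(Q[next_state]) - Q[state][action]
    )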

During training, with probability epsilon (0.1 here) the agent picks a random action instead of the one with the highest Q-value, which lets it discover better strategies.
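The exploration rule can be isolated into a small helper. This is a sketch: epsilon_greedy is a hypothetical name, the example row of Q-values is taken from the Rock row of the sample Q-table in this article, and the fixed seed is only there to make the run repeatable.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


rng = random.Random(0)  # fixed seed so the sketch is repeatable
q_row = [0.12, 0.88, -0.44]  # Q-values for Rock, Paper, Scissors in one state
picks = [epsilon_greedy(q_row, 0.1, rng) for _ in range(1000)]

# Fraction of rounds in which the greedy action (index 1, Paper) was chosen
print(picks.count(1) / len(picks))
```

With epsilon = 0.1 the greedy action is chosen most of the time, but the other two actions still get sampled occasionally, so their estimates keep being updated.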

CLI Output Example


$ python rps_qlearning.py
Training started...
Episode 1000 complete
Episode 5000 complete
Episode 10000 complete

Final Q Table:
[[ 0.12  0.88 -0.44]
 [-0.32  0.21  0.92]
 [ 0.71 -0.51  0.08]]

Optimal Strategy Learned:
Rock -> Paper
Paper -> Scissors
Scissors -> Rock

Understanding the Q-Table

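Each row of the Q-table is a state and each column is an action. The largest value in a row marks the action the agent prefers in that state, which is how the strategy above (Rock -> Paper, Paper -> Scissors, Scissors -> Rock) is read off. Exact values vary from run to run.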

State	Rock	Paper	Scissors
Rock	0.12	0.88	-0.44
Paper	-0.32	0.21	0.92
Scissors	0.71	-0.51	0.08
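Reading the preferred action off each row can be done mechanically. The sketch below copies the values from the table above and picks the column with the largest value in each row; the variable names are illustrative.

```python
actions = ["Rock", "Paper", "Scissors"]

# Values copied from the example Q-table above (rows: states, columns: actions)
q_table = [
    [0.12, 0.88, -0.44],
    [-0.32, 0.21, 0.92],
    [0.71, -0.51, 0.08],
]

# For each state, take the action whose Q-value is largest
policy = {}
for state_index, row in enumerate(q_table):
    best_action = max(range(len(row)), key=lambda a: row[a])
    policy[actions[state_index]] = actions[best_action]

for state, action in policy.items():
    print(f"{state} -> {action}")
# Rock -> Paper
# Paper -> Scissors
# Scissors -> Rock
```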

💡 Key Insights

Reinforcement Learning learns through rewards
Q-learning uses a table of expected action rewards
Exploration allows discovery of better strategies
Rock Paper Scissors demonstrates RL concepts clearly
Q-tables help interpret the learning process

Author: Subham
