
Wednesday, October 16, 2024

Q-Learning Implementation for Rock, Paper, Scissors with Custom Rewards and Strategy Analysis



Implementing Q-Learning for Rock Paper Scissors

This article explains how to train a Reinforcement Learning agent using Q-learning to play the classic game Rock Paper Scissors.

Instead of manually programming strategies, the agent learns through trial and error by observing rewards from its actions.




Introduction to Reinforcement Learning

Reinforcement Learning (RL) is a machine learning paradigm where an agent learns by interacting with an environment and receiving rewards or penalties.

Instead of learning from labeled datasets, the agent learns through experience.

  • Agent takes an action
  • Environment returns a reward
  • Agent updates its knowledge
Why Reinforcement Learning Matters

Reinforcement Learning powers many modern technologies such as:

  • Game-playing AI systems
  • Autonomous robotics
  • Recommendation engines
  • Financial trading algorithms

Game Mechanics

The Rock Paper Scissors game contains three actions:

  • Rock
  • Paper
  • Scissors

Each action has a deterministic outcome against another action.

Action      Beats
Rock        Scissors
Paper       Rock
Scissors    Paper
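The table above translates directly into a lookup. The sketch below is one possible encoding; the names `BEATS` and `outcome` are illustrative and do not appear in the article's code.

```python
# Each key beats the value it maps to.
BEATS = {"Rock": "Scissors", "Paper": "Rock", "Scissors": "Paper"}

def outcome(player, opponent):
    """Return 'win', 'loss', or 'tie' from the player's point of view."""
    if player == opponent:
        return "tie"
    return "win" if BEATS[player] == opponent else "loss"

print(outcome("Rock", "Scissors"))  # win
print(outcome("Rock", "Paper"))     # loss
print(outcome("Paper", "Paper"))    # tie
```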

Reward Matrix Design

To train a reinforcement learning agent, we convert game outcomes into numerical rewards.

Outcome    Reward
Win        +1
Loss       -1
Tie         0

These rewards guide the learning algorithm toward optimal strategies.
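As a sketch of how the outcome rewards become a matrix, the snippet below builds a 3x3 table indexed as [agent action][opponent action]. The helper `build_reward_matrix` and the `REWARD` dictionary are hypothetical names; changing the values in `REWARD` is one way to experiment with custom rewards.

```python
ACTIONS = ["Rock", "Paper", "Scissors"]
BEATS = {"Rock": "Scissors", "Paper": "Rock", "Scissors": "Paper"}
REWARD = {"win": 1, "loss": -1, "tie": 0}  # edit these to try custom rewards

def build_reward_matrix(reward=REWARD):
    """Rows are the agent's action, columns are the opponent's action."""
    matrix = []
    for mine in ACTIONS:
        row = []
        for theirs in ACTIONS:
            if mine == theirs:
                row.append(reward["tie"])
            elif BEATS[mine] == theirs:
                row.append(reward["win"])
            else:
                row.append(reward["loss"])
        matrix.append(row)
    return matrix

reward_matrix = build_reward_matrix()
print(reward_matrix)  # [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
```

With the default rewards this reproduces the `reward_matrix` used in the Python implementation.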


Understanding Q-Learning

Q-learning is a reinforcement learning algorithm that learns the value of taking an action in a specific state.

The algorithm maintains a table called the Q-table.

The Q-table stores the expected cumulative reward for each state-action pair. In this article it is a 3x3 array with one row per state and one column per action.

Q-Learning Formula


Q(s,a) = Q(s,a) + α [R + γ max over a' of Q(s',a') - Q(s,a)]

  • s = current state
  • a = action taken
  • s' = next state
  • a' = an action available in the next state
  • α = learning rate
  • γ = discount factor
  • R = reward
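To make the formula concrete, here is a minimal sketch of a single update step on a plain list-of-lists table. The function name `q_update` and the example state and action indices are assumptions chosen for illustration; the learning rate 0.1 and discount factor 0.9 match the values used later in the article.

```python
ALPHA = 0.1  # learning rate (α)
GAMMA = 0.9  # discount factor (γ)

def q_update(q, state, action, reward, next_state, alpha=ALPHA, gamma=GAMMA):
    """Apply one Q-learning update in place and return the new Q(s, a)."""
    best_next = max(q[next_state])          # max over a' of Q(s', a')
    td_target = reward + gamma * best_next  # R + γ max Q(s', a')
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]

q = [[0.0] * 3 for _ in range(3)]  # 3 states x 3 actions, all zeros

first = q_update(q, state=0, action=1, reward=1, next_state=1)
second = q_update(q, state=0, action=1, reward=1, next_state=1)
print(first, round(second, 2))
```

Starting from zero, a reward of +1 moves the estimate to 0.1, and a second identical experience moves it to about 0.19. Each update closes a fraction α of the gap between the current estimate and the target.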
Intuition Behind Q-Learning

The algorithm updates knowledge using:

  • Immediate reward
  • Best possible future reward

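Over many iterations, and given enough exploration, the Q-values converge toward the true expected returns, so choosing the highest-valued action in each state approaches optimal behavior.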


Python Implementation

Initialize Q-table


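import numpy as np
import random

actions = ["Rock", "Paper", "Scissors"]

# Q[state][action]: 3 states x 3 actions, all estimates start at zero.
Q = np.zeros((3, 3))

alpha = 0.1    # learning rate
gamma = 0.9    # discount factor
epsilon = 0.1  # probability of exploring with a random action

# reward_matrix[agent action][opponent action], indices follow `actions`.
reward_matrix = [
    [0, -1, 1],   # Rock     vs Rock, Paper, Scissors
    [1, 0, -1],   # Paper    vs Rock, Paper, Scissors
    [-1, 1, 0],   # Scissors vs Rock, Paper, Scissors
]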

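The Q-table starts with zeros, meaning the agent initially has no knowledge. Each row of reward_matrix is the agent's action and each column is the opponent's action, so reward_matrix[0][2] = 1 means Rock beats Scissors.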


Training the Agent


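for episode in range(10000):
    # Each episode is a single round that starts from a random state.
    state = random.randint(0, 2)

    # Epsilon-greedy action selection.
    if random.random() < epsilon:
        action = random.randint(0, 2)      # explore
    else:
        action = int(np.argmax(Q[state]))  # exploit

    # The opponent plays uniformly at random.
    opponent = random.randint(0, 2)
    reward = reward_matrix[action][opponent]

    # Q-learning update. The move just played is used as the next state.
    Q[state][action] = Q[state][action] + alpha * (
        reward + gamma * np.max(Q[action]) - Q[state][action]
    )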

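During training, the agent explores with probability epsilon (0.1 here) by picking a random action; the rest of the time it exploits what it has learned by picking the action with the highest Q-value for the current state. Exploration is what lets it discover better strategies.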
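The action-selection step in the loop above can be isolated into a small helper. This is a sketch using plain Python lists; `choose_action` is a hypothetical name, and ties are broken by taking the first maximum, which is also what `np.argmax` does.

```python
import random

def choose_action(q_row, epsilon):
    """Epsilon-greedy: random action with probability epsilon, else the best one."""
    if random.random() < epsilon:
        return random.randrange(len(q_row))  # explore
    return q_row.index(max(q_row))           # exploit (first maximum on ties)

q_row = [0.1, 0.9, -0.2]
print(choose_action(q_row, epsilon=0.0))  # 1, always greedy when epsilon is 0
```

Setting epsilon to 0 gives a purely greedy agent, and setting it to 1 gives a purely random one. A small value such as 0.1 sits between the two.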


CLI Output Example


$ python rps_qlearning.py
Training started...
Episode 1000 complete
Episode 5000 complete
Episode 10000 complete

Final Q Table:
[[ 0.12  0.88 -0.44]
 [-0.32  0.21  0.92]
 [ 0.71 -0.51  0.08]]

Optimal Strategy Learned:
Rock -> Paper
Paper -> Scissors
Scissors -> Rock


Understanding the Q-Table

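The Q-table stores the estimated reward for each state-action pair: rows are states and columns are actions. The values below are illustrative and vary from run to run. A pattern this clear appears when the opponent's move can be predicted from the state; against a uniformly random opponent, every action has an expected reward of zero, so the entries stay close to zero.

State       Rock     Paper    Scissors
Rock         0.12     0.88    -0.44
Paper       -0.32     0.21     0.92
Scissors     0.71    -0.51     0.08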
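Reading a strategy out of the Q-table means taking the highest-valued action in each row. The sketch below does this for the table above; the variable names `q_table` and `policy` are illustrative.

```python
actions = ["Rock", "Paper", "Scissors"]

# Values copied from the Q-table above (rows = states, columns = actions).
q_table = [
    [0.12, 0.88, -0.44],
    [-0.32, 0.21, 0.92],
    [0.71, -0.51, 0.08],
]

policy = {}
for state, row in zip(actions, q_table):
    best = max(range(len(row)), key=row.__getitem__)  # index of the largest value
    policy[state] = actions[best]

for state, action in policy.items():
    print(f"{state} -> {action}")
```

For these values the result is Rock -> Paper, Paper -> Scissors, Scissors -> Rock, matching the strategy printed in the CLI output.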



💡 Key Insights

  • Reinforcement Learning learns through rewards
  • Q-learning uses a table of expected action rewards
  • Exploration allows discovery of better strategies
  • Rock Paper Scissors demonstrates RL concepts clearly
  • Q-tables help interpret the learning process


Author: Subham
