Showing posts with label pickle. Show all posts

Thursday, October 17, 2024

Turn-Based Game Simulation Using Q-Learning for AI Decision Making



🎮 Learning Q-Learning Through a Game

Let’s move away from formulas for a moment and think in terms of a game.

Two numbers exist: A = 12 and B = 51. Two players take turns — a human and an AI.

On each turn, a player chooses a number k and applies a move:

new_value = old_value - k × other_value
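As a quick check of the arithmetic, the move rule can be written as a small function. This is only a minimal sketch: the name `apply_move` and its `target` argument are illustrative and do not come from the original game code.

```python
def apply_move(a, b, k, target):
    """Return the new (A, B) pair after reducing `target` by k times the other value.

    `apply_move` and `target` are hypothetical names used only for this sketch.
    """
    if target == "A":
        return a - k * b, b   # new A = old A - k * B
    if target == "B":
        return a, b - k * a   # new B = old B - k * A
    raise ValueError("target must be 'A' or 'B'")


# Starting from A = 12, B = 51, reducing B with k = 2:
print(apply_move(12, 51, 2, "B"))  # (12, 27), since 51 - 2 * 12 = 27
```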

The objective is simple: force either A or B to become zero.

But beneath this simple rule lies a powerful idea — this game is a playground for reinforcement learning.




🧠 Game Intuition: More Than Just Numbers

At first glance, this looks like a mathematical game. But in reality, it is a decision-making problem under uncertainty.

Every move changes the state of the system. Every decision affects future possibilities.

The AI does not know the best move at the beginning. It learns through experience — by playing, failing, and improving.

📖 Think Deeper

This is exactly how humans learn strategy games. We don’t start with perfect knowledge — we experiment, observe outcomes, and adjust.


🔄 How the Game Actually Works

The game unfolds in rounds. Each round begins with the same initial values of A and B.

Players take turns. On each turn, the player chooses:

1. A value of k
2. Whether to reduce A or B

Then the formula is applied, changing the state.

The moment either value becomes zero, the game ends.
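The turn structure can be sketched as a simple loop. This is an illustration under stated assumptions, not the post's actual game code. Both players pick k at random from 1 to 5 (the range used in the post's code example). The game is treated as over once either value is zero or below, because the sample run later in the post ends on a negative value. The names `is_over` and `play_round` are hypothetical.

```python
import random


def is_over(a, b):
    # Assumption: a value at or below zero ends the game.
    return a <= 0 or b <= 0


def play_round(a=12, b=51, max_k=5, rng=random):
    """Play one round with two random players and return the list of moves."""
    players = ["AI", "Human"]
    history = []
    turn = 0
    while not is_over(a, b):
        player = players[turn % 2]
        k = rng.randint(1, max_k)          # 1. choose a value of k
        target = rng.choice(["A", "B"])    # 2. choose which value to reduce
        if target == "A":
            a = a - k * b
        else:
            b = b - k * a
        history.append((player, k, target, a, b))
        turn += 1
    return history


for player, k, target, a, b in play_round():
    print(f"{player} chooses k={k}, reduces {target}: A={a}, B={b}")
```

Because both values are positive until the game ends, every move lowers one of them by at least 1, so the loop always terminates.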

What makes this interesting is that every move is not just a step — it is a strategic decision that shapes the entire future of the game.


🤖 How the AI Learns Over Time

The AI does not start intelligent. Initially, it behaves almost randomly.

Sometimes it explores — trying random values of k. Sometimes it exploits — using what it has learned so far.

This balance between exploration and exploitation, often implemented as an epsilon-greedy rule, is central to how Q-learning gathers useful experience.

Over time, the AI begins to notice patterns:

“Certain moves lead to winning more often.”

“Certain states are dangerous.”

And slowly, it becomes strategic.

📖 Why Exploration Matters

If the AI only used known strategies, it would never discover better ones. Exploration allows it to improve beyond its current knowledge.


📊 Understanding the Q-Table (The AI's Memory)

The Q-table is where the AI stores its experience.

Each entry answers a question:

"If I am in this state, and I take this action, how good is it?"

The state is defined by the current values of A and B. The action is the chosen k and the variable being reduced.
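One way to picture this in code is a nested dictionary keyed first by state and then by action. The layout below is an assumption made for illustration, since the post does not show how its table is stored. The helper `best_action` and the stored values are made up.

```python
from collections import defaultdict

# Q-table: maps a state (A, B) to a dict of action -> estimated value.
# An action is a (k, target) pair, e.g. (2, "B") means "reduce B with k = 2".
# Unseen state/action pairs default to 0.0 (an assumption for this sketch).
q_table = defaultdict(lambda: defaultdict(float))

state = (12, 51)
q_table[state][(2, "B")] = 0.4   # made-up values, for illustration only
q_table[state][(1, "A")] = -0.7


def best_action(q_table, state):
    """Return the action with the highest stored value for this state, or None."""
    actions = q_table.get(state)
    if not actions:
        return None
    return max(actions, key=actions.get)


print(best_action(q_table, state))     # (2, 'B')
print(best_action(q_table, (12, 27)))  # None: nothing learned for this state yet
```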

After every move, the AI updates this table.

If a move leads to winning, it becomes more valuable. If it leads to losing, its value decreases.
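The post does not spell out the update itself, so the sketch below uses the standard Q-learning rule: move the stored value a small step toward the reward plus the discounted best value of the next state. The learning rate `alpha`, the discount `gamma`, and the +1/-1 rewards are assumed values, and `update_q` is a hypothetical name.

```python
def update_q(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard Q-learning update and return the new value.

    alpha (learning rate) and gamma (discount factor) are assumed values.
    """
    old_value = q_table.get(state, {}).get(action, 0.0)
    next_values = q_table.get(next_state, {})
    # Best value reachable from the next state (0.0 if nothing is known yet).
    best_next = max(next_values.values(), default=0.0)
    new_value = old_value + alpha * (reward + gamma * best_next - old_value)
    q_table.setdefault(state, {})[action] = new_value
    return new_value


q_table = {}
state, action = (12, 27), (1, "A")

# Assumed rewards: +1 for a move that wins, -1 for a move that loses.
print(update_q(q_table, state, action, reward=-1, next_state=(-15, 27)))  # -0.1
print(update_q(q_table, state, action, reward=-1, next_state=(-15, 27)))  # about -0.19
```

Repeating the same losing move pushes its value further down each time, which is how a bad choice gradually stops being picked during exploitation.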

Over many games, this table transforms from random guesses into a decision guide.


💻 Code Example

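import random

A, B = 12, 51            # starting values
exploration_prob = 0.3   # probability of exploring on any given turn

def choose_action(state, q_table):
    """Return a value of k for the given state."""
    # Explore: try a random k between 1 and 5.
    if random.random() < exploration_prob:
        return random.randint(1, 5)
    # Exploit: pick the k with the highest stored value;
    # a state with no entries yet falls back to k=1.
    action_values = q_table.get(state) or {1: 0}
    return max(action_values, key=action_values.get)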

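This snippet shows how the AI decides between exploring and exploiting. With probability 0.3 it explores by picking a random k from 1 to 5; otherwise it exploits by picking the k with the highest stored value for the current state, falling back to k=1 for a state it has not seen. For simplicity, the snippet chooses only k, not which variable to reduce.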


🖥️ Sample Game Output

Game Start: A=12, B=51

AI chooses k=2 → Reduces B → New B=27
Human chooses k=1 → Reduces A → New A=-15

Game Ends

Winner: AI

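Each move updates the state, and the AI learns from the result. In this run, the human's move takes A below zero (12 - 1 × 27 = -15), which ends the game with the AI as the winner.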


💡 Key Takeaways

This simple game reveals a powerful truth:

Learning is not about knowing the answer — it is about improving decisions over time.

Q-learning allows machines to:

Understand consequences
Adapt strategies
Improve through experience

And most importantly, learn without being explicitly told what is correct.




📌 Final Thought

What looks like a small game is actually a model of intelligence.

The AI is not just playing — it is learning how to think.
