Learning Q-Learning Through a Game
Let’s move away from formulas for a moment and think in terms of a game.
Two numbers exist: A = 12 and B = 51. Two players take turns — a human and an AI.
On each turn, a player chooses a number k, picks which of the two values to reduce, and applies the move:
new_value = old_value - k × other_value
The objective is simple: force either A or B to become zero.
But beneath this simple rule lies a powerful idea — this game is a playground for reinforcement learning.
Table of Contents
- Game Intuition
- How the Game Progresses
- How the AI Learns
- Understanding the Q-Table
- Code Example
- Game Output
- Key Takeaways
Game Intuition: More Than Just Numbers
At first glance, this looks like a mathematical game. But in reality, it is a decision-making problem under uncertainty.
Every move changes the state of the system. Every decision affects future possibilities.
The AI does not know the best move at the beginning. It learns through experience — by playing, failing, and improving.
Think Deeper
This is exactly how humans learn strategy games. We don’t start with perfect knowledge — we experiment, observe outcomes, and adjust.
How the Game Actually Works
The game unfolds in rounds. Each round begins with the same initial values of A and B.
Players take turns. On each turn:
The player chooses:
1. A value of k
2. Whether to reduce A or B
Then the formula is applied, changing the state.
The moment either value becomes zero, the game ends.
What makes this interesting is that every move is not just a step — it is a strategic decision that shapes the entire future of the game.
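The turn mechanics described above can be sketched in a few lines of Python. This is a minimal illustration rather than the article's actual implementation: the function names `apply_move` and `is_game_over`, and the `target` argument used to say which value is reduced, are assumptions introduced here.

```python
def apply_move(a, b, k, target):
    """Apply new_value = old_value - k * other_value to the chosen variable.

    `target` is "A" or "B" (a hypothetical encoding of which value is reduced).
    """
    if target == "A":
        return a - k * b, b
    if target == "B":
        return a, b - k * a
    raise ValueError("target must be 'A' or 'B'")


def is_game_over(a, b):
    """The game ends as soon as either value reaches zero."""
    return a == 0 or b == 0


# Starting from A=12, B=51, choosing k=2 and reducing B gives 51 - 2*12 = 27.
state = apply_move(12, 51, k=2, target="B")
print(state)                 # (12, 27)
print(is_game_over(*state))  # False
```

Returning a new `(A, B)` pair instead of mutating globals keeps each state easy to store and compare, which matters once states are used as lookup keys.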
How the AI Learns Over Time
The AI does not start intelligent. Initially, it behaves almost randomly.
Sometimes it explores — trying random values of k. Sometimes it exploits — using what it has learned so far.
This balance between exploration and exploitation is central to how a Q-learning agent gathers useful experience.
Over time, the AI begins to notice patterns:
“Certain moves lead to winning more often.”
“Certain states are dangerous.”
And slowly, it becomes strategic.
Why Exploration Matters
If the AI only used known strategies, it would never discover better ones. Exploration allows it to improve beyond its current knowledge.
Understanding the Q-Table (The AI's Memory)
The Q-table is where the AI stores its experience.
Each entry answers a question:
"If I am in this state, and I take this action, how good is it?"
The state is defined by the current values of A and B. The action is the chosen k and the variable being reduced.
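One way to picture this is a dictionary keyed by state, where each state maps actions to scores. The sketch below assumes a state is the tuple `(A, B)` and an action is the pair `(k, target)`. The names `q_table` and `q_value` and the sample scores are made up for illustration and are not taken from the article's code.

```python
from collections import defaultdict

# Hypothetical layout: q_table[state][action] -> estimated value.
# state  = (A, B)       current values of the two numbers
# action = (k, target)  chosen k and which variable to reduce
q_table = defaultdict(dict)


def q_value(q_table, state, action):
    """Return the stored score, or 0.0 for a pair the AI has never tried."""
    return q_table.get(state, {}).get(action, 0.0)


# Illustrative numbers only, not learned values.
q_table[(12, 51)][(2, "B")] = 0.4
q_table[(12, 51)][(1, "A")] = -0.2

print(q_value(q_table, (12, 51), (2, "B")))  # 0.4
print(q_value(q_table, (12, 27), (1, "A")))  # 0.0 (unseen state)

# The best known action in a state is the one with the highest score.
best = max(q_table[(12, 51)], key=q_table[(12, 51)].get)
print(best)  # (2, 'B')
```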
After every move, the AI updates this table.
If a move leads to winning, it becomes more valuable. If it leads to losing, its value decreases.
Over many games, this table transforms from random guesses into a decision guide.
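The article does not spell out the update rule, so the sketch below uses the standard one-step Q-learning update: the old score is nudged toward the reward plus the discounted best score of the next state. The learning rate `alpha`, the discount `gamma`, the +1 and -1 rewards, and the function name `q_update` are all assumptions. For simplicity it views the game from the AI's side only and does not model the opponent's reply.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One-step Q-learning update on a nested dict q_table[state][action]."""
    actions = q_table.setdefault(state, {})
    old = actions.get(action, 0.0)
    # Best score reachable from the next state; 0.0 if it is terminal or unseen.
    next_actions = q_table.get(next_state, {})
    best_next = max(next_actions.values(), default=0.0)
    actions[action] = old + alpha * (reward + gamma * best_next - old)
    return actions[action]


q_table = {}

# From (3, 6), choosing k=2 and reducing B gives 6 - 2*3 = 0, which ends the game.
# Assumed reward for that winning move: +1.
win = q_update(q_table, (3, 6), (2, "B"), reward=1.0, next_state=None)

# Assumed reward of -1 for a move that is treated here as having lost the game.
loss = q_update(q_table, (3, 7), (1, "A"), reward=-1.0, next_state=None)

print(win > 0, loss < 0)  # True True
```

Repeating the same rewarded move pushes its score further in the same direction, which is how the table gradually separates good moves from bad ones.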
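Code Example
import random

A, B = 12, 51            # initial values of the game
exploration_prob = 0.3   # probability of trying a random k

def choose_action(state, q_table):
    # Explore: with probability exploration_prob, pick a random k from 1 to 5.
    if random.random() < exploration_prob:
        return random.randint(1, 5)
    # Exploit: pick the k with the highest learned value (k=1 if the state is unseen).
    actions = q_table.get(state) or {1: 0}
    return max(actions, key=actions.get)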
This snippet shows how the AI decides between exploring and exploiting.
Sample Game Output
Game Start: A=12, B=51
AI chooses k=2 → Reduces B → New B=27
Human chooses k=1 → Reduces A → New A=-15
Game Ends
Winner: AI
Each move updates the state — and the AI learns from the result.
Key Takeaways
This simple game reveals a powerful truth:
Learning is not about knowing the answer — it is about improving decisions over time.
Q-learning allows machines to:
- Understand consequences
- Adapt strategies
- Improve through experience
And most importantly, learn without being explicitly told what is correct.
Final Thought
What looks like a small game is actually a model of intelligence.
The AI is not just playing — it is learning how to think.