Friday, October 25, 2024
A Beginner's Guide to the Median Elimination Algorithm in Reinforcement Learning
Median Elimination Algorithm in Reinforcement Learning (RL)
Reinforcement Learning is about making an agent learn the best decisions through trial and error. One powerful strategy for efficiently selecting the best action is the Median Elimination Algorithm.
This guide explains everything step-by-step in a simple, intuitive way with math, examples, and practical insights.
Table of Contents
- 1. The Problem in RL
- 2. Core Idea of Median Elimination
- 3. Step-by-Step Algorithm
- 4. Mathematical Intuition (Simple)
- 5. Real-Life Example
- 6. Code Example
- 7. CLI Simulation Output
- 8. Why It Works
- 9. Limitations
- 10. Key Takeaways
❗ 1. The Problem in Reinforcement Learning
In RL, an agent must choose between multiple actions (called arms in bandit problems).
The challenge:
- Too many options = expensive exploration
- Need to quickly find the best action
2. Core Idea of Median Elimination
Instead of testing everything equally, we repeatedly:
- Estimate performance
- Find the median reward
- Eliminate weaker half
This is similar to narrowing choices in a competition round by round.
⚙️ 3. Step-by-Step Algorithm
Step 1: Initialization
- Start with all arms
- Set accuracy parameters:
- ε (epsilon) → how close we want to be to the best arm
- δ (delta) → allowed probability of failure
Step 2: Sampling
Pull each arm multiple times and compute average reward:
\[ \hat{r}_i = \frac{1}{n} \sum_{t=1}^{n} r_{i,t} \]
This gives an estimated reward for each arm.
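As a sanity check, the estimator above is just a sample mean over the observed rewards. A tiny illustration (the reward values below are made up for this example):

```python
# five observed 0/1 rewards from pulling one arm (illustrative data)
rewards = [1, 0, 1, 1, 0]

# sample-mean estimate of the arm's true reward
r_hat = sum(rewards) / len(rewards)
print(r_hat)  # 0.6
```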
Step 3: Compute Median
Sort all rewards and find median:
\[ \text{median} = \text{middle value of the sorted rewards} \]
Arms below the median are weaker candidates.
Step 4: Elimination
- Keep only arms ≥ median
- Discard the rest
Step 5: Repeat
Repeat sampling → median → elimination until one arm remains.
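The five steps above can be sketched as a single loop. This is a minimal illustration, not the full algorithm: it uses a fixed number of pulls per round instead of the ε/δ-dependent sample-size schedule, assumes Bernoulli rewards, and the names `median_elimination_sketch` and `pulls_per_round` are invented for this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def median_elimination_sketch(true_means, pulls_per_round=100):
    """Repeat sampling -> median -> elimination until one arm remains."""
    candidates = list(true_means)
    while len(candidates) > 1:
        # Step 2: estimate each surviving arm by its sample mean
        estimates = [rng.binomial(1, p, pulls_per_round).mean() for p in candidates]
        # Step 3: median of the current estimates
        med = np.median(estimates)
        # Step 4: keep only arms at or above the median
        survivors = [p for p, est in zip(candidates, estimates) if est >= med]
        # guard: if ties keep every arm, drop the single worst instead
        if len(survivors) == len(candidates):
            worst = min(range(len(candidates)), key=lambda i: estimates[i])
            survivors = [p for i, p in enumerate(candidates) if i != worst]
        candidates = survivors
    return candidates[0]

best = median_elimination_sketch([0.2, 0.5, 0.7, 0.4, 0.9])
print("Best arm:", best)
```

With enough pulls per round, the surviving arm is, with high probability, the one with the highest true mean.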
4. Mathematical Intuition (Simple)
Confidence Guarantee
The algorithm ensures:
\[ P(\text{chosen arm is within } \epsilon \text{ of best}) \ge 1 - \delta \]
Simple Explanation:
- ε (epsilon): how far from the best arm we can tolerate being
- δ (delta): probability of failure
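For a rough sense of where the guarantee comes from (a sketch following the standard PAC-bandit analysis, not a full proof): if rewards lie in [0, 1] and each surviving arm is pulled n times in a round, Hoeffding's inequality bounds the chance that an arm's sample mean misses its true mean by more than ε/2:

\[ P\left( |\hat{r}_i - r_i| \ge \frac{\epsilon}{2} \right) \le 2 e^{-n\epsilon^2/2} \]

Choosing n on the order of (1/ε²)·log(1/δ) per round, and tightening ε and δ from round to round, is what yields the overall (ε, δ) guarantee, with total sample complexity O((k/ε²)·log(1/δ)) for k arms.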
5. Real-Life Example (Slot Machines)
Imagine 10 slot machines:
- Play each machine a few times
- Calculate average reward
- Find median performer
- Remove weaker machines
- Repeat until best machine remains
This avoids wasting time on bad machines.
6. Code Example
import numpy as np

# true success probabilities of each arm (Bernoulli rewards)
arms = [0.2, 0.5, 0.7, 0.4, 0.9]
epsilon = 0.1  # accuracy parameter
delta = 0.1    # allowed failure probability

def sample(arm, n=10):
    # pull an arm n times and return the average observed reward
    return np.mean(np.random.binomial(1, arm, n))

# simple simulation of one elimination round
estimates = [sample(a) for a in arms]
median = np.median(estimates)
filtered = [a for a, est in zip(arms, estimates) if est >= median]
print("Remaining arms:", filtered)
7. CLI Simulation Output
Initial Arms: [0.2, 0.5, 0.7, 0.4, 0.9]

Round 1:
Estimates: [0.2, 0.6, 0.8, 0.3, 0.9]
Median: 0.6
Remaining: [0.5, 0.7, 0.9]

Round 2:
Estimates: [0.5, 0.7, 0.9]
Median: 0.7
Remaining: [0.7, 0.9]

Round 3:
Remaining best arm: 0.9
8. Why It Works
- Reduces computation drastically
- Focuses only on promising actions
- Balances exploration and exploitation
⚠️ 9. Limitations
- Depends heavily on ฮต and ฮด
- Not efficient for very small problems
- Needs repeated sampling (still costly in some cases)
10. Key Takeaways
- Median Elimination is a smart filtering algorithm
- Works by repeatedly removing weaker half
- Uses probability guarantees (ฮต, ฮด)
- Efficient for large action spaces
Final Summary
Median Elimination is like narrowing down contestants in a competition until only the best remains. It is simple, comes with a provable (ε, δ) guarantee, and is useful in reinforcement learning and bandit problems where a near-optimal action must be identified efficiently under uncertainty.