🎯 Median Elimination Algorithm in Reinforcement Learning (RL)
Reinforcement Learning is about making an agent learn the best decisions through trial and error. One powerful strategy for efficiently selecting the best action is the Median Elimination Algorithm.
This guide explains everything step-by-step in a simple, intuitive way with math, examples, and practical insights.
📑 Table of Contents
- 1. The Problem in RL
- 2. Core Idea of Median Elimination
- 3. Step-by-Step Algorithm
- 4. Mathematical Intuition (Simple)
- 5. Real-Life Example
- 6. Code Example
- 7. CLI Simulation Output
- 8. Why It Works
- 9. Limitations
- 10. Key Takeaways
❗ 1. The Problem in Reinforcement Learning
In RL, an agent must choose between multiple actions (called arms in bandit problems).
The challenge:
- Too many options = expensive exploration
- Need to quickly find the best action
💡 2. Core Idea of Median Elimination
Instead of testing everything equally, we repeatedly:
- Estimate performance
- Find the median reward
- Eliminate weaker half
This is similar to narrowing choices in a competition round by round.
⚙️ 3. Step-by-Step Algorithm
Step 1: Initialization
- Start with all arms
- Set accuracy parameters:
  - ε (epsilon) → how close to the best arm we want to get
  - δ (delta) → the allowed failure probability (the confidence level is 1 − δ)
Step 2: Sampling
Pull each arm multiple times and compute average reward:
\[ \hat{r}_i = \frac{1}{n} \sum_{t=1}^{n} r_{i,t} \]
👉 This gives the estimated reward for each arm.
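This sampling step can be sketched as follows (assuming Bernoulli-reward arms; the `true_means` values are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.9]  # hypothetical true arm rewards

def estimate_reward(p, n=100):
    # Pull a Bernoulli(p) arm n times and return the sample mean r_hat
    return rng.binomial(1, p, n).mean()

estimates = [estimate_reward(p) for p in true_means]
print(estimates)
```

With more pulls per arm (larger `n`), each estimate concentrates around the true mean.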
Step 3: Compute Median
Sort all rewards and find median:
\[ m = \operatorname{median}(\hat{r}_1, \hat{r}_2, \dots, \hat{r}_k) \]
👉 Arms below the median are weaker candidates.
Step 4: Elimination
- Keep only arms ≥ median
- Discard the rest
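The median-and-filter step above can be sketched like this (the estimates here are illustrative numbers, not real data):

```python
import numpy as np

arms = ["A", "B", "C", "D"]
estimates = [0.3, 0.6, 0.8, 0.4]  # illustrative sample means

median = np.median(estimates)  # 0.5 for these values
# Keep only arms whose estimate is at or above the median
survivors = [a for a, est in zip(arms, estimates) if est >= median]
print("Survivors:", survivors)
```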
Step 5: Repeat
Repeat sampling → median → elimination until only one arm remains. (In the full algorithm of Even-Dar et al., ε and δ are also refined each round, e.g. ε ← 3ε/4 and δ ← δ/2, so the overall guarantee still holds.)
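Putting Steps 2–5 together, a minimal round-by-round sketch looks like this (the arm means are hypothetical; ties are handled by keeping the top half of the ranking, which guarantees the set shrinks each round):

```python
import numpy as np

rng = np.random.default_rng(42)

def median_elimination(true_means, pulls_per_round=200):
    """Repeatedly estimate, rank, and drop the weaker half until one arm survives."""
    arms = list(range(len(true_means)))
    while len(arms) > 1:
        # Step 2: sample every surviving arm and average the rewards
        estimates = {a: rng.binomial(1, true_means[a], pulls_per_round).mean()
                     for a in arms}
        # Steps 3-4: keep the better half (those at or above the median)
        ranked = sorted(arms, key=lambda a: estimates[a], reverse=True)
        arms = ranked[:(len(ranked) + 1) // 2]
    return arms[0]

best = median_elimination([0.2, 0.5, 0.7, 0.4, 0.9])
print("Best arm index:", best)
```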
📈 4. Mathematical Intuition (Simple)
Confidence Guarantee
The algorithm ensures:
\[ P(\text{chosen arm is within } \epsilon \text{ of best}) \ge 1 - \delta \]
Simple Explanation:
- ε (epsilon): how much suboptimality we can tolerate
- δ (delta): the probability the algorithm is allowed to fail
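The guarantee rests on a concentration argument (a sketch of the standard analysis, not spelled out above): Hoeffding's inequality bounds how far a sample mean can drift from the true mean, which fixes how many pulls per arm are enough.

\[ P\left(|\hat{r}_i - r_i| > \tfrac{\epsilon}{2}\right) \le 2e^{-2n(\epsilon/2)^2}, \qquad \text{so } n \ge \frac{2}{\epsilon^2}\ln\frac{2}{\delta'} \text{ pulls suffice} \]

Here \( \delta' \) is the failure probability budgeted to a single round; summing these budgets over all rounds keeps the total failure probability below δ.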
🎰 5. Real-Life Example (Slot Machines)
Imagine 10 slot machines:
- Play each machine a few times
- Calculate average reward
- Find median performer
- Remove weaker machines
- Repeat until best machine remains
This avoids wasting time on bad machines.
💻 6. Code Example
```python
import numpy as np

# True (unknown) success probabilities of each arm
arms = [0.2, 0.5, 0.7, 0.4, 0.9]
epsilon = 0.1  # accuracy parameter
delta = 0.1    # failure probability

def sample(arm, n=10):
    # Pull a Bernoulli arm n times and return the average reward
    return np.mean(np.random.binomial(1, arm, n))

# One elimination round: estimate, take the median, keep arms at or above it
estimates = [sample(a) for a in arms]
median = np.median(estimates)
filtered = [a for a, est in zip(arms, estimates) if est >= median]
print("Remaining arms:", filtered)
```
🖥️ 7. CLI Simulation Output
```
Initial Arms: [0.2, 0.5, 0.7, 0.4, 0.9]
Round 1:
  Estimates: [0.2, 0.6, 0.8, 0.3, 0.9]
  Median: 0.6
  Remaining: [0.5, 0.7, 0.9]
Round 2:
  Estimates: [0.5, 0.7, 0.9]
  Median: 0.7
  Remaining: [0.7, 0.9]
Round 3:
  Remaining best arm: 0.9
```
🚀 8. Why It Works
- Reduces computation drastically
- Focuses only on promising actions
- Balances exploration and exploitation
⚠️ 9. Limitations
- Depends heavily on ε and δ
- Not efficient for very small problems
- Needs repeated sampling (still costly in some cases)
💡 10. Key Takeaways
- Median Elimination is a smart filtering algorithm
- Works by repeatedly removing the weaker half of the arms
- Comes with probabilistic (ε, δ) guarantees
- Efficient for large action spaces
🎯 Final Summary
Median Elimination is like narrowing down the contestants in a competition until only the best remains. It is simple, comes with formal (ε, δ) guarantees, and is well suited to reinforcement learning problems where decisions must be made efficiently under uncertainty.