
Friday, October 25, 2024

A Beginner's Guide to the Median Elimination Algorithm in Reinforcement Learning



🎯 Median Elimination Algorithm in Reinforcement Learning (RL)

Reinforcement learning is about making an agent learn the best decisions through trial and error. One sample-efficient strategy for identifying the best action is the Median Elimination algorithm.

This guide explains everything step-by-step in a simple, intuitive way with math, examples, and practical insights.




❗ 1. The Problem in Reinforcement Learning

In RL, an agent must choose between multiple actions (called arms in bandit problems).

Each arm gives uncertain rewards, so the agent does not initially know which arm is best.

The challenge:

  • Too many options = expensive exploration
  • Need to quickly find the best action

💡 2. Core Idea of Median Elimination

Instead of testing everything equally, we repeatedly:

  • Estimate performance
  • Find the median reward
  • Eliminate weaker half

This is similar to narrowing choices in a competition round by round.


⚙️ 3. Step-by-Step Algorithm

Step 1: Initialization

  • Start with all arms
  • Set accuracy parameters:
    • ε (epsilon) → how close the chosen arm must be to the best arm
    • δ (delta) → the allowed probability of failure

Step 2: Sampling

Pull each arm multiple times and compute average reward:

\[ \hat{r}_i = \frac{1}{n} \sum_{t=1}^{n} r_{i,t} \]

👉 This gives an estimated reward for each arm.
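As a quick sketch, the empirical mean above can be simulated with a Bernoulli arm. The arm probability 0.7, the pull count, and the random seed here are illustrative assumptions, not values from the algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

def empirical_mean(p, n):
    """Pull a Bernoulli arm with success probability p for n rounds
    and return the average observed reward (the estimate r-hat)."""
    rewards = rng.binomial(1, p, size=n)  # r_{i,t} for t = 1..n
    return rewards.mean()

print(empirical_mean(0.7, 1000))  # close to the true mean 0.7
```

The more pulls n we take, the more tightly the estimate clusters around the true mean.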


Step 3: Compute Median

Sort all rewards and find median:

\[ \text{median} = \text{middle value of the sorted estimates} \]

👉 Arms whose estimates fall below the median are the weaker candidates.


Step 4: Elimination

  • Keep only the arms whose estimated reward is ≥ the median
  • Discard the rest

This cuts the search space roughly in half each round.

Step 5: Repeat

Repeat sampling → median → elimination until one arm remains.
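Putting Steps 1–5 together, the whole loop can be sketched as below. This is a simplified simulation that uses a fixed number of pulls per round and made-up Bernoulli arm probabilities, rather than the theoretically prescribed per-round sample sizes, so treat it as an illustration only:

```python
import numpy as np

rng = np.random.default_rng(42)

def median_elimination(arm_probs, pulls_per_round=500):
    """Repeat: sample every surviving arm, compute the median estimate,
    and drop the arms below it. Stop when one arm remains."""
    surviving = list(arm_probs)
    while len(surviving) > 1:
        estimates = [rng.binomial(1, p, pulls_per_round).mean() for p in surviving]
        med = np.median(estimates)
        kept = [p for p, est in zip(surviving, estimates) if est >= med]
        if len(kept) == len(surviving):          # every estimate tied at the median:
            kept.pop(int(np.argmin(estimates)))  # break the tie by dropping one arm
        surviving = kept
    return surviving[0]

print(median_elimination([0.2, 0.5, 0.7, 0.4, 0.9]))  # almost always 0.9
```

Note that keeping only arms at or above the median can never empty the candidate set, so the loop always terminates with exactly one arm.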


๐Ÿ“ 4. Mathematical Intuition (Easy Version)

Confidence Guarantee

The algorithm ensures:

\[ P(\text{chosen arm is within } \epsilon \text{ of best}) \ge 1 - \delta \]

Simple Explanation:

  • ε (epsilon): how much suboptimality we can tolerate
  • δ (delta): the allowed probability of failure

Meaning: we are almost sure (with probability at least 1 − δ) that our result is very close (within ε) to the best choice.
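How large should n be in the sampling step? For rewards bounded in [0, 1], Hoeffding's inequality gives a back-of-the-envelope answer. This simple bound is an illustration of the ε–δ trade-off, not the exact per-round sample schedule used in the full median-elimination analysis:

```python
import math

def pulls_needed(epsilon, delta):
    """Smallest n such that 2 * exp(-2 * n * epsilon**2) <= delta,
    i.e. P(|empirical mean - true mean| >= epsilon) <= delta
    for rewards in [0, 1], by Hoeffding's inequality."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))

print(pulls_needed(0.1, 0.1))  # 150 pulls per arm
```

Halving ε quadruples the required pulls, while halving δ only adds a logarithmic amount, which is why accuracy is the expensive knob.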

🎰 5. Real-Life Example (Slot Machines)

Imagine 10 slot machines:

  1. Play each machine a few times
  2. Calculate average reward
  3. Find median performer
  4. Remove weaker machines
  5. Repeat until best machine remains

This avoids wasting time on bad machines.


💻 6. Code Example

import numpy as np

# True success probabilities of the five simulated Bernoulli arms
arms = [0.2, 0.5, 0.7, 0.4, 0.9]
epsilon = 0.1
delta = 0.1

def sample(arm, n=10):
    # Pull an arm n times and return the empirical mean reward (simple simulation)
    return np.mean(np.random.binomial(1, arm, n))

# One elimination round: estimate every arm, then keep the upper half
estimates = [sample(a) for a in arms]
median = np.median(estimates)
filtered = [a for a, est in zip(arms, estimates) if est >= median]
print("Remaining arms:", filtered)

🖥️ 7. CLI Simulation Output

Initial Arms: [0.2, 0.5, 0.7, 0.4, 0.9]

Round 1:
Estimates: [0.2, 0.6, 0.8, 0.3, 0.9]
Median: 0.6
Remaining: [0.5, 0.7, 0.9]

Round 2:
Estimates: [0.5, 0.7, 0.9]
Median: 0.7
Remaining: [0.7, 0.9]

Round 3:
Remaining best arm: 0.9 

🚀 8. Why It Works

  • Reduces computation drastically
  • Focuses only on promising actions
  • Balances exploration and exploitation

Instead of checking everything deeply, it quickly filters out bad options.

⚠️ 9. Limitations

  • Depends heavily on ε and δ
  • Not efficient for very small problems
  • Needs repeated sampling (still costly in some cases)

💡 10. Key Takeaways

  • Median Elimination is a smart filtering algorithm
  • Works by repeatedly removing weaker half
  • Uses probability guarantees (ε, δ)
  • Efficient for large action spaces

🎯 Final Summary

Median Elimination is like narrowing down contestants in a competition until only the best remains. It is simple, comes with provable (ε, δ) guarantees, and is a useful tool in reinforcement learning problems where decisions must be made efficiently under uncertainty.
