
Tuesday, December 10, 2024

A Beginner’s Guide to LSPI and Fitted Q Iteration in Reinforcement Learning



🧠 LSPI vs Fitted Q Iteration (FQI)

Reinforcement learning (RL) trains an agent to make decisions that maximize cumulative reward. When data is limited, Least-Squares Policy Iteration (LSPI) and Fitted Q Iteration (FQI) are two data-efficient approaches that learn from a fixed batch of experience.

📘 Basics: Policies & Q-Functions
  • Policy: A rule mapping states to actions
  • Q-Function: Expected long-term reward of taking an action in a state and following the policy thereafter
Q(state, action) → expected future reward
๐Ÿ“ What is LSPI? +

LSPI improves a policy by estimating its Q-function with least-squares regression over a fixed dataset of transitions; a minimal code sketch follows the steps below.

How LSPI Works

  1. Collect experience data (S, A, R, S')
  2. Represent states/actions with features
  3. Solve Q-function using least-squares
  4. Update policy greedily
Dataset → Feature Matrix → Least-Squares Q → Greedy Policy Update
⚙️ Why LSPI is Useful
  • Data efficient
  • Offline learning
  • Handles continuous state/action spaces
  • Interpretable linear models
๐Ÿ” What is Fitted Q Iteration (FQI)? +

FQI learns the Q-function by repeatedly fitting a supervised model to Bellman-update targets, which lets it use powerful function approximators such as trees and neural networks; a runnable sketch follows the process steps below.

Q(s, a) = r + γ · max_a' Q(s', a')

FQI Process

  1. Initialize Q-function
  2. Apply Bellman update to dataset
  3. Fit a model (NN, tree, etc.)
  4. Repeat until convergence
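
Here is an equally minimal sketch of that loop, using scikit-learn's ExtraTreesRegressor as the function approximator (tree ensembles are a classic choice for FQI). The synthetic dataset, iteration count, and tree settings are illustrative assumptions:

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

# --- A synthetic batch of transitions (s, a, r, s'); values are made up ---
n_actions, gamma, n_iters = 2, 0.9, 20
rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(1000, 1))               # 1-D continuous state
A = rng.integers(0, n_actions, size=(1000, 1))
R = np.where((S > 0) & (A == 1), 1.0, 0.0).ravel()   # action 1 pays off on the right half
S_next = np.clip(S + np.where(A == 1, 0.1, -0.1), -1.0, 1.0)

X = np.hstack([S, A])                                # regression inputs: (state, action)
model = None

for _ in range(n_iters):
    if model is None:
        targets = R                                  # Q_0(s, a) = immediate reward
    else:
        # Bellman update: r + gamma * max_a' Q(s', a') under the previous model
        q_next = np.column_stack([
            model.predict(np.hstack([S_next, np.full_like(A, a, dtype=float)]))
            for a in range(n_actions)
        ])
        targets = R + gamma * q_next.max(axis=1)
    model = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, targets)

# Greedy action at a test state
s = np.array([[0.5]])
q_vals = [model.predict(np.hstack([s, [[a]]]))[0] for a in range(n_actions)]
print("Q-values at s = 0.5:", q_vals, "-> best action:", int(np.argmax(q_vals)))

Each iteration regresses a fresh model onto Bellman targets computed with the previous model, which is exactly steps 2–4 above repeated until the Q-function stops changing.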
🆚 LSPI vs FQI: Key Differences
Aspect                 | LSPI               | FQI
Main focus             | Policy improvement | Q-function approximation
Function approximation | Linear features    | Neural nets / trees
Typical data size      | Small to medium    | Medium to large
Interpretability       | High               | Lower
🎯 When to Use Which?

Use LSPI if:

  • Limited data
  • Simple features
  • Need interpretability

Use FQI if:

  • Complex environments
  • Large datasets
  • Non-linear value functions

💡 Key Takeaways

  • Both LSPI and FQI are data-efficient RL methods
  • LSPI is simple, linear, and interpretable
  • FQI is powerful and scales to complex problems
  • Choice depends on data size and environment complexity
