🧠 LSPI vs Fitted Q Iteration (FQI)
Reinforcement learning (RL) teaches an agent to make decisions that maximize reward. When interaction data is limited, Least-Squares Policy Iteration (LSPI) and Fitted Q Iteration (FQI) are two data-efficient batch approaches that learn from a fixed set of transitions.
- Policy: A rule mapping states to actions
- Q-Function: Expected long-term reward of taking an action in a state
Q(state, action) → expected future reward
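To make these two concepts concrete, here is a minimal sketch of a tabular Q-function and the greedy policy it induces (the states, actions, and values are made up for illustration):

```python
# A toy Q-table and the greedy policy it induces.
# States ("s0", "s1") and actions ("left", "right") are hypothetical.
Q = {
    ("s0", "left"): 1.0,
    ("s0", "right"): 2.5,   # best action in s0
    ("s1", "left"): 0.3,
    ("s1", "right"): -1.0,  # best action in s1 is "left"
}

def greedy_policy(state, actions=("left", "right")):
    """Pick the action with the highest expected future reward."""
    return max(actions, key=lambda a: Q[(state, a)])

print(greedy_policy("s0"))  # -> right
print(greedy_policy("s1"))  # -> left
```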
LSPI improves a policy by alternating two steps over a fixed dataset: estimating the current policy's Q-function with least-squares regression, then acting greedily with respect to that estimate.
How LSPI Works
- Collect experience data as (s, a, r, s') transitions
- Represent state-action pairs with features
- Solve for the Q-function weights via least-squares
- Update the policy greedily and repeat until it stabilizes
Dataset → Feature Matrix → Least-Squares Q → Greedy Policy Update
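Here is a minimal sketch of that pipeline in Python, built on LSTD-Q (the least-squares policy evaluation step LSPI relies on); the feature map, dataset, and constants are illustrative placeholders, not a production implementation:

```python
import numpy as np

GAMMA = 0.95        # discount factor
N_ACTIONS = 2
N_FEATURES = 4      # state features per action block

def phi(s, a):
    """Hypothetical feature map: copy the state vector into the
    block of the chosen action (one-hot action encoding)."""
    f = np.zeros(N_FEATURES * N_ACTIONS)
    f[a * N_FEATURES:(a + 1) * N_FEATURES] = s
    return f

def lstdq(transitions, policy, gamma=GAMMA):
    """Solve A w = b for the Q-function weights of `policy`."""
    k = N_FEATURES * N_ACTIONS
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s_next in transitions:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    # Small ridge term keeps the solve well conditioned.
    return np.linalg.solve(A + 1e-6 * np.eye(k), b)

def greedy(w):
    """Greedy policy with respect to the current weights."""
    return lambda s: max(range(N_ACTIONS), key=lambda a: w @ phi(s, a))

# Synthetic (s, a, r, s') transitions stand in for real experience.
rng = np.random.default_rng(0)
data = [(rng.normal(size=N_FEATURES), int(rng.integers(N_ACTIONS)),
         float(rng.normal()), rng.normal(size=N_FEATURES))
        for _ in range(500)]

# Policy iteration: evaluate with LSTD-Q, improve greedily, repeat.
w = np.zeros(N_FEATURES * N_ACTIONS)
for _ in range(10):
    w = lstdq(data, greedy(w))
```

The loop is exactly the Dataset → Feature Matrix → Least-Squares Q → Greedy Policy Update cycle above; because the dataset is fixed, every iteration reuses the same transitions.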
LSPI Strengths
- Data efficient
- Learns offline from a fixed dataset
- Handles continuous state spaces through its feature representation
- Interpretable linear models
FQI learns the Q-function by repeatedly fitting a function approximator, such as a tree ensemble or neural network, to Bellman targets computed from the dataset.
Q(s, a) = r + γ · max_a' Q(s', a')
where γ ∈ [0, 1) is the discount factor that down-weights future rewards.
FQI Process
- Initialize the Q-function (e.g., to the immediate rewards)
- Compute Bellman targets over the whole dataset
- Fit a regression model (neural net, tree ensemble, etc.) to those targets
- Repeat until the Q-function converges (see the sketch below)
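A minimal sketch of that loop, in the spirit of the classic tree-based FQI of Ernst et al. (2005), using scikit-learn's ExtraTreesRegressor on a synthetic batch of transitions (the data and hyperparameters here are illustrative):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

GAMMA = 0.95
N_ACTIONS = 2

# Synthetic batch of transitions standing in for real experience.
rng = np.random.default_rng(0)
S = rng.normal(size=(1000, 3))            # states
A = rng.integers(N_ACTIONS, size=1000)    # actions
R = rng.normal(size=1000)                 # rewards
S_next = rng.normal(size=(1000, 3))       # next states

X = np.column_stack([S, A])  # regress Q on (state, action) pairs
model = None

for _ in range(20):  # repeated Bellman iterations
    if model is None:
        targets = R  # first fit: Q starts at the immediate reward
    else:
        # max over a' of the previous Q estimate at s'
        q_next = np.column_stack([
            model.predict(np.column_stack([S_next,
                                           np.full(len(S_next), a)]))
            for a in range(N_ACTIONS)
        ])
        targets = R + GAMMA * q_next.max(axis=1)
    model = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
```

Each iteration refits the regressor from scratch on fresh Bellman targets; that full re-fit over a fixed batch is what distinguishes FQI from online TD methods.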
| Aspect | LSPI | FQI |
|---|---|---|
| Main Focus | Policy improvement | Q-function approximation |
| Function Approximation | Linear features | Neural nets / trees |
| Data Size | Small to medium | Medium to large |
| Interpretability | High | Lower |
Use LSPI if:
- Limited data
- Simple features
- Need interpretability
Use FQI if:
- Complex environments
- Large datasets
- Non-linear value functions
💡 Key Takeaways
- Both LSPI and FQI are data-efficient batch RL methods that learn from fixed datasets
- LSPI is simple, linear, and interpretable
- FQI is powerful and scales to complex problems
- Choice depends on data size and environment complexity