Understanding Probability Experiments & Reinforcement Learning
What Are Experiments?
An experiment is any process that produces an outcome. In probability, experiments are repeatable and measurable.
Examples:
- Flipping a coin
- Rolling a die
- Click prediction in apps
- Robot decision making
⚪ Bernoulli Experiment
A Bernoulli experiment has only two outcomes:
Success (1) or Failure (0)
Examples:
- Coin flip → Heads or Tails
- Email click → Click or No Click
Mathematical Insight
A Bernoulli random variable is defined as:
P(X = 1) = p
P(X = 0) = 1 - p
Why is Bernoulli important?
It is the building block for all other probability distributions like binomial and geometric.
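As a quick sanity check, a Bernoulli trial can be simulated and the parameter p recovered from the sample mean (a minimal NumPy sketch; the fair coin p = 0.5 and the sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# 10,000 Bernoulli(p = 0.5) trials: each draw is 1 (success) or 0 (failure)
flips = rng.binomial(1, 0.5, size=10_000)
# By the law of large numbers, the sample mean converges to p
print("estimated p:", flips.mean())
```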
Binomial Experiment
A binomial experiment repeats a Bernoulli experiment multiple times.
Example
Flip coin 10 times → Count number of heads
Formula
P(X = k) = (n choose k) * p^k * (1-p)^(n-k)
Where:
- n = number of trials
- k = number of successes
- p = probability of success
Real-world intuition
Used in marketing (conversion rates), medicine (treatment success), and AI models.
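The formula above can be computed directly with the standard library (a sketch; `binomial_pmf` is a name chosen here for illustration, not a library function):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): (n choose k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 6 heads in 10 fair-coin flips: 210 / 1024
print(binomial_pmf(6, 10, 0.5))  # → 0.205078125
```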
Multinomial Experiment
Multinomial experiments extend binomial experiments to more than two outcomes.
Example
Roll a die 20 times → Track the frequency of each face 1–6
Formula
P(X1,...,Xk) = n! / (x1! x2! ... xk!) * p1^x1 * ... * pk^xk
Key Insight
Instead of success/failure, we now track multiple categories simultaneously.
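The multinomial formula can likewise be evaluated directly (a sketch; `multinomial_pmf` is a hypothetical helper, and with two categories it reduces to the binomial PMF):

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """n! / (x1! ... xk!) * p1^x1 * ... * pk^xk for the given category counts."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    prob = 1.0
    for c, p in zip(counts, probs):
        prob *= p ** c
    return coef * prob

# Probability of seeing faces 1..6 exactly [2, 1, 3, 1, 2, 1] times in 10 die rolls
print(multinomial_pmf([2, 1, 3, 1, 2, 1], [1/6] * 6))
```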
Categorical Outcomes
Categorical outcomes represent labels rather than numbers.
- Favorite fruit
- Customer segment
- User choice in apps
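A categorical outcome can be sampled with `numpy.random.Generator.choice` (sketch; the labels and probabilities below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
labels = ["apple", "banana", "cherry"]   # hypothetical category labels
probs = [0.5, 0.3, 0.2]                  # must sum to 1

# One categorical draw: a label, not a number
print(rng.choice(labels, p=probs))

# Over many draws, empirical frequencies approach the probabilities
draws = rng.choice(labels, p=probs, size=10_000)
print("fraction 'apple':", (draws == "apple").mean())
```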
Mathematical Foundation
Each of these experiments corresponds to a probability distribution:
- Bernoulli → Single trial
- Binomial → Repeated binary trials
- Multinomial → Multi-category trials
They follow probability rules:
Sum of probabilities = 1
Mathematical Deep Dive (Probability Distributions)
Probability experiments are formally described using random variables and distributions. Below is the mathematical structure behind each concept.
⚪ Bernoulli Distribution
A Bernoulli random variable represents a single trial with two outcomes.
Mathematically:
$$ X \sim \text{Bernoulli}(p) $$
Probability mass function:
$$ P(X = x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases} $$
Explanation
The parameter $p$ represents the probability of success. The entire distribution is defined by just one parameter.
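From this PMF, the mean and variance follow directly:

$$ E[X] = 1 \cdot p + 0 \cdot (1 - p) = p, \qquad \mathrm{Var}(X) = E[X^2] - E[X]^2 = p - p^2 = p(1 - p) $$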
Binomial Distribution
A binomial distribution represents repeated Bernoulli trials.
$$ X \sim \text{Binomial}(n, p) $$
Probability mass function:
$$ P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k} $$
Explanation
- $n$ = number of trials
- $k$ = number of successes
- $\binom{n}{k}$ counts the combinations
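Because a binomial variable is a sum of $n$ independent Bernoulli trials, its mean and variance follow by linearity:

$$ E[X] = np, \qquad \mathrm{Var}(X) = np(1 - p) $$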
Multinomial Distribution
A generalization of the binomial distribution to more than two categories.
$$ (X_1, X_2, ..., X_m) \sim \text{Multinomial}(n, p_1, p_2, ..., p_m) $$
Probability mass function:
$$ P(X_1, ..., X_m) = \frac{n!}{x_1! x_2! \cdots x_m!} \prod_{i=1}^{m} p_i^{x_i} $$
Explanation
- $m$ = number of categories
- $x_i$ = count of category $i$
- $p_i$ = probability of category $i$
Categorical Distribution
A single draw from multiple categories.
$$ X \sim \text{Categorical}(p_1, p_2, ..., p_k) $$
Probability:
$$ P(X = i) = p_i $$
Explanation
Unlike multinomial, categorical deals with a single trial instead of repeated ones.
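This relationship is easy to see in code: a multinomial draw with $n = 1$ is a one-hot vector, i.e. a single categorical draw (NumPy sketch; the probabilities are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
# Multinomial with n = 1 → exactly one category receives the single count
one_hot = rng.multinomial(1, [0.2, 0.5, 0.3])
print(one_hot, "→ category index:", int(one_hot.argmax()))
```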
Connection to Reinforcement Learning
In reinforcement learning, policy distributions are often modeled using these probability functions:
- Bernoulli → binary action policies
- Binomial → success tracking over episodes
- Multinomial → action selection among multiple choices
- Categorical → softmax-based policy outputs
Example policy:
$$ \pi(a_i \mid s) = \text{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}} $$
Why this matters
This is how AI agents decide actions probabilistically instead of deterministically.
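A minimal sketch of such a stochastic policy, assuming some logits $z_i$ produced by a model (the values below are made up for illustration):

```python
import numpy as np

def softmax(z):
    """Convert raw scores z into a categorical distribution over actions."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())      # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # hypothetical action preferences z_i
pi = softmax(logits)                 # pi(a|s): probabilities summing to 1

rng = np.random.default_rng(3)
action = rng.choice(len(pi), p=pi)   # sample an action instead of taking argmax
print("pi:", pi, "sampled action:", action)
```

Sampling from `pi` rather than always taking the argmax is what makes the agent's behavior probabilistic rather than deterministic.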
Reinforcement Learning Connection
1. Bernoulli → Reward Signal
Agent gets reward or not.
2. Binomial → Repeated Actions
Track success rate over time.
3. Multinomial → Multiple Actions
Agent chooses between many actions.
4. Categorical → Decision Classes
Agent selects between discrete strategies.
Deep RL Insight
These probability models are used in:
- Policy gradients
- Bandit problems
- Exploration strategies
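As one concrete example, these pieces come together in an ε-greedy bandit agent: Bernoulli reward signals, a choice among multiple actions, and running success-rate estimates (a minimal sketch; the arm probabilities, ε, and step count are arbitrary):

```python
import numpy as np

def epsilon_greedy(true_means, steps=5000, eps=0.1, seed=0):
    """Minimal epsilon-greedy agent on a Bernoulli bandit."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    counts = np.zeros(k)             # pulls per arm
    values = np.zeros(k)             # running estimate of each arm's mean reward
    for _ in range(steps):
        if rng.random() < eps:
            a = int(rng.integers(k))          # explore: random arm
        else:
            a = int(values.argmax())          # exploit: best estimate so far
        r = rng.binomial(1, true_means[a])    # Bernoulli reward signal
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]   # incremental mean update
    return values

estimates = epsilon_greedy([0.2, 0.5, 0.8])
print("estimated arm means:", estimates)
```

After enough steps the agent concentrates its pulls on the best arm while the occasional ε-exploration keeps the other estimates honest.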
CLI Simulation Example
Code Example
```python
import numpy as np

# Bernoulli trial (a binomial with n = 1)
print("Bernoulli:", np.random.binomial(1, 0.5))

# Binomial trial: 10 flips of a fair coin
print("Binomial:", np.random.binomial(10, 0.5))

# Multinomial trial: 10 rolls of a fair die
print("Multinomial:", np.random.multinomial(10, [1/6] * 6))
```
CLI Output
```
$ python experiment.py
Bernoulli: 1
Binomial: 6
Multinomial: [2 1 3 1 2 1]
```
Output Explanation
Each run produces different outcomes due to randomness.
Key Takeaways
- Bernoulli = single binary outcome
- Binomial = repeated Bernoulli
- Multinomial = repeated trials with multiple outcomes
- Categorical = labels without order
- RL uses these for decision making
Final Thoughts
Understanding probability experiments builds the foundation for machine learning and AI. These concepts simplify complex systems into understandable patterns, enabling smarter decisions and predictive intelligence.