Saturday, October 26, 2024

How Outcomes Work in Reinforcement Learning and Experiments


Understanding Bernoulli, Binomial, Multinomial & RL

🎲 Understanding Probability Experiments & Reinforcement Learning

๐Ÿ” What Are Experiments?

An experiment is any process that produces an outcome. In probability, experiments are repeatable and measurable.

Examples:

  • Flipping a coin
  • Rolling a die
  • Click prediction in apps
  • Robot decision making
💡 Core Idea: Every experiment produces outcomes that can be measured, predicted, and learned from.

⚪ Bernoulli Experiment

A Bernoulli experiment has only two outcomes:

Success (1) or Failure (0)

Examples:

  • Coin flip → Heads or Tails
  • Email click → Click or No Click

Mathematical Insight

A Bernoulli random variable is defined as:

P(X = 1) = p  
P(X = 0) = 1 - p
🔽 Why is Bernoulli important?

It is the building block for other probability distributions such as the binomial and geometric.
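A Bernoulli trial is easy to simulate. The sketch below (using NumPy, with an arbitrary p = 0.3 for illustration) draws many trials and recovers p from the sample mean:

```python
import numpy as np

rng = np.random.default_rng(42)   # seeded so the run is reproducible
p = 0.3                           # true success probability (chosen for illustration)

# Draw 10,000 Bernoulli trials: each sample is 1 (success) or 0 (failure)
samples = rng.binomial(1, p, size=10_000)

# The sample mean estimates p (law of large numbers)
print(f"Estimated p: {samples.mean():.3f}")
```

With 10,000 trials the estimate typically lands within about 0.01 of the true p.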

📊 Binomial Experiment

A binomial experiment repeats a Bernoulli experiment multiple times.

Example

Flip a coin 10 times → Count the number of heads

Formula

P(X = k) = (n choose k) * p^k * (1-p)^(n-k)

Where:

  • n = number of trials
  • k = number of successes
  • p = probability of success
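This formula can be evaluated directly with Python's standard library. For example, the probability of exactly 5 heads in 10 fair coin flips:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exactly 5 heads in 10 fair flips: C(10,5) * 0.5^10 = 252/1024
print(binomial_pmf(5, 10, 0.5))  # 0.24609375
```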
🔽 Real-world intuition

Used in marketing (conversion rates), medicine (treatment success), and AI models.

🎯 Multinomial Experiment

Multinomial experiments extend binomial experiments to more than two outcomes.

Example

Roll a die 20 times → Track the frequency of faces 1–6

Formula

P(X1,...,Xk) = n! / (x1! x2! ... xk!) * p1^x1 * ... * pk^xk
🔽 Key Insight

Instead of success/failure, we now track multiple categories simultaneously.
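The multinomial formula can likewise be coded directly. The sketch below evaluates the chance that a fair die rolled 6 times shows each face exactly once:

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(X1 = x1, ..., Xk = xk) for n = sum(counts) trials."""
    n = sum(counts)
    coef = factorial(n) // prod(factorial(x) for x in counts)
    return coef * prod(p**x for p, x in zip(probs, counts))

# Fair die, 6 rolls, each face exactly once: 6! / 6^6
print(multinomial_pmf([1] * 6, [1 / 6] * 6))  # ≈ 0.0154
```

With two categories this reduces to the binomial formula, which is a handy consistency check.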

๐Ÿท️ Categorical Outcomes

Categorical outcomes represent labels rather than numbers.

  • Favorite fruit
  • Customer segment
  • User choice in apps
💡 Important: No inherent order exists in categorical data.

๐Ÿ“ Mathematical Foundation

These experiments are all probability distributions:

  • Bernoulli → Single trial
  • Binomial → Repeated binary trials
  • Multinomial → Multi-category trials

They follow probability rules:

Sum of probabilities = 1
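This rule can be checked numerically. A minimal sketch, summing the binomial PMF over every possible outcome (n = 10 and p = 0.3 are arbitrary choices):

```python
from math import comb

n, p = 10, 0.3
# Sum P(X = k) over all k = 0..n; the probabilities must total 1
total = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
print(total)  # ≈ 1.0
```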

๐Ÿ“ Mathematical Deep Dive (Probability Distributions)

Probability experiments are formally described using random variables and distributions. Below is the mathematical structure behind each concept.

⚪ Bernoulli Distribution

A Bernoulli random variable represents a single trial with two outcomes.

Mathematically:

$$ X \sim \text{Bernoulli}(p) $$

Probability mass function:

$$ P(X = x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases} $$

🔽 Explanation

The parameter $p$ represents the probability of success. The entire distribution is defined by just one parameter.

📊 Binomial Distribution

A binomial distribution represents repeated Bernoulli trials.

$$ X \sim \text{Binomial}(n, p) $$

Probability mass function:

$$ P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k} $$

🔽 Explanation

  • $n$ = number of trials
  • $k$ = number of successes
  • $\binom{n}{k}$ counts combinations

🎯 Multinomial Distribution

A generalization of the binomial distribution to more than two categories.

$$ (X_1, X_2, ..., X_m) \sim \text{Multinomial}(n, p_1, p_2, ..., p_m) $$

Probability mass function:

$$ P(X_1, ..., X_m) = \frac{n!}{x_1! x_2! \cdots x_m!} \prod_{i=1}^{m} p_i^{x_i} $$

🔽 Explanation

  • $m$ = number of categories
  • $x_i$ = count of category $i$
  • $p_i$ = probability of category $i$

๐Ÿท️ Categorical Distribution

A single draw from multiple categories.

$$ X \sim \text{Categorical}(p_1, p_2, ..., p_k) $$

Probability:

$$ P(X = i) = p_i $$

🔽 Explanation

Unlike the multinomial, the categorical distribution describes a single trial rather than repeated ones.
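A categorical draw can be sketched with NumPy's random Generator; the fruit labels and probabilities below are illustrative, not from any real dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
categories = ["apple", "banana", "cherry"]   # hypothetical labels
probs = [0.5, 0.3, 0.2]

# One categorical draw: a single label, not a vector of counts
print(rng.choice(categories, p=probs))

# Many independent draws approximate the underlying probabilities
draws = rng.choice(categories, p=probs, size=10_000)
print({c: float(np.mean(draws == c)) for c in categories})
```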

🤖 Connection to Reinforcement Learning

In reinforcement learning, policy distributions are often modeled using these probability functions:

  • Bernoulli → binary action policies
  • Binomial → success tracking over episodes
  • Multinomial → action selection among multiple choices
  • Categorical → softmax-based policy outputs

Example policy:

$$ \pi(a_i \mid s) = \text{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}} $$

🔽 Why this matters

This is how AI agents decide actions probabilistically instead of deterministically.
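A minimal sketch of a softmax policy over three actions (the logits z are arbitrary illustrative scores, not learned values):

```python
import numpy as np

def softmax(z):
    """Turn raw scores z into a categorical probability distribution."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())        # subtract max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # hypothetical action scores z_i
policy = softmax(logits)            # π(a|s): probabilities summing to 1

print(policy)
# Sampling an action from the policy = one categorical draw
action = np.random.default_rng(0).choice(len(policy), p=policy)
print("sampled action:", action)
```

Higher logits get higher probability, but every action retains some chance of being sampled, which is what makes the behavior stochastic rather than deterministic.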

🤖 Reinforcement Learning Connection

1. Bernoulli → Reward Signal

Agent gets reward or not.

2. Binomial → Repeated Actions

Track success rate over time.

3. Multinomial → Multiple Actions

Agent chooses between many actions.

4. Categorical → Decision Classes

Agent selects between discrete strategies.

🔽 Deep RL Insight

These probability models are used in:

  • Policy gradients
  • Bandit problems
  • Exploration strategies
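As an illustrative sketch tying these ideas together, here is a simple epsilon-greedy agent on a Bernoulli bandit (the arm reward probabilities are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
true_p = [0.2, 0.5, 0.8]      # hidden Bernoulli reward probability per arm (illustrative)
counts = np.zeros(3)          # pulls per arm
values = np.zeros(3)          # running estimate of each arm's reward rate
epsilon = 0.1                 # exploration rate

for _ in range(5_000):
    if rng.random() < epsilon:          # explore: random arm
        a = int(rng.integers(3))
    else:                               # exploit: current best estimate
        a = int(np.argmax(values))
    reward = rng.binomial(1, true_p[a])  # Bernoulli reward signal
    counts[a] += 1
    values[a] += (reward - values[a]) / counts[a]  # incremental mean

print("estimated values:", values.round(2))
print("best arm:", int(np.argmax(values)))
```

After a few thousand steps the estimates approach the true probabilities and the agent settles on the best arm, while epsilon keeps a small amount of exploration alive.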

💻 CLI Simulation Example

Code Example

import numpy as np

# Bernoulli Trial
print("Bernoulli:", np.random.binomial(1, 0.5))

# Binomial Trial
print("Binomial:", np.random.binomial(10, 0.5))

# Multinomial Trial
print("Multinomial:", np.random.multinomial(10, [1/6]*6))

CLI Output

$ python experiment.py

Bernoulli: 1
Binomial: 6
Multinomial: [2 1 3 1 2 1]
🔽 Output Explanation

Each run produces different outcomes because the draws are random; seeding the generator (e.g. np.random.seed(0)) makes a run reproducible.

🎯 Key Takeaways

  • Bernoulli = single binary outcome
  • Binomial = repeated Bernoulli
  • Multinomial = multiple outcomes
  • Categorical = labels without order
  • RL uses these for decision making

📘 Final Thoughts

Understanding probability experiments builds the foundation for machine learning and AI. These concepts simplify complex systems into understandable patterns, enabling smarter decisions and predictive intelligence.
