
Saturday, October 26, 2024

How Outcomes Work in Reinforcement Learning and Experiments


Understanding Bernoulli, Binomial, Multinomial & RL

🎲 Understanding Probability Experiments & Reinforcement Learning

🔍 What Are Experiments?

An experiment is any process that produces an outcome. In probability, experiments are repeatable and measurable.

Examples:

  • Flipping a coin
  • Rolling a die
  • Click prediction in apps
  • Robot decision making
💡 Core Idea: Every experiment produces outcomes that can be measured, predicted, and learned from.

⚪ Bernoulli Experiment

A Bernoulli experiment has only two outcomes:

Success (1) or Failure (0)

Examples:

  • Coin flip → Heads or Tails
  • Email click → Click or No Click

Mathematical Insight

A Bernoulli random variable is defined as:

P(X = 1) = p  
P(X = 0) = 1 - p
🔽 Why is Bernoulli important?

It is the building block for other probability distributions, such as the binomial and the geometric.
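A Bernoulli trial is easy to sketch in plain Python. The snippet below is a minimal illustration (the value of p and the trial count are arbitrary); by the law of large numbers, the empirical mean of many trials should approach p.

```python
import random

def bernoulli_trial(p):
    """Return 1 (success) with probability p, else 0 (failure)."""
    return 1 if random.random() < p else 0

random.seed(42)
p = 0.3
trials = [bernoulli_trial(p) for _ in range(100_000)]

# The empirical success rate should be close to p.
print(sum(trials) / len(trials))  # close to 0.3
```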

📊 Binomial Experiment

A binomial experiment repeats a Bernoulli experiment multiple times.

Example

Flip a coin 10 times → count the number of heads

Formula

P(X = k) = (n choose k) * p^k * (1-p)^(n-k)

Where:

  • n = number of trials
  • k = number of successes
  • p = probability of success
🔽 Real-world intuition

Used in marketing (conversion rates), medicine (treatment success), and AI models.
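The binomial formula above can be computed directly with Python's standard library; this sketch uses the 10-coin-flip example (n = 10, p = 0.5) and also checks that the probabilities over all possible k sum to 1, as any distribution must.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) = (n choose k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 5 heads in 10 fair coin flips:
print(binomial_pmf(5, 10, 0.5))  # 0.24609375

# The PMF over k = 0..10 sums to 1:
print(sum(binomial_pmf(k, 10, 0.5) for k in range(11)))  # 1.0
```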

🎯 Multinomial Experiment

Multinomial experiments extend binomial experiments to more than two outcomes.

Example

Roll a die 20 times → track the frequency of each face (1–6)

Formula

P(X1,...,Xk) = n! / (x1! x2! ... xk!) * p1^x1 * ... * pk^xk
🔽 Key Insight

Instead of success/failure, we now track multiple categories simultaneously.
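The multinomial formula translates almost directly into code. Below is a minimal sketch (the worked example is hypothetical): the probability of rolling each face of a fair die exactly once in six rolls.

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """P(X1=x1, ..., Xk=xk) = n!/(x1!...xk!) * p1^x1 * ... * pk^xk."""
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)     # multinomial coefficient, always an integer
    pmf = float(coef)
    for x, p in zip(counts, probs):
        pmf *= p ** x
    return pmf

# Each face 1-6 exactly once in 6 rolls of a fair die: 6!/6^6
print(multinomial_pmf([1, 1, 1, 1, 1, 1], [1/6] * 6))  # ≈ 0.0154
```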

๐Ÿท️ Categorical Outcomes

Categorical outcomes represent labels rather than numbers.

  • Favorite fruit
  • Customer segment
  • User choice in apps
💡 Important: No inherent order exists in categorical data.

๐Ÿ“ Mathematical Foundation

These experiments are all probability distributions:

  • Bernoulli → Single trial
  • Binomial → Repeated binary trials
  • Multinomial → Multi-category trials

They follow probability rules:

Sum of probabilities = 1

๐Ÿ“ Mathematical Deep Dive (Probability Distributions)

Probability experiments are formally described using random variables and distributions. Below is the mathematical structure behind each concept.

⚪ Bernoulli Distribution

A Bernoulli random variable represents a single trial with two outcomes.

Mathematically:

$$ X \sim \text{Bernoulli}(p) $$

Probability mass function:

$$ P(X = x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases} $$

🔽 Explanation

The parameter $p$ represents the probability of success. The entire distribution is defined by just one parameter.

📊 Binomial Distribution

A binomial distribution represents repeated Bernoulli trials.

$$ X \sim \text{Binomial}(n, p) $$

Probability mass function:

$$ P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k} $$

🔽 Explanation

- $n$ = number of trials
- $k$ = number of successes
- $\binom{n}{k}$ counts the ways to choose $k$ successes among $n$ trials

🎯 Multinomial Distribution

Generalization of binomial distribution for multiple categories.

$$ (X_1, X_2, ..., X_m) \sim \text{Multinomial}(n, p_1, p_2, ..., p_m) $$

Probability mass function:

$$ P(X_1, ..., X_m) = \frac{n!}{x_1! x_2! \cdots x_m!} \prod_{i=1}^{m} p_i^{x_i} $$

🔽 Explanation

- $m$ = number of categories
- $x_i$ = count of category $i$
- $p_i$ = probability of category $i$

๐Ÿท️ Categorical Distribution

A single draw from multiple categories.

$$ X \sim \text{Categorical}(p_1, p_2, ..., p_k) $$

Probability:

$$ P(X = i) = p_i $$

🔽 Explanation

Unlike multinomial, categorical deals with a single trial instead of repeated ones.
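A single categorical draw can be sketched with Python's random.choices; the labels and probabilities below are hypothetical. Repeating the draw many times and counting per label is exactly a multinomial sample, which makes the relationship between the two distributions concrete.

```python
import random

labels = ["apple", "banana", "cherry"]   # hypothetical categories
probs = [0.5, 0.3, 0.2]                  # must sum to 1

random.seed(0)

# One categorical draw: a single trial over labelled outcomes.
draw = random.choices(labels, weights=probs, k=1)[0]
print(draw)

# Many repeated draws, counted per label, form a multinomial sample.
counts = {label: 0 for label in labels}
for _ in range(10_000):
    counts[random.choices(labels, weights=probs, k=1)[0]] += 1
print(counts)  # roughly proportional to probs
```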

🤖 Connection to Reinforcement Learning

In reinforcement learning, policy distributions are often modeled using these probability functions:

  • Bernoulli → binary action policies
  • Binomial → success tracking over episodes
  • Multinomial → action selection among multiple choices
  • Categorical → softmax-based policy outputs

Example policy:

$$ \pi(a_i \mid s) = \text{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}} $$

🔽 Why this matters

This is how AI agents decide actions probabilistically instead of deterministically.
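A softmax policy head can be sketched in a few lines of plain Python. This is only an illustration: the action scores z are made-up values, and the max is subtracted before exponentiating for numerical stability, a standard trick.

```python
import math

def softmax(logits):
    """Turn raw action scores z_i into a probability distribution."""
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical action scores for one state:
z = [2.0, 1.0, 0.1]
pi = softmax(z)
print(pi)        # probabilities over the three actions, highest score wins most mass
print(sum(pi))   # sums to 1, as a distribution must
```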

🤖 Reinforcement Learning Connection

1. Bernoulli → Reward Signal

Agent gets reward or not.

2. Binomial → Repeated Actions

Track success rate over time.

3. Multinomial → Multiple Actions

Agent chooses between many actions.

4. Categorical → Decision Classes

Agent selects between discrete strategies.

🔽 Deep RL Insight

These probability models are used in:

  • Policy gradients
  • Bandit problems
  • Exploration strategies
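To tie these threads together, here is a toy sketch of a two-armed Bernoulli bandit solved with epsilon-greedy exploration. The arm reward probabilities and epsilon are invented for illustration; the agent receives Bernoulli rewards and gradually learns which arm pays off more often.

```python
import random

random.seed(1)
true_p = [0.3, 0.7]    # hidden Bernoulli reward probability of each arm
counts = [0, 0]        # pulls per arm
values = [0.0, 0.0]    # running estimate of each arm's reward rate
epsilon = 0.1          # exploration rate

for _ in range(5_000):
    if random.random() < epsilon:
        arm = random.randrange(2)              # explore: random arm
    else:
        arm = values.index(max(values))        # exploit: best estimate so far
    reward = 1 if random.random() < true_p[arm] else 0   # Bernoulli reward
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

print(values)  # estimates should approach the true probabilities
print(counts)  # the better arm ends up pulled far more often
```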

💻 CLI Simulation Example

Code Example

import numpy as np

# Bernoulli Trial
print("Bernoulli:", np.random.binomial(1, 0.5))

# Binomial Trial
print("Binomial:", np.random.binomial(10, 0.5))

# Multinomial Trial
print("Multinomial:", np.random.multinomial(10, [1/6]*6))

CLI Output

$ python experiment.py

Bernoulli: 1
Binomial: 6
Multinomial: [2 1 3 1 2 1]
🔽 Output Explanation

Each run produces different outcomes due to randomness.

🎯 Key Takeaways

  • Bernoulli = single binary outcome
  • Binomial = repeated Bernoulli
  • Multinomial = multiple outcomes
  • Categorical = labels without order
  • RL uses these for decision making

📘 Final Thoughts

Understanding probability experiments builds the foundation for machine learning and AI. These concepts simplify complex systems into understandable patterns, enabling smarter decisions and predictive intelligence.
