This blog explores data science and networking, combining theoretical concepts with practical implementations. Topics include routing protocols, network operations, and data-driven problem solving, presented with clarity and reproducibility in mind.
Monday, August 5, 2024
Calculating Sample Variance: Using n vs. n-1
Understanding Sample Variance and VIF
This educational guide explains how to calculate sample variance using n and n-1, the differences in results, and real-world implications. Additionally, we explore Variance Inflation Factor (VIF) for detecting multicollinearity.
Table of Contents
- Sample Variance Basics
- Step-by-Step Variance Calculation Example
- Real-Life Example: Clinical Trials
- Detecting Multicollinearity Using VIF
- Related Articles
1. Sample Variance Basics
Variance measures the spread of data points around the mean. There are two common formulas for sample variance:
- Using n: Average of squared deviations from the mean.
- Using n-1: Corrected version to estimate population variance from a sample, also called Bessel's correction.
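Why n-1? Dividing by n underestimates the true variance on average, because the deviations are measured from the sample mean rather than the unknown population mean. The sample mean is computed from the same data, so it always sits at least as close to the data points as the population mean does, and the squared deviations come out too small. Dividing by n-1 compensates for that.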
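Written out, the two formulas look as follows. The symbols $x_i$, $\bar{x}$, $s_n^2$, $s^2$, and $\sigma^2$ are notation introduced here for illustration; the last line assumes independent draws from a population with finite variance $\sigma^2$.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
s_n^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2

\mathbb{E}\left[s_n^2\right] = \frac{n-1}{n}\,\sigma^2
\qquad
\mathbb{E}\left[s^2\right] = \sigma^2
```

The factor (n-1)/n is the average shortfall of the n version, which is why rescaling by n/(n-1) removes it.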
2. Step-by-Step Variance Calculation Example
Let's calculate the variance for three student scores: 80, 85, 90 using n.
- Average = (80 + 85 + 90) / 3 = 85
- Squared deviations:
  - (80 - 85)² = 25
  - (85 - 85)² = 0
  - (90 - 85)² = 25
- Variance = (25 + 0 + 25) / 3 ≈ 16.67
Now, using n-1 for the same data:
- Squared deviations remain 25, 0, 25
- Variance = (25 + 0 + 25) / (3-1) = 25
Comparison:
- Variance using n = 16.67
- Variance using n-1 = 25
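The hand calculation above can be sketched in plain Python. The function name `variance` and its `ddof` parameter are illustrative choices made here (the parameter name mirrors NumPy's), not something defined in the original post.

```python
def variance(values, ddof=0):
    """Return the variance of `values`, dividing by (n - ddof)."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more data points than ddof")
    mean = sum(values) / n
    # Squared distance of every point from the sample mean
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - ddof)


scores = [80, 85, 90]
var_n = variance(scores)             # divide by n
var_n_minus_1 = variance(scores, 1)  # divide by n - 1 (Bessel's correction)
print(round(var_n, 2), var_n_minus_1)  # prints: 16.67 25.0
```

Both calls share the same squared deviations; only the divisor changes, which is the whole difference between the two results.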
CLI Simulation Example
$ python
>>> import numpy as np
>>> data = [80, 85, 90]
>>> np.var(data)  # Using n
16.666666666666668
>>> np.var(data, ddof=1)  # Using n-1
25.0
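To see the underestimation rather than take it on faith, one option is a small simulation. This is a minimal sketch using only the standard library; the sample size, number of trials, population standard deviation, and seed are arbitrary values chosen here for illustration.

```python
import random


def simulate(sample_size=5, trials=20000, sigma=2.0, seed=0):
    """Average both variance estimates over many samples from N(0, sigma^2)."""
    rng = random.Random(seed)
    total_n = 0.0
    total_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(sample_size)]
        mean = sum(sample) / sample_size
        sum_sq = sum((x - mean) ** 2 for x in sample)
        total_n += sum_sq / sample_size                # divisor n
        total_n_minus_1 += sum_sq / (sample_size - 1)  # divisor n - 1
    return total_n / trials, total_n_minus_1 / trials


avg_n, avg_n_minus_1 = simulate()
print(f"true variance 4.0 | mean with n: {avg_n:.2f} | mean with n-1: {avg_n_minus_1:.2f}")
```

With a true variance of 4 and samples of size 5, the n version should average close to 4 × 4/5 = 3.2, while the n-1 version should average close to 4.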
3. Real-Life Example: Clinical Trials
Consider a study evaluating a new drug:
- Two groups: Drug vs Placebo, 30 patients each
- We calculate variance of blood pressure reduction
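- Impact of using n vs n-1 (30 patients per group):
  - Variance with n = 25 mmHg²
  - Variance with n-1 = 25 × 30/29 ≈ 25.86 mmHg²
- Consequences:
  - Confidence intervals are narrower with n → overconfidence in the effect.
  - Hypothesis tests become slightly more likely to indicate significance falsely.
  - Decision-making may be flawed → regulatory or safety issues.
- Key Takeaway: Using n-1 gives an unbiased estimate of the population variance, which keeps confidence intervals and test results trustworthy. The gap shrinks as the sample grows, so the choice matters most in small studies.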
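The effect on interval width can be sketched numerically. This example takes the group size of 30 and the n-based variance of 25 mmHg² from the list above and derives the n-1 value from them via the factor n/(n-1). It uses a normal approximation for the 95% interval (an assumption made here for simplicity; a real analysis of this size would more likely use a t distribution), and the helper name `half_width` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

n = 30
var_n = 25.0                         # mmHg^2, computed with divisor n
var_n_minus_1 = var_n * n / (n - 1)  # same data, divisor n - 1

z = NormalDist().inv_cdf(0.975)      # two-sided 95% critical value, about 1.96


def half_width(variance, sample_size):
    """Half-width of a normal-approximation confidence interval for the mean."""
    return z * sqrt(variance / sample_size)


hw_n = half_width(var_n, n)
hw_n_minus_1 = half_width(var_n_minus_1, n)
print(f"variance with n-1: {var_n_minus_1:.2f} mmHg^2")
print(f"CI half-width with n:   {hw_n:.3f} mmHg")
print(f"CI half-width with n-1: {hw_n_minus_1:.3f} mmHg")
```

The interval built from the n-based variance is the narrower of the two, which is the overconfidence described in the consequences list.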
4. Detecting Multicollinearity Using Variance Inflation Factor (VIF)
VIF measures how much the variance of a regression coefficient is inflated due to multicollinearity:
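# Python example using statsmodels
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

data = pd.DataFrame({
    'X1': [1, 2, 3, 4, 5],
    'X2': [2, 4, 6, 8, 10],  # Exactly 2 * X1, so perfectly collinear with X1
    'X3': [5, 3, 6, 2, 1]
})

# Add an intercept column so each auxiliary regression includes a constant
X = add_constant(data)

vif_data = pd.DataFrame()
vif_data["feature"] = data.columns
# Column 0 is the constant; report VIF for the predictors only.
# Because X2 is an exact multiple of X1, their VIFs come out infinite or extremely large.
vif_data["VIF"] = [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])]
print(vif_data)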
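A common rule of thumb treats a VIF above 10 (some practitioners use 5) as a sign of problematic multicollinearity. High multicollinearity inflates the standard errors of the affected coefficients, making the estimates unstable and hard to interpret.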
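For intuition about what the number means, VIF for predictor j is 1 / (1 - R²_j), where R²_j comes from regressing that predictor on all the others. The following is a minimal NumPy sketch of that definition, not the statsmodels implementation; the function name `vif` and the data are hypothetical, with the second column made close to (but not exactly) twice the first.

```python
import numpy as np


def vif(X):
    """VIF per column of X, via 1 / (1 - R^2) from an auxiliary regression."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    result = []
    for j in range(n_cols):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on an intercept plus the remaining predictors
        design = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        # A perfect fit means the column is an exact combination of the others
        result.append(float("inf") if r_squared >= 1.0 else 1.0 / (1.0 - r_squared))
    return result


# Hypothetical data: column 2 is roughly, but not exactly, 2 * column 1
X = np.array([
    [1, 2.1, 5],
    [2, 3.9, 3],
    [3, 6.2, 6],
    [4, 7.8, 2],
    [5, 10.1, 1],
])
vifs = vif(X)
print(vifs)
```

The first two columns should show VIFs far above 10, since each is almost fully explained by the other. Columns that are uncorrelated with everything else give a VIF of 1.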
5. Related Articles
Key Takeaways
- Use n-1 for sample variance to avoid underestimation.
- Incorrect variance can mislead confidence intervals and hypothesis tests.
- VIF helps detect multicollinearity in regression, ensuring robust model interpretation.