Saturday, August 3, 2024

Calculating the Interquartile Range (IQR) and Identifying Outliers

Understanding percentiles, quartiles, and the Interquartile Range (IQR) is essential in statistics and data analysis. These concepts describe how data is distributed and help detect unusual values, known as outliers.

In this guide, you will learn how to calculate the IQR, detect outliers, and apply the technique to grouped datasets.



Understanding Interquartile Range (IQR)

The Interquartile Range (IQR) measures the spread of the middle 50% of a dataset. It is widely used in statistics because it is resistant to extreme values.

The IQR is calculated using two quartiles:

  • Q1 (First Quartile) – 25th percentile
  • Q3 (Third Quartile) – 75th percentile

The formula is:


IQR = Q3 − Q1

This means the IQR represents the range where the middle half of the data lies.


Step-by-Step IQR Calculation Example

Consider the dataset:


3, 7, 8, 5, 12, 7, 9, 15

Step 1 — Sort the numbers

3, 5, 7, 7, 8, 9, 12, 15

Sorting the numbers helps locate the quartiles accurately.

Step 2 — Find the Median

Since there are 8 numbers, the median is the average of the 4th and 5th values.

Median = (7 + 8) / 2

Median = 7.5


Step 3 — Divide the Data

Lower Half:

3, 5, 7, 7

Upper Half:

8, 9, 12, 15


Step 4 — Calculate Q1

Q1 is the median of the lower half:

Q1 = (5 + 7) / 2

Q1 = 6


Step 5 — Calculate Q3

Q3 is the median of the upper half:

Q3 = (9 + 12) / 2

Q3 = 10.5


Step 6 — Calculate IQR

IQR = Q3 − Q1

IQR = 10.5 − 6

IQR = 4.5

So the middle half of this dataset spans 4.5 units.
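
The six steps above can be sketched in Python using only the standard library. This is a minimal illustration, not a definitive implementation: the function name `quartiles_by_halves` is hypothetical, and the handling of odd-length datasets (leaving the middle value out of both halves) is an assumption, since the example here only covers an even count.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Needs at least two values so that both halves are non-empty.
    """
    data = sorted(values)                     # Step 1: sort
    half = len(data) // 2
    lower = data[:half]                       # Step 3: lower half
    # Assumption: for an odd count, the middle value is excluded from both halves
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(lower), median(data), median(upper)

q1, med, q3 = quartiles_by_halves([3, 7, 8, 5, 12, 7, 9, 15])
iqr = q3 - q1                                 # Step 6: IQR = Q3 - Q1
print(q1, med, q3, iqr)                       # 6.0 7.5 10.5 4.5
```

The printed values match the hand calculation: Q1 = 6, median = 7.5, Q3 = 10.5, IQR = 4.5.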

Identifying Outliers Using IQR

Outliers are data points that are significantly different from other observations.

To detect them using IQR:

  • Lower Bound = Q1 − 1.5 × IQR
  • Upper Bound = Q3 + 1.5 × IQR

Example Calculation

Q1 = 6

Q3 = 10.5

IQR = 4.5

Lower Bound = 6 − 1.5 × 4.5

Lower Bound = -0.75

Upper Bound = 10.5 + 1.5 × 4.5

Upper Bound = 17.25

Any value outside this range is considered an outlier.

For the dataset:


3, 5, 7, 7, 8, 9, 12, 15

All values fall between -0.75 and 17.25, so this dataset contains no outliers.
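
As a quick check, the bound calculation can be written as a short Python sketch. The helper names `iqr_bounds` and `find_outliers` are hypothetical, and the quartiles are passed in directly (Q1 = 6, Q3 = 10.5 from the worked example) rather than recomputed.

```python
def iqr_bounds(q1, q3, k=1.5):
    """Return (lower, upper) bounds placed k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, q1, q3):
    """Return the values that fall outside the IQR bounds."""
    lower, upper = iqr_bounds(q1, q3)
    return [v for v in values if v < lower or v > upper]

data = [3, 5, 7, 7, 8, 9, 12, 15]
lower, upper = iqr_bounds(6, 10.5)
print(lower, upper)                      # -0.75 17.25
print(find_outliers(data, 6, 10.5))      # []
```

The multiplier `k` defaults to the 1.5 used in this guide; it is exposed as a parameter only for convenience.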

Replacing Outliers on a Groupby Basis

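When working with grouped datasets (for example, readings grouped by weather event), compute the bounds and handle outliers within each group separately.

A value that is typical for one category can be extreme for another, so bounds computed on the pooled data can miss real outliers or flag normal values.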

Example scenario:

  • Event = Snow
  • Median Temperature = 20
  • Upper Bound = 24
  • Observed Value = 28

Since 28 exceeds the upper bound of 24, it is replaced with the bound:

Replace 28 → 24

This technique, often called capping or clipping, keeps every row in the dataset while limiting the influence of extreme values.
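
To make the scenario concrete, here is a minimal standard-library sketch of per-group capping. The readings are hypothetical, chosen so that the Snow group has a median of 20 and an upper bound of 24, and the function names are illustrative. The quartiles come from `statistics.quantiles` with `method="inclusive"`, which is an assumption about the quartile convention rather than something this guide prescribes.

```python
from statistics import quantiles

def iqr_fences(values):
    """Return (lower, upper) bounds at 1.5 * IQR beyond the quartiles."""
    # Assumption: "inclusive" quartiles (linear interpolation between data points)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def cap_by_group(readings):
    """Cap each group's values at that group's own bounds."""
    capped = {}
    for event, values in readings.items():
        lower, upper = iqr_fences(values)
        capped[event] = [min(max(v, lower), upper) for v in values]
    return capped

# Hypothetical temperatures per weather event
readings = {
    "Snow": [18, 19, 20, 21, 28],
    "Rain": [26, 27, 28, 29, 30],
}
capped = cap_by_group(readings)
print(capped["Snow"])   # the 28 is capped to 24
print(capped["Rain"])   # unchanged: 28 is a normal value for this group
```

The same reading of 28 is capped in the Snow group but left alone in the Rain group, which is the reason for computing bounds per group.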

Python Code Example


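import pandas as pd

def replace_outliers(values):
    """Clip a Series to its own IQR bounds (1.5 * IQR beyond Q1 and Q3)."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    # Values below `lower` become `lower`; values above `upper` become `upper`
    return values.clip(lower, upper)

# df is assumed to be an existing DataFrame with "event" and "temperature" columns.
# transform() applies the function once per event and preserves the original row order,
# avoiding the index and column changes that groupby().apply() can introduce.
df["temperature"] = df.groupby("event")["temperature"].transform(replace_outliers)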
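
One caveat when moving from the hand calculation to pandas: `quantile` interpolates linearly between data points by default, which is a different quartile convention from the median-of-halves method used in the worked example. The sketch below uses the standard library's `statistics.quantiles` with `method="inclusive"`, assumed here to follow the same linear interpolation, to show how the numbers shift for the sample dataset.

```python
from statistics import quantiles

data = [3, 5, 7, 7, 8, 9, 12, 15]

# "inclusive" treats the minimum as the 0th percentile and the maximum as the 100th,
# interpolating linearly in between
q1, _, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

print(q1, q3, iqr)                                   # 6.5 9.75 3.25
print(lower, upper)                                  # 1.625 14.625
print([v for v in data if v < lower or v > upper])   # [15]
```

Under this convention 15 falls just above the upper bound of 14.625, so the pandas code would cap it even though the hand calculation found no outliers. Both results are valid; they simply use different quartile definitions, so it is worth stating which one an analysis relies on.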


CLI Output Example

The snippet above prints nothing on its own. A script that wraps it and logs each replacement might produce console output like this:


$ python outlier_cleaning.py

Processing dataset...

Event: Snow

Original value: 28

Upper bound: 24

Replaced value: 24

Processing completed successfully.


Key Takeaways

  • The IQR measures the spread of the middle 50% of the data.
  • Q1 is the 25th percentile and Q3 is the 75th percentile.
  • Outliers are values more than 1.5 × IQR below Q1 or above Q3.
  • Handling outliers per group keeps one category's typical values from being judged against another's.
  • Replacing extreme values with the bounds limits their influence without dropping rows.


By understanding IQR and outlier detection, analysts can ensure that datasets remain reliable, accurate, and ready for meaningful insights.
