Saturday, August 3, 2024

Calculating Percentiles, Interquartile Range, and Creating a Box Plot for a Dataset

### 1. Calculate Percentiles

Percentiles indicate the value below which a given percentage of observations fall. We'll calculate the 25th percentile (Q1), the 50th percentile (median or Q2), and the 75th percentile (Q3).

Given data: `2, 5, 6, 7, 15, 15, 35, 43, 52, 65, 77, 88, 105, 199, 208, 1000`

**Sorted Data**: `2, 5, 6, 7, 15, 15, 35, 43, 52, 65, 77, 88, 105, 199, 208, 1000`

- **25th Percentile (Q1)**: Using the nearest-rank method, Q1 is the value at position 0.25 × 16 = 4 in the sorted data. The 4th value is `7`. (Methods that interpolate between neighboring values give slightly different quartiles.)

- **50th Percentile (Median or Q2)**: This is the middle value. Since the dataset has 16 values (even number), the median is the average of the 8th and 9th values. The 8th and 9th values are `43` and `52`. So, the median is `(43 + 52) / 2 = 47.5`.

- **75th Percentile (Q3)**: Using the same nearest-rank method, Q3 is the value at position 0.75 × 16 = 12 in the sorted data. The 12th value is `88`.
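The three calculations above can be sketched in Python. This is a minimal illustration of the nearest-rank approach used in this post, not a general-purpose quantile routine; the helper names `nearest_rank` and `median` are made up for this example.

```python
import math

data = [2, 5, 6, 7, 15, 15, 35, 43, 52, 65, 77, 88, 105, 199, 208, 1000]

def nearest_rank(values, pct):
    """Return the value at 1-based position ceil(pct% of n) in the sorted data."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def median(values):
    """Middle value, or the average of the two middle values for an even count."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

q1 = nearest_rank(data, 25)   # 4th value
q2 = median(data)             # average of the 8th and 9th values
q3 = nearest_rank(data, 75)   # 12th value
print(q1, q2, q3)  # 7 47.5 88
```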

### 2. Calculate Interquartile Range (IQR)

The IQR is the range between the 25th percentile (Q1) and the 75th percentile (Q3):

IQR = Q3 - Q1 = 88 - 7 = 81 

### 3. Box Plot

A box plot visually represents the distribution of data based on quartiles. It is built from five values:

1. **Minimum**: The smallest value (2).
2. **Q1**: 7.
3. **Median (Q2)**: 47.5.
4. **Q3**: 88.
5. **Maximum**: The largest value (1000).

**Box Plot Summary**:
- **Box**: Drawn from Q1 to Q3, with a line at the median (47.5).
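- **Whiskers**: Extend from Q1 down to the smallest value that is not an outlier (2) and from Q3 up to the largest value that is not an outlier (208).
- **Outliers**: Values more than 1.5 times the IQR below Q1 or above Q3, plotted as individual points beyond the whiskers.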

Let's calculate the potential outliers:
- Lower Bound = Q1 - 1.5 * IQR = 7 - 1.5 * 81 = 7 - 121.5 = -114.5 (no values below this)
- Upper Bound = Q3 + 1.5 * IQR = 88 + 1.5 * 81 = 88 + 121.5 = 209.5

Outliers are values above 209.5. The only value above this is `1000`, so `1000` is an outlier.
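As a quick check of the arithmetic, the bounds and the outlier test can be sketched in Python. The quartiles are taken from the calculation above, and the variable names are only illustrative.

```python
data = [2, 5, 6, 7, 15, 15, 35, 43, 52, 65, 77, 88, 105, 199, 208, 1000]
q1, q3 = 7, 88  # nearest-rank quartiles from section 1

iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# A value is flagged as an outlier if it lies outside either bound.
outliers = [x for x in data if x < lower_bound or x > upper_bound]

print(iqr, lower_bound, upper_bound)  # 81 -114.5 209.5
print(outliers)                       # [1000]
```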

### Summary of Results

- **25th Percentile (Q1)**: 7
- **50th Percentile (Median, Q2)**: 47.5
- **75th Percentile (Q3)**: 88
- **Interquartile Range (IQR)**: 81
- **Outlier**: 1000

**Box Plot**:

    2     7      47.5      88      208        1000
    |-----[=======|========]--------|          o

The box runs from 7 to 88 with a line at 47.5. The lower whisker extends to the minimum value (2), the upper whisker stops at 208 (the largest value that is not an outlier), and 1000 is plotted as a separate outlier point (`o`).
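To draw the plot with software, one option is matplotlib's `Axes.bxp`, which accepts precomputed statistics so the quartiles from this post are used as-is. The sketch below assumes matplotlib is installed; the helper names `box_plot_stats` and `draw_box_plot` are hypothetical, and the whisker ends are taken as the most extreme values inside the 1.5 × IQR bounds.

```python
data = [2, 5, 6, 7, 15, 15, 35, 43, 52, 65, 77, 88, 105, 199, 208, 1000]

def box_plot_stats(values, q1, med, q3):
    """Build the statistics dict that Axes.bxp expects from given quartiles."""
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in values if low <= x <= high]
    return {
        "q1": q1,
        "med": med,
        "q3": q3,
        "whislo": min(inside),  # lower whisker end
        "whishi": max(inside),  # upper whisker end
        "fliers": [x for x in values if x < low or x > high],  # outlier points
    }

def draw_box_plot(stats, path):
    """Render the box plot to an image file (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, so no display is needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bxp([stats])
    ax.set_ylabel("Value")
    fig.savefig(path)
    plt.close(fig)

stats = box_plot_stats(data, q1=7, med=47.5, q3=88)
print(stats["whislo"], stats["whishi"], stats["fliers"])  # 2 208 [1000]
```

Calling `draw_box_plot(stats, "boxplot.png")` would then write the figure to a file. Because 1000 is far from the rest of the data, the box will look compressed on a linear axis.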
