Titanic Survival Analysis using Treemap Visualization
Table of Contents
- Problem Statement
- Goal of Analysis
- Why Treemap?
- Hierarchy Breakdown
- Mathematical Understanding
- Code Example
- CLI Output
- Insights
- Key Takeaways
- Related Articles
Problem Statement
The objective is to analyze survival patterns of passengers aboard the Titanic using categorical features such as class, gender, and embarkation location.
Bar and pie charts typically show one or two categorical variables at a time, so they struggle to convey relationships nested across several levels. A hierarchical visualization is a better fit.
Goal of Analysis
- Understand survival distribution
- Compare groups across multiple variables
- Identify hidden patterns
- Create intuitive visualization
Why Treemap?
A treemap is ideal for hierarchical data visualization because:
- Represents nested categories
- Uses area size for magnitude
- Uses color for additional dimension
Each rectangle represents a group, and its size corresponds to passenger count.
Hierarchical Structure
The treemap follows this hierarchy:
- Class (1st, 2nd, 3rd)
- Sex (Male, Female)
- Embark Town
- Survival Status
Explanation
This structure allows drilling down from broad categories (class) into detailed insights (survival). Each level adds context, improving interpretability.
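As a minimal sketch of this drill-down, the snippet below counts passengers at each depth of the hierarchy. The records are hypothetical toy rows, not real Titanic data, and `count_by_prefix` is an illustrative helper name rather than anything from Plotly.

```python
from collections import defaultdict

# Hypothetical toy records, NOT real Titanic rows:
# (class, sex, embark_town, survived)
records = [
    ("First", "female", "Cherbourg", 1),
    ("First", "female", "Southampton", 1),
    ("First", "male", "Southampton", 0),
    ("Third", "male", "Southampton", 0),
    ("Third", "male", "Queenstown", 0),
    ("Third", "female", "Southampton", 1),
]

def count_by_prefix(rows, depth):
    """Count rows that share the same first `depth` levels of the path."""
    counts = defaultdict(int)
    for row in rows:
        counts[row[:depth]] += 1
    return dict(counts)

# Depth 1 groups by class only; depth 4 reaches survival status.
for depth in range(1, 5):
    print(depth, count_by_prefix(records, depth))
```

Each deeper level splits the rectangles of the level above it, so the counts at any depth always add up to the total number of passengers.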
Mathematical Understanding
Survival Rate Formula
Survival Rate = (Number of Survivors / Total Passengers) × 100
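The formula can be sketched in a few lines of Python. The group below is a made-up list of 0/1 outcomes, and `survival_rate` is a hypothetical helper name used only for illustration.

```python
def survival_rate(outcomes):
    """Return the survival rate in percent for a list of 0/1 outcomes."""
    if not outcomes:
        raise ValueError("group is empty, so the rate is undefined")
    return sum(outcomes) / len(outcomes) * 100

# Hypothetical group: 3 survivors out of 8 passengers
group = [1, 0, 0, 1, 0, 0, 1, 0]
rate = survival_rate(group)
print(f"{rate:.1f}%")  # 37.5%
```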
Group Proportion
Group Size ∝ Number of passengers in category
Color Encoding
Color Scale = f(Survival Status)
Deep Explanation
Treemap area is proportional to frequency counts. Color mapping often uses normalized values between 0 and 1. For example:
normalized_value = (value - min) / (max - min)
This ensures consistent color scaling across categories.
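A small sketch of this min-max normalization, applied to hypothetical survival rates. The function name and the choice to return zeros when all values are equal are assumptions made for the example.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: the formula would divide by zero,
        # so this sketch maps everything to 0.0 (an arbitrary choice).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical survival rates (percent) for three groups
rates = [20.0, 50.0, 80.0]
normalized = min_max_normalize(rates)
print(normalized)  # [0.0, 0.5, 1.0]
```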
Advanced Mathematical Explanation
To fully understand the treemap visualization, we need to break down the mathematical relationships behind survival rates, proportions, and hierarchical aggregation.
1. Survival Probability
The survival probability for any group is calculated as:
P(Survival) = Number of Survivors / Total Passengers in Group
This gives a value between 0 and 1, where:
- 0 → No one survived
- 1 → Everyone survived
2. Percentage Conversion
Survival Rate (%) = P(Survival) × 100
3. Hierarchical Aggregation
A treemap aggregates counts at each level:
Total(Class) = Σ Passengers in that Class
Total(Class, Sex) = Σ Passengers in that subgroup
Each rectangle size is proportional to:
Area ∝ Number of Passengers
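The aggregation can be sketched with the standard library. The passenger list is hypothetical, and the variable names are chosen for this example only. The check in the middle confirms that subgroup totals add up to their parent class total, which is what lets child rectangles tile their parent exactly.

```python
from collections import Counter

# Hypothetical (class, sex) pairs, one per passenger; not real data
passengers = [
    ("First", "female"), ("First", "male"),
    ("Third", "female"), ("Third", "male"),
    ("Third", "male"), ("Third", "male"),
]

total_by_class = Counter(cls for cls, _ in passengers)
total_by_class_sex = Counter(passengers)

# Each class total equals the sum of its (class, sex) subgroup totals.
for cls, total in total_by_class.items():
    children = sum(n for (c, _), n in total_by_class_sex.items() if c == cls)
    assert children == total

# Area is proportional to count, so each class's share of the
# whole treemap is its count divided by the grand total.
area_share = {cls: n / len(passengers) for cls, n in total_by_class.items()}
print(area_share)
```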
4. Conditional Probability Insight
We can also analyze survival using conditional probability:
P(Survival | Female, 1st Class) = Survivors(Female, 1st Class) / Total(Female, 1st Class)
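As a sketch, conditioning amounts to filtering the records down to the group before dividing. The rows below are invented for illustration, and `p_survival_given` is a hypothetical helper.

```python
# Hypothetical records: (sex, class, survived); not real Titanic data
records = [
    ("female", "First", 1), ("female", "First", 1), ("female", "First", 0),
    ("male", "First", 0), ("female", "Third", 1), ("male", "Third", 0),
]

def p_survival_given(rows, sex, pclass):
    """Estimate P(Survival | sex, class); return None for an empty group."""
    group = [survived for s, c, survived in rows if s == sex and c == pclass]
    if not group:
        return None
    return sum(group) / len(group)

p = p_survival_given(records, "female", "First")
print(round(p, 3))  # 2 survivors out of 3 in this toy group
```

Returning `None` for an empty group avoids a division by zero when a combination of sex and class has no passengers.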
5. Color Normalization (for Treemap)
Color intensity is calculated using normalization:
Normalized Value = (x - min) / (max - min)
This ensures consistent color mapping across all groups.
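To illustrate how a normalized value can become a color, here is a simplified sketch that linearly interpolates between two RGB endpoints. It is not how Plotly implements its scales: real scales such as RdBu pass through several intermediate stops, and the endpoint colors and the name `lerp_color` are assumptions made for this example.

```python
def lerp_color(t, start=(178, 24, 43), end=(33, 102, 172)):
    """Map t in [0, 1] to a hex color between `start` (red) and `end` (blue)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be a normalized value between 0 and 1")
    # Interpolate each RGB channel separately, then round to an integer.
    rgb = [round(a + (b - a) * t) for a, b in zip(start, end)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)

print(lerp_color(0.0))  # red end of the scale
print(lerp_color(1.0))  # blue end of the scale
```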
Why This Matters
These calculations ensure that:
- Rectangle sizes accurately represent population
- Colors reflect survival likelihood
- Comparisons across groups rest on the same scale
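Code Example (Python - Plotly)
import plotly.express as px
import seaborn as sns

# plotly.express ships no Titanic dataset; seaborn's copy has the columns used below.
# Rows with a missing embark_town are dropped because px.treemap rejects nulls
# in non-leaf path levels, and the categorical 'class' column is cast to str.
df = (
    sns.load_dataset('titanic')
    .dropna(subset=['embark_town'])
    .astype({'class': str})
)
fig = px.treemap(
    df,
    path=['class', 'sex', 'embark_town', 'survived'],  # hierarchy, outermost first
    color='survived',  # 0 = died, 1 = survived
    color_continuous_scale='RdBu'
)
fig.show()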
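When `path` is combined with a numeric `color` column, Plotly Express documents that a parent sector's color is the average of its children's color values, weighted by their sizes. Because `survived` is 0 or 1, that weighted average is the group's survival probability. The sketch below reproduces the arithmetic with hypothetical counts; `weighted_color` is an illustrative name, not a Plotly function.

```python
# Hypothetical leaf nodes under one parent: (survived value, passenger count)
leaves = [(0, 30), (1, 10)]

def weighted_color(children):
    """Average the children's color values, weighted by their counts."""
    total = sum(count for _, count in children)
    return sum(value * count for value, count in children) / total

parent_color = weighted_color(leaves)
print(parent_color)  # 0.25, i.e. 10 survivors out of 40 passengers
```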
CLI Output Sample
Loading dataset...
Processing hierarchy...
Generating treemap...
✔ Class grouped
✔ Gender segmented
✔ Embarkation mapped
Treemap rendered successfully!
CLI Explanation
This output is illustrative: the Plotly code itself prints nothing to the terminal. It simulates a pipeline where:
- Data is loaded
- Categories are grouped
- Visualization is generated
Key Insights from Treemap
- First-class passengers had higher survival rates than the other classes
- Women survived at a higher rate than men
- Third-class men had the lowest survival rate
- Embarkation town had only a slight association with outcomes
These patterns become immediately visible through area and color differences.
Key Takeaways
- Treemap simplifies complex hierarchical data
- Combines size and color for dual insights
- Excellent for categorical comparison
- Improves decision-making clarity
Final Thoughts
Treemaps are a powerful tool for visual storytelling in data science. When applied to the Titanic dataset, they reveal survival patterns in a clear, hierarchical, and intuitive manner.
By combining structure, color, and scale, this approach transforms raw data into meaningful insights.