Tuesday, December 31, 2024

Treemap of Titanic Dataset: Survival Analysis by Class, Sex, Embarkation Town


Titanic Survival Analysis Treemap Guide

🚢 Titanic Survival Analysis using Treemap Visualization


📌 Problem Statement

The objective is to analyze survival patterns of passengers aboard the Titanic using categorical features such as class, gender, and embarkation location.

Bar and pie charts handle one or two categorical variables well, but they become hard to read once three or more variables are nested inside each other. A hierarchical visualization is a better fit for this kind of data.


🎯 Goal of Analysis

  • Understand survival distribution
  • Compare groups across multiple variables
  • Identify hidden patterns
  • Create intuitive visualization
💡 Core Goal: Build a visual hierarchy that reveals survival trends clearly.

🌳 Why Treemap?

A treemap is ideal for hierarchical data visualization because:

  • Represents nested categories
  • Uses area size for magnitude
  • Uses color for additional dimension

Each rectangle represents a group, and its size corresponds to passenger count.


🧠 Hierarchical Structure

The treemap follows this hierarchy:

  1. Class (1st, 2nd, 3rd)
  2. Sex (Male, Female)
  3. Embark Town
  4. Survival Status
📂 Explanation

This structure allows drilling down from broad categories (class) into detailed insights (survival). Each level adds context, improving interpretability.
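The drill-down described above can be sketched in plain Python. This is only an illustration: the six records in `ROWS` are made-up toy data, not rows from the real dataset, and `counts_by_level` is a hypothetical helper name. It counts passengers under every prefix of the path class → sex → embark town → survival status.

```python
from collections import Counter

# Hypothetical toy records: (class, sex, embark_town, survived)
ROWS = [
    ("First", "female", "Southampton", 1),
    ("First", "female", "Cherbourg", 1),
    ("First", "male", "Southampton", 0),
    ("Third", "male", "Southampton", 0),
    ("Third", "male", "Queenstown", 0),
    ("Third", "female", "Southampton", 1),
]


def counts_by_level(rows):
    """Count passengers under every prefix of the hierarchy path."""
    counts = Counter()
    for row in rows:
        # Prefixes of length 1..4: (class,), (class, sex), and so on.
        for depth in range(1, len(row) + 1):
            counts[row[:depth]] += 1
    return counts


counts = counts_by_level(ROWS)
print(counts[("First",)])            # 3 passengers in first class
print(counts[("First", "female")])   # 2 of them are female
print(counts[("Third", "male")])     # 2 third-class males
```

Each key is one rectangle in the treemap, and the counts of the children under a key always add up to the count of that key.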


📐 Mathematical Understanding

Survival Rate Formula

Survival Rate = (Number of Survivors / Total Passengers) × 100

Group Proportion

Group Size ∝ Number of passengers in category

Color Encoding

Color Scale = f(Survival Status)
📖 Deep Explanation

Treemap area is proportional to frequency counts. Color mapping often uses normalized values between 0 and 1. For example:

normalized_value = (value - min) / (max - min)

This ensures consistent color scaling across categories.
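A minimal sketch of that normalization formula, assuming a plain list of numbers as input. The helper name `min_max_normalize` and the sample rates are hypothetical, and returning 0.0 when all values are equal is just one possible convention.

```python
def min_max_normalize(values):
    """Rescale values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so the formula would divide by zero.
        # Returning 0.0 everywhere is an arbitrary choice made for this sketch.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


rates = [0.2, 0.5, 0.8]  # hypothetical survival rates for three groups
normalized = min_max_normalize(rates)
print(normalized[0], normalized[-1])  # 0.0 1.0
```

When the color column is the raw 0/1 survival flag, the minimum is 0 and the maximum is 1, so the normalized value equals the original value.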


📐 Advanced Mathematical Explanation

To fully understand the treemap visualization, we need to break down the mathematical relationships behind survival rates, proportions, and hierarchical aggregation.

1. Survival Probability

The survival probability for any group is calculated as:

P(Survival) = Number of Survivors / Total Passengers in Group

This gives a value between 0 and 1, where:

  • 0 → No one survived
  • 1 → Everyone survived

2. Percentage Conversion

Survival Rate (%) = P(Survival) × 100
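The two formulas above translate almost directly into code. The sketch below uses hypothetical function names and a made-up group of 4 passengers with 3 survivors. Returning `None` for an empty group is an assumption of this example, since the ratio is undefined when the group has no passengers.

```python
def survival_probability(survivors, total):
    """Return survivors / total, or None for an empty group."""
    if not 0 <= survivors <= total:
        raise ValueError("survivors must be between 0 and total")
    if total == 0:
        return None  # the ratio is undefined for an empty group
    return survivors / total


def as_percent(probability):
    """Convert a probability in [0, 1] to a percentage."""
    return probability * 100


p = survival_probability(3, 4)  # hypothetical group: 3 survivors out of 4
print(p)              # 0.75
print(as_percent(p))  # 75.0
```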

3. Hierarchical Aggregation

A treemap aggregates counts at each level of the hierarchy:

Total(Class) = Σ Passengers in that Class
Total(Class, Sex) = Σ Passengers in that subgroup

Each rectangle size is proportional to:

Area ∝ Number of Passengers
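To make the area rule concrete, here is a small sketch that splits one rectangle into vertical strips whose areas follow the passenger counts. It is a simple "slice" layout written for illustration, not the algorithm Plotly uses (Plotly's default packing is squarified). The function name and the counts are hypothetical.

```python
def slice_layout(counts, x, y, width, height):
    """Split a rectangle into vertical strips with areas proportional to counts.

    Returns a dict mapping each label to (x, y, width, height).
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    rects = {}
    offset = x
    for label, n in counts.items():
        strip_width = width * n / total  # share of the parent width
        rects[label] = (offset, y, strip_width, height)
        offset += strip_width
    return rects


class_counts = {"First": 2, "Second": 1, "Third": 5}  # hypothetical counts
rects = slice_layout(class_counts, 0.0, 0.0, 8.0, 4.0)
for label, (rx, ry, rw, rh) in rects.items():
    print(label, rw * rh)  # areas 8.0, 4.0, 20.0 match the 2 : 1 : 5 counts
```

Applying the same function again inside each strip, with the counts of the next level, gives the nested rectangles of a full treemap.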

4. Conditional Probability Insight

We can also analyze survival using conditional probability:

P(Survival | Female, 1st Class) =
Survivors(Female, 1st Class) / Total(Female, 1st Class)
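Conditioning simply means restricting the data to the group of interest before computing the ratio. A rough sketch, using made-up toy records and a hypothetical helper `conditional_survival`:

```python
# Hypothetical toy records, not rows from the real dataset.
ROWS = [
    {"class": "First", "sex": "female", "survived": 1},
    {"class": "First", "sex": "female", "survived": 1},
    {"class": "First", "sex": "female", "survived": 0},
    {"class": "First", "sex": "male", "survived": 0},
    {"class": "Third", "sex": "female", "survived": 1},
    {"class": "Third", "sex": "male", "survived": 0},
]


def conditional_survival(rows, conditions):
    """Estimate P(Survival | conditions) from a list of records.

    `conditions` maps column names to required values. Returns None
    when no record matches, because the ratio is then undefined.
    """
    group = [
        row for row in rows
        if all(row[key] == value for key, value in conditions.items())
    ]
    if not group:
        return None
    return sum(row["survived"] for row in group) / len(group)


p_female_first = conditional_survival(ROWS, {"sex": "female", "class": "First"})
p_overall = conditional_survival(ROWS, {})  # no condition: all passengers
print(p_female_first)  # 2 survivors out of 3 matching records
print(p_overall)       # 0.5
```

Comparing the conditional value with the overall one shows how much a given group deviates from the average.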

5. Color Normalization (for Treemap)

Color intensity is calculated using normalization:

Normalized Value = (x - min) / (max - min)

This ensures consistent color mapping across all groups.

📖 Why This Matters

These calculations ensure that:

  • Rectangle sizes accurately represent population
  • Colors reflect survival likelihood
  • Comparisons between groups use a common scale
💡 Key Insight: A treemap is not just visual; it is mathematically grounded in probability and aggregation.

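💻 Code Example (Python - Plotly)

import plotly.express as px
import seaborn as sns

# plotly.express does not bundle a Titanic dataset. seaborn's copy (downloaded
# on first use) has the 'class', 'sex', 'embark_town', and 'survived' columns.
df = sns.load_dataset('titanic')

# Use plain strings for the categorical 'class' column.
df['class'] = df['class'].astype(str)
# Drop rows with a missing embarkation town; a gap in the middle of the
# path would make px.treemap raise an error.
df = df.dropna(subset=['embark_town'])

fig = px.treemap(
    df,
    path=['class', 'sex', 'embark_town', 'survived'],  # outermost to innermost
    color='survived',  # parent tiles show the mean of the 0/1 flag, i.e. the survival rate
    color_continuous_scale='RdBu'
)

fig.show()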

🖥 CLI Output Sample

Loading dataset...
Processing hierarchy...
Generating treemap...

✔ Class grouped
✔ Gender segmented
✔ Embarkation mapped

Treemap rendered successfully!
📂 CLI Explanation

This sample output is illustrative; the Plotly snippet itself prints nothing to the console. It simulates a pipeline where:

  • Data is loaded
  • Categories are grouped
  • Visualization is generated

🔍 Key Insights from Treemap

  • First-class passengers had higher survival rates than second- and third-class passengers
  • Women survived at a higher rate than men
  • Third-class men had the lowest survival rate
  • Embarkation town had only a slight influence on outcomes

These patterns become immediately visible through area and color differences.


🎯 Key Takeaways

  • Treemap simplifies complex hierarchical data
  • Combines size and color for dual insights
  • Excellent for categorical comparison
  • Improves decision-making clarity

📌 Final Thoughts

Treemaps are a powerful tool for visual storytelling in data science. When applied to the Titanic dataset, they reveal survival patterns in a clear, hierarchical, and intuitive manner.

By combining structure, color, and scale, this approach transforms raw data into meaningful insights.
