This blog explores data science and networking, combining theoretical concepts with practical implementations. Topics include routing protocols, network operations, and data-driven problem solving, presented with clarity and reproducibility in mind.
🚢 Titanic Dataset Visualization Using a Treemap
The Titanic dataset is one of the most widely used datasets in data science. In this guide, we’ll explore how to visualize it using a treemap, a powerful way to understand hierarchical relationships.
The objective is to analyze survival patterns of passengers aboard the Titanic using categorical features such as class, gender, and embarkation location.
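Traditional charts, such as a single bar or pie chart, usually show only one or two variables at a time, so they struggle to capture how class, sex, and embarkation location combine. A hierarchical visualization handles these multi-level relationships more naturally.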
🎯 Goal of Analysis
Understand the survival distribution
Compare groups across multiple variables
Identify hidden patterns
Create an intuitive visualization
💡 Core Goal: Build a visual hierarchy that reveals survival trends clearly.
🌳 Why a Treemap?
A treemap is well suited to hierarchical data because it:
Represents nested categories as rectangles within rectangles
Uses rectangle area to show magnitude
Uses color to encode an additional dimension
Each rectangle represents a group, and its size corresponds to passenger count.
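To see why rectangle size can stand in for passenger count, here is a minimal Python sketch of the simplest treemap layout, a one-level "slice" layout: the outer rectangle is cut into strips whose widths are proportional to each group's count, so each strip's area is proportional to that count. The function name `slice_layout` and the toy counts are hypothetical and chosen only for illustration; treemap libraries typically use more balanced layouts (such as squarified tiling) and recurse through every level.

```python
# Minimal one-level "slice" treemap layout (illustrative sketch).
# The outer rectangle is cut into vertical strips whose widths are
# proportional to each group's count, so each strip's area is too.

def slice_layout(counts, width, height):
    """Return {label: (x, y, w, h)} rectangles for one treemap level.

    counts: dict mapping a group label to its passenger count.
    """
    total = sum(counts.values())
    rects = {}
    x = 0.0
    for label, count in counts.items():
        w = width * count / total  # strip width proportional to the count
        rects[label] = (x, 0.0, w, height)
        x += w  # next strip starts where this one ends
    return rects


# Hypothetical toy counts, for illustration only (not real Titanic figures).
toy_counts = {"1st": 2, "2nd": 1, "3rd": 5}
layout = slice_layout(toy_counts, width=80.0, height=10.0)

for label, (x, y, w, h) in layout.items():
    print(f"{label}: x={x:.1f}, width={w:.1f}, area={w * h:.1f}")
```

Because every strip shares the same height, area = height × width and width is proportional to count, so area is proportional to count. Recursing into each strip with the next level of the hierarchy (alternating vertical and horizontal cuts) gives the classic slice-and-dice treemap.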
Hierarchical Structure
The treemap follows this hierarchy:
Class (1st, 2nd, 3rd)
Sex (Male, Female)
Embark Town (Cherbourg, Queenstown, Southampton)
Survival Status (Survived, Did Not Survive)
This structure allows drilling down from broad categories (class) into detailed insights (survival).
Each level adds context, improving interpretability.
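The drill-down idea can be sketched in Python, assuming each passenger is stored as a tuple ordered like the hierarchy above (class, sex, embark town, survived). The helper `drill_down` and the six toy records are hypothetical, not taken from the real dataset; given a prefix of already-chosen categories, it counts passengers one level deeper.

```python
from collections import Counter

# Hypothetical toy records for illustration only (not the real dataset):
# (class, sex, embark_town, survived)
RECORDS = [
    ("1st", "female", "Cherbourg", True),
    ("1st", "male", "Southampton", False),
    ("1st", "female", "Southampton", True),
    ("3rd", "male", "Southampton", False),
    ("3rd", "female", "Queenstown", True),
    ("3rd", "male", "Queenstown", False),
]


def drill_down(records, prefix):
    """Count passengers at the next hierarchy level below `prefix`.

    prefix is a tuple of values for the first levels, e.g. ("1st",) or
    ("1st", "female"); an empty tuple counts the top level (class).
    """
    depth = len(prefix)
    # Keep only records inside the chosen branch, then count the next level.
    return Counter(rec[depth] for rec in records if rec[:depth] == prefix)


print(drill_down(RECORDS, ()))                 # counts per class
print(drill_down(RECORDS, ("1st",)))           # counts per sex within 1st class
print(drill_down(RECORDS, ("1st", "female")))  # counts per embark town
```

An empty prefix returns the top-level class counts; each extra element narrows the view, mirroring a click into a nested rectangle.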
Mathematical Understanding
Survival Rate Formula
Survival Rate = (Number of Survivors / Total Passengers in Group) × 100
Group Proportion
Group Size ∝ Number of passengers in category
Color Encoding
Color Scale = f(Survival Status)
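A minimal Python sketch of the group-proportion and color-encoding ideas, assuming toy (class, survived) records that are purely illustrative. `group_shares` returns each class's fraction of all passengers, which is the fraction of the total treemap area that class would occupy; `color_for` is a hypothetical stand-in for f(Survival Status) that maps the two statuses to two contrasting colors.

```python
from collections import Counter

# Hypothetical toy records (class, survived) for illustration only.
RECORDS = [
    ("1st", True), ("1st", True), ("1st", False),
    ("2nd", True), ("2nd", False),
    ("3rd", False), ("3rd", False), ("3rd", True),
]

# Hypothetical color choice: any two contrasting colors would do.
SURVIVAL_COLORS = {True: "green", False: "red"}


def group_shares(records):
    """Fraction of all passengers in each class (its share of total area)."""
    counts = Counter(cls for cls, _ in records)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def color_for(survived):
    """Color Scale = f(Survival Status): a direct categorical lookup."""
    return SURVIVAL_COLORS[survived]


shares = group_shares(RECORDS)
print(shares)  # 1st and 3rd each hold 3/8 of the area, 2nd holds 2/8
print(color_for(True), color_for(False))
```

The shares always sum to 1, which is why area comparisons between groups are meaningful.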
Deep Explanation
Treemap area is proportional to frequency counts. Color mapping often uses normalized values between 0 and 1.
For example:
normalized_value = (value - min) / (max - min)
This ensures consistent color scaling across categories.
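The min-max normalization above translates directly into Python. This sketch assumes the input is a plain list of numbers; the behavior when all values are equal (max - min = 0, where the formula would divide by zero) is a choice made here for illustration, not something the formula specifies.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] with (value - min) / (max - min).

    If every value is equal, max - min is 0; this sketch maps all of them
    to 0.0 to avoid division by zero (an assumption, not the only choice).
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical survival rates (%) for three groups, for illustration only.
rates = [20.0, 50.0, 80.0]
print(min_max_normalize(rates))  # [0.0, 0.5, 1.0]
```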
Advanced Mathematical Explanation
To fully understand the treemap visualization, we need to break down the mathematical relationships behind survival rates, proportions, and hierarchical aggregation.
1. Survival Probability
The survival probability for any group is calculated as:
P(Survival) = Number of Survivors / Total Passengers in Group
This gives a value between 0 and 1, where:
0 → No one survived
1 → Everyone survived
2. Percentage Conversion
Survival Rate (%) = P(Survival) × 100
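Here is a minimal Python sketch of the survival probability and its percentage conversion, computed separately for each group. The toy (sex, survived) records and the helper name `survival_by_group` are hypothetical; each group's probability is its survivor count divided by its passenger count, so it always lies between 0 and 1.

```python
from collections import defaultdict

# Hypothetical toy records (sex, survived) for illustration only.
RECORDS = [
    ("female", True), ("female", True), ("female", False), ("female", True),
    ("male", False), ("male", False), ("male", True), ("male", False),
]


def survival_by_group(records):
    """Return {group: (P(Survival), Survival Rate %)} for each group."""
    survivors = defaultdict(int)
    totals = defaultdict(int)
    for group, survived in records:
        totals[group] += 1
        survivors[group] += int(survived)  # True counts as 1, False as 0
    result = {}
    for group, total in totals.items():
        p = survivors[group] / total  # probability between 0 and 1
        result[group] = (p, p * 100)  # percentage conversion
    return result


for group, (p, pct) in survival_by_group(RECORDS).items():
    print(f"{group}: P(Survival) = {p:.2f}, rate = {pct:.1f}%")
```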
3. Hierarchical Aggregation
A treemap works by aggregating counts at each level:
Total(Class) = Σ Passengers in that Class
Total(Class, Sex) = Σ Passengers in that subgroup
Each rectangle's area is proportional to:
Area ∝ Number of Passengers
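The hierarchical aggregation can be sketched in Python by counting every passenger once at each prefix of its path, so Total(Class) automatically equals the sum of its (Class, Sex) subgroups. The toy records and the helper `aggregate_levels` are hypothetical; it returns parallel `ids`, `parents`, and `values` lists, with an empty string marking a top-level node.

```python
from collections import Counter

# Hypothetical toy records (class, sex) for illustration only.
RECORDS = [
    ("1st", "female"), ("1st", "female"), ("1st", "male"),
    ("3rd", "male"), ("3rd", "male"), ("3rd", "female"),
    ("3rd", "male"),
]


def aggregate_levels(records):
    """Count passengers at every prefix of the path (class, class/sex).

    Returns parallel lists ids, parents, values, where each parent's value
    equals the sum of its children's values.
    """
    counts = Counter()
    for rec in records:
        for depth in range(1, len(rec) + 1):
            counts[rec[:depth]] += 1  # each prefix gets this passenger
    ids, parents, values = [], [], []
    for path in sorted(counts):
        ids.append("/".join(path))
        parents.append("/".join(path[:-1]))  # "" marks a top-level node
        values.append(counts[path])
    return ids, parents, values


ids, parents, values = aggregate_levels(RECORDS)
for node_id, parent, value in zip(ids, parents, values):
    print(node_id, "parent:", repr(parent), "value:", value)
```

If you render with Plotly (an assumption; this article does not name a library), lists in this shape can be passed to `plotly.graph_objects.Treemap` through its `ids`, `parents`, and `values` arguments, with labels such as the last path element. Setting `branchvalues="total"` tells Plotly that each parent's value already includes its children, which this aggregation guarantees.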
4. Conditional Probability Insight
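We can also analyze survival using conditional probability, which restricts the calculation to a single subgroup:
P(Survival | Group) = Number of Survivors in Group / Total Passengers in Group
For example, comparing P(Survival | Sex = female) with P(Survival | Sex = male), or P(Survival | Class = 1st) with P(Survival | Class = 3rd), shows which groups were more likely to survive.
In the treemap, these patterns become immediately visible through area and color differences.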
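Conditional probability restricts the calculation to one subgroup A: P(Survival | A) = P(Survival and A) / P(A). The Python sketch below computes it from hypothetical toy (class, sex, survived) records; `conditional_survival` is an illustrative helper, and `fractions.Fraction` keeps the arithmetic exact. Because both probabilities share the same denominator, the result equals the number of survivors in the subgroup divided by the subgroup's size.

```python
from fractions import Fraction

# Hypothetical toy records (class, sex, survived) for illustration only.
RECORDS = [
    ("1st", "female", True), ("1st", "male", False), ("1st", "male", True),
    ("3rd", "female", True), ("3rd", "female", False),
    ("3rd", "male", False), ("3rd", "male", False), ("1st", "female", True),
]


def conditional_survival(records, condition):
    """P(Survival | condition) = P(Survival and condition) / P(condition).

    condition is a predicate over one record, e.g. lambda r: r[1] == "female".
    """
    n = len(records)
    p_condition = Fraction(sum(1 for r in records if condition(r)), n)
    if p_condition == 0:
        raise ValueError("no passengers match the condition")
    p_joint = Fraction(sum(1 for r in records if condition(r) and r[2]), n)
    return p_joint / p_condition


female = conditional_survival(RECORDS, lambda r: r[1] == "female")
male = conditional_survival(RECORDS, lambda r: r[1] == "male")
first_class = conditional_survival(RECORDS, lambda r: r[0] == "1st")
print(female, male, first_class)
```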
🎯 Key Takeaways
Treemaps simplify complex hierarchical data
Combines size and color for dual insights
Excellent for categorical comparison
Improves decision-making clarity
Final Thoughts
Treemaps are a powerful tool for visual storytelling in data science. When applied to the Titanic dataset, they reveal survival patterns in a clear, hierarchical, and intuitive manner.
By combining structure, color, and scale, this approach transforms raw data into meaningful insights.