Tuesday, August 20, 2024

Categorizing and Sampling Titanic Dataset by Age Group

🚢 Passenger Categorization & Sampling in Python

📑 Table of Contents

Introduction
Categorizing Passengers
Extracting Samples
Concatenating Data
Complete Code Example
CLI Output
Mathematical Logic
Key Takeaways

📘 Introduction

Data analysis often begins with structuring raw data into meaningful categories. In this guide, we explore how to categorize passengers based on age and class, extract meaningful samples, and combine them for analysis using Python.

💡 Goal: Transform raw passenger data into structured insights using categorization and sampling.

🧠 Step 1: Categorizing Passengers

We create a new column called age_group using conditional logic.

🎯 Rules

Age < 18 and Class = 1 → Young Rich
Age > 50 → Senior
Age between 18–50 → Middle-aged
Everything else (including under-18s outside first class and missing ages) → Other
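
The rules above can also be written with np.select, which takes a list of conditions and a matching list of labels. This is only a minimal sketch, not the code used later in this post; the four-row DataFrame is made up for illustration, and the column names Age and Pclass are assumed to match the Titanic dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are assumed to match the Titanic dataset.
df = pd.DataFrame({
    "Age": [15, 60, 35, 10],
    "Pclass": [1, 2, 3, 3],
})

# Conditions are checked in order; the first one that is True wins.
conditions = [
    (df["Age"] < 18) & (df["Pclass"] == 1),
    df["Age"] > 50,
    (df["Age"] >= 18) & (df["Age"] <= 50),
]
labels = ["Young Rich", "Senior", "Middle-aged"]

# Rows matching none of the conditions get the default label.
df["age_group"] = np.select(conditions, labels, default="Other")
print(df)
```

The last row (age 10, third class) matches none of the three conditions, so it lands in Other.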

📖 Why Categorization Matters

Categorization helps simplify complex datasets by grouping similar records. This improves analysis, visualization, and decision-making.

📊 Step 2: Extract Samples

Sampling allows us to examine a smaller subset of the dataset without processing everything.

Filter dataset by each age group
Select first 3 entries using .head(3)

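💡 Insight: Sampling reduces complexity. Note that .head(3) simply takes the first three rows of each group in dataset order, so it is a quick preview rather than a random or representative sample.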
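
As a rough sketch of the same idea in fewer steps, groupby can take the first three rows of every group in one call, and shuffling the rows first turns that into a random draw. The toy DataFrame and its Name values are hypothetical, and age_group is assumed to be filled in already.

```python
import pandas as pd

# Hypothetical toy data with the age_group column already filled in.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F", "G"],
    "age_group": ["Young Rich", "Senior", "Middle-aged", "Middle-aged",
                  "Middle-aged", "Middle-aged", "Senior"],
})

# First 3 rows of every group in one call; rows keep their original order.
first_three = df.groupby("age_group").head(3)

# Shuffle the rows first, then take up to 3 per group: a random draw.
# random_state makes the shuffle repeatable.
shuffled = df.sample(frac=1, random_state=0)
random_three = shuffled.groupby("age_group").head(3)

print(first_three)
print(random_three)
```

Groups with fewer than three rows simply contribute all the rows they have.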

🔗 Step 3: Concatenate Data

We combine all sampled groups into a single DataFrame using:

pd.concat([group1, group2, group3])

This allows easy comparison across categories.
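
pd.concat has a few options that affect how the combined rows are indexed. The sketch below uses two small made-up DataFrames as stand-ins for the sampled groups; the names group1, group2, and the key labels are only illustrative.

```python
import pandas as pd

# Hypothetical stand-ins for two sampled groups.
group1 = pd.DataFrame({"Name": ["John"], "Age": [15]}, index=[4])
group2 = pd.DataFrame({"Name": ["Alice", "Mark"], "Age": [60, 35]}, index=[1, 7])

# Default: rows are stacked and keep their original index labels (4, 1, 7).
stacked = pd.concat([group1, group2])

# ignore_index=True renumbers the rows from 0.
renumbered = pd.concat([group1, group2], ignore_index=True)

# keys= adds an outer index level naming the source of each row.
labelled = pd.concat([group1, group2], keys=["young_rich", "senior"])

print(stacked, renumbered, labelled, sep="\n\n")
```

Keeping the original index (the default) makes it easy to trace a sampled row back to the full dataset.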

💻 Complete Code Example

import pandas as pd
import numpy as np

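# Sample dataset (illustrative rows, matching the CLI output below)
df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Mark'],
    'Age': [15, 60, 35],
    'Pclass': [1, 2, 3],
}, index=[1, 2, 3])

# Categorization: conditions are checked in order; anything unmatched
# (including missing ages) becomes 'Other'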
df['age_group'] = np.where(
    (df['Age'] < 18) & (df['Pclass'] == 1), 'Young Rich',
    np.where(df['Age'] > 50, 'Senior',
    np.where((df['Age'] >= 18) & (df['Age'] <= 50), 'Middle-aged', 'Other'))
)

# Sampling
young_rich = df[df['age_group'] == 'Young Rich'].head(3)
senior = df[df['age_group'] == 'Senior'].head(3)
middle = df[df['age_group'] == 'Middle-aged'].head(3)
other = df[df['age_group'] == 'Other'].head(3)

# Concatenation
final_sample = pd.concat([young_rich, senior, middle, other])

print(final_sample)

🖥 CLI Output Sample

   Name      Age  Pclass  age_group
1  John      15      1    Young Rich
2  Alice     60      2    Senior
3  Mark      35      3    Middle-aged

📂 CLI Explanation

The output lists the sampled passengers with their Name, Age, and Pclass, plus the new age_group column. Each row is one sampled passenger, and rows appear in the order the groups were passed to pd.concat.

📐 Mathematical Logic Behind Conditions

The categorization logic can be expressed mathematically:

YoungRich = (Age < 18) ∧ (Class = 1)
Senior = (Age > 50)
Middle = (18 ≤ Age ≤ 50)

📖 Mathematical Explanation

The AND operator (∧) and the inequalities define each condition. Because the three age ranges do not overlap and every remaining passenger goes to Other, each passenger falls into exactly one category.

🧮 Mathematical Explanation of Categorization Logic

The passenger categorization can be formally described using mathematical logic and piecewise functions. Each passenger is assigned to exactly one category based on their age (A) and class (C).

📌 Piecewise Function Representation

f(A, C) =
{
  "Young Rich"     if (A < 18) ∧ (C = 1)
  "Senior"         if (A > 50)
  "Middle-aged"    if (18 ≤ A ≤ 50)
  "Other"          otherwise
}
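
For a single passenger, the piecewise function translates almost line by line into a plain Python function. This is an illustrative sketch; the name categorize is hypothetical and does not appear in the code above.

```python
import math

def categorize(age, pclass):
    """Return the category label for one passenger, following f(A, C)."""
    if age < 18 and pclass == 1:
        return "Young Rich"
    if age > 50:
        return "Senior"
    if 18 <= age <= 50:
        return "Middle-aged"
    return "Other"

# A missing age (NaN) fails every comparison, so it falls through to "Other".
for age, pclass in [(15, 1), (60, 2), (35, 3), (10, 3), (math.nan, 1)]:
    print(age, pclass, categorize(age, pclass))
```

The final return plays the role of the "otherwise" branch.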

📊 Logical Breakdown

A < 18 → Young passengers
C = 1 → First-class passengers
A > 50 → Senior passengers
18 ≤ A ≤ 50 → Middle-aged group

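📖 Deep Explanation

This logic follows a hierarchical evaluation similar to nested conditional statements: conditions are checked top to bottom and the first match wins. A passenger who satisfies (A < 18 AND C = 1) is classified as "Young Rich" regardless of the later conditions. Because the three conditions cover non-overlapping age ranges, the order does not change the result here, but it would matter if the ranges overlapped.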

Mathematically, this ensures:

Mutual Exclusivity → No passenger belongs to more than one group
Collective Exhaustiveness → Every passenger is categorized

In Boolean algebra terms:

YoungRich = (A < 18) ∧ (C = 1)
Senior = (A > 50)
Middle = (A ≥ 18) ∧ (A ≤ 50)
Other = NOT (YoungRich ∨ Senior ∨ Middle)

This guarantees a complete partitioning of the dataset.
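
The partition claim can be checked directly: build one boolean mask per category and confirm that every row is True in exactly one of them. The toy data below is made up for illustration and includes the boundary ages 18 and 50 as well as a missing age.

```python
import pandas as pd

# Hypothetical toy data, including boundary ages and one missing age.
df = pd.DataFrame({
    "Age": [15, 60, 35, 10, 18, 50, float("nan")],
    "Pclass": [1, 2, 3, 3, 1, 2, 1],
})

# One boolean mask per category, mirroring the Boolean expressions above.
young_rich = (df["Age"] < 18) & (df["Pclass"] == 1)
senior = df["Age"] > 50
middle = (df["Age"] >= 18) & (df["Age"] <= 50)
other = ~(young_rich | senior | middle)

# Count how many categories each row belongs to.
masks = pd.concat(
    [young_rich, senior, middle, other],
    axis=1,
    keys=["Young Rich", "Senior", "Middle-aged", "Other"],
)
membership_count = masks.sum(axis=1)

# Exactly one category per row means the groups are mutually exclusive
# and collectively exhaustive.
print((membership_count == 1).all())
```

Comparisons against a missing age are all False, which is why that row ends up in Other rather than in no group at all.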

💡 Key Insight: This is a classic example of a piecewise function combined with Boolean logic used in data classification.

🎯 Key Takeaways

Use nested np.where calls for conditional categorization
.head(3) on each filtered group gives a small, quick-to-read sample
pd.concat stacks the sampled DataFrames into one
A single category column makes groups easy to filter and compare

📌 Final Thoughts

This workflow demonstrates how simple transformations can make datasets far more useful. By combining categorization, sampling, and concatenation, you gain better control over your data analysis process.
