Passenger Categorization & Sampling in Python
Table of Contents
- Introduction
- Categorizing Passengers
- Extracting Samples
- Concatenating Data
- Complete Code Example
- CLI Output
- Mathematical Logic
- Key Takeaways
- Related Articles
Introduction
Data analysis often begins with structuring raw data into meaningful categories. In this guide, we explore how to categorize passengers based on age and class, extract meaningful samples, and combine them for analysis using Python.
Step 1: Categorizing Passengers
We create a new column called age_group using conditional logic.
Rules
- Age < 18 and Class = 1 → Young Rich
- Age > 50 → Senior
- Age between 18 and 50 (inclusive) → Middle-aged
- Others (under 18 outside first class, or age missing) → Other
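The rules above can also be written as an ordered list of conditions. The following is a minimal sketch that uses np.select rather than the nested np.where shown in the complete example later; the toy rows are made up for illustration, and only the column names Age and Pclass come from this article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names follow the article's code.
df = pd.DataFrame({
    "Age": [15, 60, 35, 10, np.nan],
    "Pclass": [1, 2, 3, 3, 1],
})

# Conditions are checked in order; the first one that matches wins.
conditions = [
    (df["Age"] < 18) & (df["Pclass"] == 1),
    df["Age"] > 50,
    df["Age"].between(18, 50),  # inclusive on both ends by default
]
labels = ["Young Rich", "Senior", "Middle-aged"]

# Rows matching none of the conditions (including missing ages) get "Other".
df["age_group"] = np.select(conditions, labels, default="Other")
print(df)
```

Keeping the conditions and labels in two parallel lists makes it easier to add or reorder a rule than editing nested calls, which is the main reason to consider this form.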
Why Categorization Matters
Categorization helps simplify complex datasets by grouping similar records. This improves analysis, visualization, and decision-making.
Step 2: Extract Samples
Sampling allows us to examine a smaller subset of the dataset without processing everything. Here the sample is simply the first rows of each group, not a random draw.
- Filter the dataset by each age group
- Select the first 3 entries of each group using .head(3)
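As a small illustration of the two steps above, the sketch below filters one group and takes its first 3 rows, then shows a possible shortcut with groupby that does the same for every group in one call. The rows and names are invented for the example and assume the age_group column already exists.

```python
import pandas as pd

# Hypothetical toy data with the age_group column already filled in.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F", "G"],
    "age_group": ["Senior", "Senior", "Other", "Senior",
                  "Senior", "Other", "Middle-aged"],
})

# Filter one group, then keep its first 3 rows.
senior_sample = df[df["age_group"] == "Senior"].head(3)

# The same idea for every group at once: the first 3 rows per group,
# returned in the original row order.
per_group_sample = df.groupby("age_group").head(3)

print(senior_sample)
print(per_group_sample)
```

Note that the groupby version keeps rows in their original order instead of grouping them by category, so it is not a drop-in replacement when the display order matters.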
Step 3: Concatenate Data
We combine all sampled groups into a single DataFrame using:
pd.concat([group1, group2, group3])
This allows easy comparison across categories.
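To make the behaviour of pd.concat concrete, here is a short sketch with two invented frames standing in for the sampled groups. It shows the default stacking plus two optional parameters, ignore_index and keys, which this article does not use but which can help when comparing categories.

```python
import pandas as pd

# Hypothetical stand-ins for the per-group samples.
group1 = pd.DataFrame({"Name": ["John"], "Age": [15]}, index=[4])
group2 = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [60, 71]}, index=[1, 7])

# Default: rows are stacked and each keeps its original index label.
stacked = pd.concat([group1, group2])

# ignore_index=True renumbers the rows 0..n-1.
renumbered = pd.concat([group1, group2], ignore_index=True)

# keys= adds an outer index level recording which input each row came from.
labelled = pd.concat([group1, group2], keys=["Young Rich", "Senior"])

print(stacked)
print(renumbered)
print(labelled)
```

With keys, a single category can be pulled back out with labelled.loc["Senior"], which may be convenient if the combined frame is passed on for further analysis.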
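Complete Code Example
import numpy as np
import pandas as pd

# Sample dataset: placeholder rows taken from the sample output; replace with your own data
df = pd.DataFrame(
    {
        'Name': ['John', 'Alice', 'Mark'],
        'Age': [15, 60, 35],
        'Pclass': [1, 2, 3],
    },
    index=[1, 2, 3],
)

# Categorization: conditions are evaluated in order, first match wins
df['age_group'] = np.where(
    (df['Age'] < 18) & (df['Pclass'] == 1), 'Young Rich',
    np.where(df['Age'] > 50, 'Senior',
             np.where((df['Age'] >= 18) & (df['Age'] <= 50), 'Middle-aged', 'Other'))
)

# Sampling: first 3 rows of each group
young_rich = df[df['age_group'] == 'Young Rich'].head(3)
senior = df[df['age_group'] == 'Senior'].head(3)
middle = df[df['age_group'] == 'Middle-aged'].head(3)
other = df[df['age_group'] == 'Other'].head(3)

# Concatenation: stack the samples into one DataFrame
final_sample = pd.concat([young_rich, senior, middle, other])
print(final_sample)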
CLI Output Sample
    Name  Age  Pclass    age_group
1   John   15       1   Young Rich
2  Alice   60       2       Senior
3   Mark   35       3  Middle-aged
CLI Explanation
The output displays the sampled passengers along with their attributes and assigned category. Each row is one sampled passenger; here the first row comes from Young Rich, the second from Senior, and the third from Middle-aged.
Mathematical Logic Behind Conditions
The categorization logic can be expressed mathematically:
YoungRich = (Age < 18) ∧ (Class = 1)
Senior = (Age > 50)
Middle = (18 ≤ Age ≤ 50)
Mathematical Explanation
Logical operators such as AND (∧) and inequalities define how conditions are applied. Because the three age ranges do not overlap and every remaining passenger is labeled Other, each passenger falls into exactly one category.
Mathematical Explanation of Categorization Logic
The passenger categorization can be formally described using mathematical logic and piecewise functions.
Each passenger is assigned to exactly one category based on their age (A) and class (C).
Piecewise Function Representation
f(A, C) =
{
  "Young Rich"   if (A < 18) ∧ (C = 1)
  "Senior"       if (A > 50)
  "Middle-aged"  if (18 ≤ A ≤ 50)
  "Other"        otherwise
}
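The piecewise definition translates almost line for line into a plain Python function. This is only a sketch of f for a single passenger, with the branches in the same order as the cases above; the test values are taken from the sample output in this article.

```python
def f(age, pclass):
    """Return the category for one passenger, mirroring the piecewise cases."""
    if age < 18 and pclass == 1:
        return "Young Rich"
    if age > 50:
        return "Senior"
    if 18 <= age <= 50:
        return "Middle-aged"
    # Everything else, including a missing age stored as NaN, ends up here
    # because every comparison against NaN is False.
    return "Other"


print(f(15, 1))  # Young Rich
print(f(60, 2))  # Senior
print(f(35, 3))  # Middle-aged
```

A function like this is handy for unit-testing the rules on individual values before applying the vectorized version to a whole DataFrame.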
Logical Breakdown
- A < 18 → Young passengers
- C = 1 → First-class passengers
- A > 50 → Senior passengers
- 18 ≤ A ≤ 50 → Middle-aged group
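Deep Explanation
This logic follows a hierarchical evaluation similar to nested conditional statements. The first condition has higher priority, meaning if a passenger satisfies (A < 18 AND C = 1), they are immediately classified as "Young Rich". Because the three explicit conditions cover non-overlapping age ranges, the evaluation order does not change the result here; it would matter only if the ranges overlapped.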
Mathematically, this ensures:
- Mutual Exclusivity → No passenger belongs to more than one group
- Collective Exhaustiveness → Every passenger is categorized
In Boolean algebra terms:
YoungRich = (A < 18) ∧ (C = 1)
Senior = (A > 50)
Middle = (A ≥ 18) ∧ (A ≤ 50)
Other = NOT (YoungRich ∨ Senior ∨ Middle)
This guarantees a complete partitioning of the dataset.
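One way to check the partition claim on real data is to build the four Boolean masks and confirm that every row is True in exactly one of them. The sketch below does this on invented rows that include boundary ages and a missing age; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, including boundary ages (18, 50) and a missing age.
df = pd.DataFrame({
    "Age": [15, 60, 35, 10, np.nan, 18, 50],
    "Pclass": [1, 2, 3, 3, 1, 1, 2],
})

young_rich = (df["Age"] < 18) & (df["Pclass"] == 1)
senior = df["Age"] > 50
middle = (df["Age"] >= 18) & (df["Age"] <= 50)
other = ~(young_rich | senior | middle)

# One Boolean column per category.
masks = pd.concat(
    [young_rich, senior, middle, other],
    axis=1,
    keys=["Young Rich", "Senior", "Middle-aged", "Other"],
)

# Mutually exclusive and collectively exhaustive means each row
# is True in exactly one column.
memberships = masks.sum(axis=1)
is_partition = bool((memberships == 1).all())
print(is_partition)
```

If the rules were later changed so that two ranges overlapped, this check would fail for the affected rows, which makes it a cheap safeguard when editing the conditions.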
Key Takeaways
- Use np.where for conditional categorization
- Sampling helps simplify analysis
- pd.concat merges datasets efficiently
- Structured data improves insights
Final Thoughts
This workflow demonstrates how simple transformations can make datasets far more useful. By combining categorization, sampling, and merging techniques, you gain better control over your data analysis process.