Reshaping & Transforming DataFrames in Pandas
A practical guide to stack, reshape, pivot, melt, and reorient data
When working with tabular data, you often need to reshape a DataFrame before you can analyze it. Pandas provides several tools for reorganizing data, and the right one depends on the layout your analysis needs.
This guide walks through several essential techniques using a simple two-column DataFrame.
Sample DataFrame
import pandas as pd
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [4, 5, 6]
})
print(df)
1️⃣ Reshape into a Series and Back to a DataFrame
Objective & Explanation
The goal is to flatten the DataFrame into a single column.
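- stack() moves the column labels into the row index, producing a Series with one value per (row, column) pair
- reset_index(drop=True) discards that hierarchical index and replaces it with a plain 0, 1, 2, ... index
- to_frame() converts the resulting Series back into a single-column DataFrame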
series = df.stack().reset_index(drop=True)
single_col_df = series.to_frame(name="values")
print(single_col_df)
Benefit
This approach is useful when you need a single list of all values for aggregation, iteration, or visualization.
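One detail worth checking is the order in which stack() emits values. The short sketch below inspects the intermediate Series before the index is dropped; the variable names (stacked, flat_values, column_major) are illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# stack() returns a Series indexed by (row label, column name) pairs.
stacked = df.stack()
print(stacked.index.tolist())

# Values come out row by row, not column by column.
flat_values = stacked.tolist()
print(flat_values)  # [1, 4, 2, 5, 3, 6]

# For column-by-column order, stack the transpose instead.
column_major = df.T.stack().tolist()
print(column_major)  # [1, 2, 3, 4, 5, 6]
```

If the order of the flattened values matters for your aggregation or plot, pick the variant that matches it.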
2️⃣ Reshape into a 2×3 DataFrame
Objective & Explanation
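This method rearranges the six values into a grid with 2 rows and 3 columns. The target shape must hold exactly as many elements as the original DataFrame.
- Flatten values using values.flatten()
- Reshape using NumPy-style reshaping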
reshaped = pd.DataFrame(df.values.flatten().reshape(2, 3))
print(reshaped)
Benefit
Useful for reformatting data for specific matrix-based operations or visualization layouts.
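Note that flattening reads the values row by row, so the 2×3 result interleaves columns A and B rather than placing each original column on its own row. The sketch below contrasts the reshape with a plain transpose; it uses to_numpy() in place of .values, and the names row_major and transposed are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# flatten() reads the values row by row: 1, 4, 2, 5, 3, 6
row_major = pd.DataFrame(df.to_numpy().flatten().reshape(2, 3))
print(row_major.to_numpy().tolist())  # [[1, 4, 2], [5, 3, 6]]

# Transposing keeps each original column together as one row.
transposed = df.T
print(transposed.to_numpy().tolist())  # [[1, 2, 3], [4, 5, 6]]
```

Both results have shape 2×3; which one you want depends on whether the original column grouping should survive.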
3️⃣ Pivoting a DataFrame
Objective & Explanation
Pivoting reorganizes the DataFrame so that:
- One column becomes the index
- Another column becomes the columns
- A third column supplies the values that fill the table
df_pivot = df.copy()
df_pivot["index"] = ["row1", "row2", "row3"]
pivoted = df_pivot.pivot(index="index", columns="A", values="B")
print(pivoted)
Benefit
Pivoting is ideal for reorienting data to highlight relationships between categories or dimensions.
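It helps to look at the shape of the pivoted result. Each distinct value in column A becomes its own column, so with three unique values the table is 3×3 and most cells are empty. The sketch below rebuilds the same input in one step and counts the gaps; the name missing is illustrative.

```python
import pandas as pd

df_pivot = pd.DataFrame({
    "index": ["row1", "row2", "row3"],
    "A": [1, 2, 3],
    "B": [4, 5, 6],
})

pivoted = df_pivot.pivot(index="index", columns="A", values="B")

# One row per "index" label, one column per unique value of A.
print(pivoted.shape)  # (3, 3)

# Each row holds a single B value; the other cells are NaN,
# which also turns the values into floats.
print(pivoted.loc["row1", 1])  # 4.0
missing = int(pivoted.isna().sum().sum())
print(missing)  # 6
```

Pivoting pays off most when the column you spread out has repeated categories, so that the resulting table is densely filled.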
4️⃣ Melt and Pivot
Objective & Explanation
This two-step transformation provides maximum flexibility:
- Melt: Convert wide data into long format
- Pivot: Rebuild the DataFrame in a new structure
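# ignore_index=False keeps the original row labels so pivot can realign the values
melted = df.melt(var_name="variable", value_name="value", ignore_index=False)
print(melted)
pivot_back = melted.pivot(columns="variable", values="value")
print(pivot_back)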
Benefit
This pattern is especially useful for complex datasets, time series analysis, and tidy-data workflows.
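One thing to watch in the pivot step is which index it aligns on. By default melt() discards the original row labels and numbers the long table 0 to 5, so a pivot without an explicit index yields six rows padded with NaN. The sketch below compares that with ignore_index=False, assuming pandas 1.1 or later where that parameter is available; the names staggered, melted_keep, and restored are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Default melt: the long table gets a fresh 0..5 index.
melted = df.melt(var_name="variable", value_name="value")
staggered = melted.pivot(columns="variable", values="value")
print(staggered.shape)  # (6, 2)

# ignore_index=False keeps the row labels 0, 1, 2 (repeated per column),
# so pivot can line the values back up.
melted_keep = df.melt(var_name="variable", value_name="value", ignore_index=False)
restored = melted_keep.pivot(columns="variable", values="value")
print(restored.to_numpy().tolist())  # [[1, 4], [2, 5], [3, 6]]
```

An alternative with the same effect is to call reset_index() before melting and pass the resulting column as id_vars, then use it as the index argument of pivot().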
💡 Key Takeaways
- Stack simplifies data into a single column
- Reshape reorganizes values into a new matrix
- Pivot reorients data for clarity
- Melt + Pivot enables flexible transformations
- Choosing the right method depends on your analytical goal