Yet Another Data Science Blog: DataFrame operations

Tuesday, August 20, 2024

How Axis Works in NumPy and pandas: A Clear Guide

In both NumPy and pandas, the term "axis" refers to a dimension along which an operation is performed. The numbering is shared between the two libraries, but each one describes it in terms of its own data structures.

### NumPy

In NumPy, arrays are multidimensional (e.g., 1D, 2D, 3D), and the `axis` parameter refers to the dimensions of these arrays. Here’s how it works:

- **1D Array**: The axis is always `0`, which refers to the only dimension available.

- **2D Array**: Axis `0` is the first dimension and runs down the rows; axis `1` is the second dimension and runs across the columns.

- **3D Array**: Axis `0` refers to layers, axis `1` refers to rows, and axis `2` refers to columns.
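To make the 3D case concrete, here is a minimal sketch showing that reducing along an axis removes that axis from the result's shape. The array name `cube` and its shape `(2, 3, 4)` are illustrative choices, not part of the original post.

```python
import numpy as np

# Hypothetical 3D array: 2 layers, 3 rows, 4 columns
cube = np.arange(24).reshape(2, 3, 4)

# Reducing along an axis removes that axis from the shape
sum_layers = cube.sum(axis=0)  # layers collapsed -> shape (3, 4)
sum_rows = cube.sum(axis=1)    # rows collapsed   -> shape (2, 4)
sum_cols = cube.sum(axis=2)    # columns collapsed -> shape (2, 3)

print(cube.shape, sum_layers.shape, sum_rows.shape, sum_cols.shape)
# (2, 3, 4) (3, 4) (2, 4) (2, 3)
```

A quick way to read any `axis` argument is to look at the shape tuple: the entry at that position is the one that disappears.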

When applying functions like `np.sum()` or `np.mean()`, the `axis` argument names the dimension the operation runs along, and that dimension is collapsed in the result. For example:

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6]])

# Sum along axis 0: collapse the rows, giving one sum per column
result_axis0 = np.sum(array, axis=0)  # Output: [5 7 9]

# Sum along axis 1: collapse the columns, giving one sum per row
result_axis1 = np.sum(array, axis=1)  # Output: [ 6 15]
```

### pandas

In pandas, which operates primarily with DataFrames and Series, the `axis` parameter follows the same numbering but is usually described in terms of rows and columns:

- **DataFrame**: Axis `0` runs down the rows (vertically), and axis `1` runs across the columns (horizontally). With functions like `df.sum()` or `df.mean()`, `axis=0` aggregates down the rows and returns one value per column, while `axis=1` aggregates across the columns and returns one value per row.

- **Series**: The Series object is essentially a 1D array with an index, so axis is always `0`, and operations are applied along this single dimension.
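As a small sketch of the two cases above: pandas also accepts the string aliases `'index'` (for `axis=0`) and `'columns'` (for `axis=1`), which the original post does not mention but which can make the direction easier to read. The variable names here are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# 'index' is an alias for axis=0: one result per column
per_column = df.sum(axis='index')

# 'columns' is an alias for axis=1: one result per row
per_row = df.sum(axis='columns')

# A Series has a single axis, so no axis argument is needed
s = pd.Series([1, 2, 3])
total = s.sum()

print(per_column.equals(df.sum(axis=0)))  # True
print(per_row.equals(df.sum(axis=1)))     # True
print(total)                              # 6
```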

Example with DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# Sum along axis 0: collapse the rows, giving one sum per column
result_axis0 = df.sum(axis=0)  # Output: A 6, B 15

# Sum along axis 1: collapse the columns, giving one sum per row
result_axis1 = df.sum(axis=1)  # Output: 0 5, 1 7, 2 9
```
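One way to check how the two libraries relate is to run the same reduction on a DataFrame and on its underlying NumPy array. The sketch below assumes an all-numeric DataFrame, so that `to_numpy()` returns a plain 2D array. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
values = df.to_numpy()  # 2D array of shape (3, 2): 3 rows, 2 columns

# The same axis number gives the same numbers in both libraries
col_sums_match = np.array_equal(df.sum(axis=0).to_numpy(), values.sum(axis=0))
row_sums_match = np.array_equal(df.sum(axis=1).to_numpy(), values.sum(axis=1))

print(col_sums_match, row_sums_match)  # True True
```

The difference is in the result type: pandas returns a labeled Series (indexed by column name or row label), while NumPy returns a bare array.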

In summary, `axis` names the dimension an operation runs along in both libraries. NumPy applies it directly to the dimensions of an array, however many there are, while pandas applies it to the rows (`axis=0`) and columns (`axis=1`) of a DataFrame and to the single axis of a Series.
