
Tuesday, August 20, 2024

How Axis Works in NumPy and pandas: A Clear Guide

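In both NumPy and pandas, an "axis" is a dimension along which an operation is performed. However, it looks different in each library because their data structures differ: NumPy arrays are positional and can have any number of dimensions, while pandas objects are labeled and have at most two.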

### NumPy

In NumPy, arrays can have any number of dimensions (1D, 2D, 3D, and so on), and the `axis` parameter selects one of them by position, counting from `0`. Here’s how it works:

- **1D Array**: The axis is always `0`, which refers to the only dimension available.
- **2D Array**: Axis `0` runs down the rows (the first dimension), and axis `1` runs across the columns (the second dimension).
- **3D Array**: Axis `0` refers to layers, axis `1` refers to rows, and axis `2` refers to columns.
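
As a rough illustration of the list above, the sketch below builds a 1D, 2D, and 3D array and records how the shape changes when each axis of the 3D array is summed away. The array contents and variable names are arbitrary choices for this example.

```python
import numpy as np

# Illustrative arrays; the values themselves are arbitrary.
a1 = np.arange(3)                     # shape (3,)
a2 = np.arange(6).reshape(2, 3)       # shape (2, 3): 2 rows, 3 columns
a3 = np.arange(24).reshape(2, 3, 4)   # shape (2, 3, 4): 2 layers, 3 rows, 4 columns

# The number of axes equals the number of dimensions.
print(a1.ndim, a2.ndim, a3.ndim)      # 1 2 3

# Reducing along an axis removes that entry from the shape.
shape_after = {axis: a3.sum(axis=axis).shape for axis in range(a3.ndim)}
print(shape_after)                    # {0: (3, 4), 1: (2, 4), 2: (2, 3)}
```

Reading the shape tuple from left to right gives the axis numbers, which is usually the quickest way to work out which axis is which.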

When applying functions like `np.sum()` or `np.mean()`, the `axis` you pass is the dimension the operation runs along. For these reductions, that dimension is collapsed and no longer appears in the result's shape. For example:


```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6]])
# Sum along axis 0: collapse the rows, giving one total per column
result_axis0 = np.sum(array, axis=0)  # Output: [5 7 9]
# Sum along axis 1: collapse the columns, giving one total per row
result_axis1 = np.sum(array, axis=1)  # Output: [ 6 15]
```


### pandas

In pandas, which works mainly with DataFrames and Series, the `axis` parameter is tied to the labeled structure of these objects:

- **DataFrame**: Axis `0` (also written `'index'`) runs down the rows, and axis `1` (also written `'columns'`) runs across the columns. With reductions like `df.sum()` or `df.mean()`, `axis=0` (the default) collapses the rows and returns one value per column, while `axis=1` collapses the columns and returns one value per row.

- **Series**: The Series object is essentially a 1D array with an index, so axis is always `0`, and operations are applied along this single dimension.
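
One way to check the two bullets above is to look at the labels on each result: the labels that survive belong to the axis that was not collapsed. This is a small sketch with a made-up DataFrame; the column names `x` and `y` and the row labels are placeholders.

```python
import pandas as pd

# Hypothetical data: two columns, three labeled rows.
frame = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]}, index=["r0", "r1", "r2"])

# axis=0 collapses the rows, so the result is indexed by column label.
per_column = frame.mean(axis=0)
print(list(per_column.index))   # ['x', 'y']

# axis=1 collapses the columns, so the result is indexed by row label.
per_row = frame.mean(axis=1)
print(list(per_row.index))      # ['r0', 'r1', 'r2']

# A Series has only axis 0, so a reduction returns a single scalar.
series_total = frame["x"].sum(axis=0)
print(series_total)             # 6
```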

Example with DataFrame:


```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
# Sum along axis 0: collapse the rows, giving one total per column
result_axis0 = df.sum(axis=0)  # Output: A 6, B 15
# Sum along axis 1: collapse the columns, giving one total per row
result_axis1 = df.sum(axis=1)  # Output: 0 5, 1 7, 2 9
```


In summary, `axis` in both NumPy and pandas names the dimension along which an operation runs, and for reductions that dimension is the one collapsed. NumPy applies it by position to arrays with any number of dimensions, while pandas applies it to the labeled rows (`axis=0`) and columns (`axis=1`) of a DataFrame and to the single axis of a Series.
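
To see the two libraries side by side, the sketch below converts a DataFrame to a NumPy array and compares reductions along each axis. It also exercises the string aliases `'index'` and `'columns'`, which pandas accepts in place of `0` and `1`. The data mirrors the DataFrame example above; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
values = df.to_numpy()  # plain 2D array of shape (3, 2)

# Same axis number, same numbers; pandas just attaches labels to the result.
assert np.array_equal(df.sum(axis=0).to_numpy(), values.sum(axis=0))  # [6, 15]
assert np.array_equal(df.sum(axis=1).to_numpy(), values.sum(axis=1))  # [5, 7, 9]

# String aliases are equivalent to the integer axis numbers.
assert df.sum(axis="index").equals(df.sum(axis=0))
assert df.sum(axis="columns").equals(df.sum(axis=1))
```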
