Monday, October 7, 2024

Sigmoid vs Tanh: Understanding Key Activation Functions in Neural Networks

Introduction

Activation functions are the backbone of neural networks. Without them, a neural network would behave like a simple linear model, no matter how many layers it has.

💡 Activation functions introduce non-linearity, allowing neural networks to learn complex patterns.
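A minimal sketch of this point (the layer sizes and random weights below are purely illustrative): two stacked linear layers with no activation in between collapse into a single linear transformation.

import numpy as np

np.random.seed(0)
x = np.random.randn(4)      # example input vector
W1 = np.random.randn(3, 4)  # weights of a first linear "layer"
W2 = np.random.randn(2, 3)  # weights of a second linear "layer"

# Two linear layers without an activation in between...
two_layers = W2 @ (W1 @ x)

# ...equal one linear layer whose weight matrix is W2 @ W1
one_layer = (W2 @ W1) @ x

print(np.allclose(two_layers, one_layer))  # True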

Sigmoid Function

The Sigmoid (logistic) function squashes any real-valued input into the range (0, 1), so its output can be interpreted as a probability.

$$ \sigma(x) = \frac{1}{1 + e^{-x}} $$

📊 Interpretation

  • If \( x \to +\infty \), then output → 1
  • If \( x \to -\infty \), then output → 0
  • Output range: (0,1)
  • Used in binary classification
  • Suffers from vanishing gradient

💻 Code Example

import numpy as np

def sigmoid(x):
    # Logistic function: maps any real number into (0, 1)
    return 1 / (1 + np.exp(-x))
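A quick check of the boundary behavior (the sample inputs below are just illustrative):

print(sigmoid(0))    # 0.5 (the midpoint)
print(sigmoid(10))   # ~0.99995, approaching 1
print(sigmoid(-10))  # ~0.00005, approaching 0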

Tanh Function

The Tanh (hyperbolic tangent) function has a similar S-shape to Sigmoid but expands the output range to (-1, 1), allowing negative outputs.

$$ \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} $$

📊 Interpretation

  • If \( x \to +\infty \), output → 1
  • If \( x \to -\infty \), output → -1
  • Output range: (-1,1)
  • Zero-centered
  • Better gradient flow than Sigmoid

💻 Code Example

import numpy as np

def tanh(x):
    # NumPy provides a numerically stable hyperbolic tangent
    return np.tanh(x)
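A quick check of the zero-centered, symmetric behavior (sample inputs are illustrative):

print(tanh(0))    # 0.0 (zero-centered)
print(tanh(2))    # ~0.964
print(tanh(-2))   # ~-0.964, symmetric around 0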

📊 Mathematical Deep Dive

Derivative of Sigmoid

$$ \sigma'(x) = \sigma(x)(1 - \sigma(x)) $$

This derivative peaks at only 0.25 (at \( x = 0 \)) and becomes very small when \( \sigma(x) \) is near 0 or 1, causing vanishing gradients.
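A minimal sketch of this effect (the input values are illustrative):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)

for x in [0, 2, 5, 10]:
    print(x, sigmoid_grad(x))
# 0  -> 0.25      (the maximum possible value)
# 2  -> ~0.105
# 5  -> ~0.0066
# 10 -> ~0.000045 (the gradient has essentially vanished)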

Derivative of Tanh

$$ \tanh'(x) = 1 - \tanh^2(x) $$

This derivative peaks at 1 (at \( x = 0 \)), so Tanh maintains stronger gradients near zero than Sigmoid.
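A small comparison of the two derivatives at the same points (values are illustrative) shows the 4x gap at the origin:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

for x in [0.0, 1.0, 2.0]:
    sig_grad = sigmoid(x) * (1 - sigmoid(x))
    tanh_grad = 1 - np.tanh(x) ** 2
    print(x, sig_grad, tanh_grad)
# x = 0.0 -> sigmoid': 0.25,   tanh': 1.0
# x = 1.0 -> sigmoid': ~0.197, tanh': ~0.420
# x = 2.0 -> sigmoid': ~0.105, tanh': ~0.071 (both eventually saturate)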

Vanishing Gradient Concept

Gradient-based learning depends on the gradient of the loss \( L \) with respect to each weight \( w \):

$$ \frac{\partial L}{\partial w} $$

If these gradients shrink toward zero, learning slows dramatically, especially in the early layers of deep networks.
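A rough sketch of why this matters in deep networks (the layer count and pre-activation value are illustrative): by the chain rule, the gradient reaching the first layer is a product of per-layer derivative terms, and multiplying many values below 1 drives that product toward zero.

import numpy as np

def sigmoid_grad(x):
    s = 1 / (1 + np.exp(-x))
    return s * (1 - s)

# Suppose each of 10 layers contributes a sigmoid-derivative factor
# at a mildly saturated pre-activation (x = 3, illustrative).
per_layer = sigmoid_grad(3.0)
gradient_at_first_layer = per_layer ** 10

print(per_layer)                 # ~0.045
print(gradient_at_first_layer)   # ~3.5e-14, effectively zero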


Comparison

Feature          Sigmoid            Tanh
Range            (0, 1)             (-1, 1)
Zero-centered    No                 Yes
Gradient         Weak (max 0.25)    Stronger (max 1.0)
Usage            Output layer       Hidden layers

When to Use Each

  • Sigmoid: Binary classification, probabilities
  • Tanh: Hidden layers, faster convergence

Modern Perspective (ReLU)

Today, ReLU (Rectified Linear Unit) is the preferred default for hidden layers:

$$ f(x) = \max(0, x) $$

It avoids vanishing gradients for positive inputs, where its derivative is exactly 1.
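A minimal NumPy sketch of ReLU and its gradient (illustrative only; deep learning frameworks ship their own implementations):

import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    # 1 for positive inputs, 0 otherwise; the gradient does not shrink
    # toward zero on the positive side as Sigmoid and Tanh gradients do.
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]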

💡 Sigmoid & Tanh are still important for understanding neural networks.

🎯 Key Takeaways

  • Sigmoid outputs probabilities
  • Tanh is zero-centered
  • Both suffer from vanishing gradients
  • ReLU is the modern default

Conclusion

Sigmoid and Tanh are foundational activation functions that shaped modern deep learning. Understanding their mathematical behavior provides insight into how neural networks learn.
