Sigmoid vs Tanh Activation Functions (Complete Guide)
Table of Contents
- Introduction
- Sigmoid Function
- Tanh Function
- Mathematical Deep Dive
- Comparison
- When to Use Each
- Modern Perspective (ReLU)
- Key Takeaways
- Conclusion
Introduction
Activation functions are the backbone of neural networks. Without them, a neural network would behave like a simple linear model, no matter how many layers it has.
Sigmoid Function
The Sigmoid (logistic) function maps any real-valued input to a value between 0 and 1, which is why its output is commonly read as a probability.
$$ \sigma(x) = \frac{1}{1 + e^{-x}} $$

Interpretation
- If \( x \to +\infty \), then output → 1
- If \( x \to -\infty \), then output → 0
- Output range: (0,1)
- Used in binary classification
- Suffers from vanishing gradient
Code Example
```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))
```
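To make the interpretation above concrete, here is a small self-contained check (redefining sigmoid so the snippet runs on its own) showing that outputs saturate toward 0 and 1 at the extremes and equal 0.5 at zero:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Outputs saturate toward 0 and 1 at the extremes; sigmoid(0) = 0.5
for x in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    print(f"sigmoid({x:+.1f}) = {sigmoid(x):.5f}")
```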
Tanh Function
The Tanh (hyperbolic tangent) function widens the output range to (-1, 1), so, unlike Sigmoid, it can produce negative values.
$$ \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} $$

Interpretation
- If \( x \to +\infty \), output → 1
- If \( x \to -\infty \), output → -1
- Output range: (-1,1)
- Zero-centered
- Better gradient flow than Sigmoid
Code Example
```python
import numpy as np

def tanh(x):
    return np.tanh(x)
```
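A similar check for the zero-centered behavior: tanh is an odd function, so its outputs are symmetric around zero and saturate at -1 and +1.

```python
import numpy as np

# tanh is odd: tanh(-x) = -tanh(x), and tanh(0) = 0
for x in [-10.0, -1.0, 0.0, 1.0, 10.0]:
    print(f"tanh({x:+.1f}) = {np.tanh(x):+.5f}")
```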
Mathematical Deep Dive
Derivative of Sigmoid
$$ \sigma'(x) = \sigma(x)(1 - \sigma(x)) $$

This derivative becomes very small when \( \sigma(x) \) is near 0 or 1, which causes vanishing gradients.
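A small numerical sketch of this saturation: the derivative peaks at 0.25 at \( x = 0 \) and collapses toward zero as \( |x| \) grows.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1 - s)

# Peaks at 0.25 for x = 0 and vanishes as |x| grows
for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"sigma'({x:.1f}) = {sigmoid_grad(x):.6f}")
```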
Derivative of Tanh
$$ \tanh'(x) = 1 - \tanh^2(x) $$

This maintains stronger gradients near zero compared to Sigmoid.
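Comparing the two derivatives at zero makes the difference concrete: \( \tanh'(0) = 1 \) while \( \sigma'(0) = 0.25 \), so Tanh passes gradients through roughly four times as strongly near the origin. A minimal check:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Peak gradients at x = 0
tanh_grad_at_0 = 1 - np.tanh(0.0) ** 2                 # 1.0
sigmoid_grad_at_0 = sigmoid(0.0) * (1 - sigmoid(0.0))  # 0.25
print(tanh_grad_at_0, sigmoid_grad_at_0)
```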
Vanishing Gradient Concept
Gradient-based learning depends on:
$$ \frac{\partial L}{\partial w} $$

Backpropagation multiplies one local derivative per layer, so if each factor is below 1, the product shrinks exponentially with depth and learning slows dramatically.
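A rough sketch of this compounding effect, using a hypothetical 20-layer chain and assuming each layer contributes Sigmoid's maximum derivative of 0.25: the gradient reaching the earliest layers is vanishingly small.

```python
# Hypothetical illustration: backprop multiplies one local derivative per layer.
# With Sigmoid, each factor is at most 0.25, so the product decays very fast.
num_layers = 20
max_sigmoid_grad = 0.25

gradient = 1.0
for _ in range(num_layers):
    gradient *= max_sigmoid_grad

print(gradient)  # 0.25 ** 20 ≈ 9.1e-13 -- effectively zero for the early layers
```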
Comparison
| Feature | Sigmoid | Tanh |
|---|---|---|
| Range | (0,1) | (-1,1) |
| Zero-centered | No | Yes |
| Peak gradient | 0.25 (weaker) | 1.0 (stronger) |
| Usage | Output layer | Hidden layers |
When to Use Each
- Sigmoid: Output layers of binary classifiers, where the result is read as a probability
- Tanh: Hidden layers, where zero-centered outputs often speed up convergence (see the sketch below)
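To show how the two roles fit together, here is a minimal sketch of a forward pass (hypothetical layer sizes and random weights, NumPy only) that uses Tanh in the hidden layer and Sigmoid at the output:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical sizes: 4 input features, 8 hidden units, 1 output probability
W1 = rng.normal(scale=0.5, size=(4, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1))
b2 = np.zeros(1)

def forward(x):
    hidden = np.tanh(x @ W1 + b1)     # zero-centered hidden activations
    return sigmoid(hidden @ W2 + b2)  # outputs interpretable as probabilities

x = rng.normal(size=(3, 4))  # a batch of 3 examples
print(forward(x))            # three values in (0, 1)
```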
Modern Perspective (ReLU)
Today, ReLU is preferred:
$$ f(x) = \max(0, x) $$

It avoids vanishing gradients for positive inputs.
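A minimal sketch of ReLU and its derivative, showing that the gradient stays at 1 for every positive input instead of saturating:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    # 1 for positive inputs, 0 for negative inputs
    # (the value at exactly 0 is conventionally taken as 0 here)
    return 1.0 if x > 0 else 0.0

for x in [-5.0, -1.0, 0.5, 5.0, 50.0]:
    print(f"relu({x:+.1f}) = {relu(x):.1f}, gradient = {relu_grad(x):.1f}")
```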
Key Takeaways
- Sigmoid outputs probabilities
- Tanh is zero-centered
- Both suffer from vanishing gradients
- ReLU is the modern default
Conclusion
Sigmoid and Tanh are foundational activation functions that shaped modern deep learning. Understanding their mathematical behavior provides insight into how neural networks learn.