
Neuron

An artificial neuron is a mathematical function that mimics the behavior of a biological neuron in our brain.
It receives multiple inputs, multiplies each by a weight, adds a bias, and passes the result through an activation function to produce an output.

It is the core building block of any neural network.


Structure of an Artificial Neuron

  1. Inputs:
    • Data (features), e.g., Age, Income, etc.
  2. Weights:
    • Importance given to each input.
  3. Summation:
    • Weighted sum of inputs:
\[\large z = \sum_{i=1}^{n} w_i x_i + b\]

Where:

  • \(x_i\) = the \(i\)-th input
  • \(w_i\) = the weight of the \(i\)-th input
  • \(b\) = the bias
  • \(n\) = the number of inputs

  4. Activation Function:
    • Applies a function to add non-linearity (like decision-making).
    • Examples: Sigmoid, ReLU, Tanh.
  5. Output:
    • Result after applying the activation function.

Mathematical Formula

\[\large \text{Output} = \text{Activation}\left(\sum_{i=1}^{n} w_i x_i + b\right)\]

where \(w_i\), \(x_i\), \(b\), and \(n\) are as defined above.


Analogy:

Imagine you are a teacher grading students based on:

  • Homework (weight = 0.4)
  • Exam (weight = 0.6)

You add the scores with different importance:

Final Grade = (Homework × 0.4) + (Exam × 0.6)

If the result is above a threshold, you decide Pass; else, Fail.

This is what a neuron does!
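
To make this concrete, here is a minimal sketch of the grading neuron in plain Python. The homework and exam scores and the 0.5 pass threshold are made-up illustrative values; the weights (0.4, 0.6) come from the analogy above.

```python
# A neuron as a "grader": weighted sum + bias + threshold activation.

def weighted_sum(inputs, weights, bias=0.0):
    """The summation step: z = sum(w_i * x_i) + b."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def step_activation(z, threshold=0.5):
    """Pass if the weighted score clears the threshold, else Fail."""
    return "Pass" if z > threshold else "Fail"

homework, exam = 0.8, 0.4                       # scores scaled to [0, 1] (assumed)
z = weighted_sum([homework, exam], [0.4, 0.6])  # 0.8*0.4 + 0.4*0.6
print(z, step_activation(z))                    # ~0.56 -> Pass
```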


Key Concepts in Artificial Neuron

| Component | Role |
|---|---|
| Inputs (x1, x2, …) | Raw data |
| Weights (w1, w2, …) | Importance of each input |
| Bias (b) | Adjustment to the output |
| Summation | Weighted sum of inputs and bias |
| Activation Function | Adds non-linearity (like decision-making) |
| Output | Final result |

Why do we use Activation Functions in Neurons?

Without an activation function, a neuron's output is just a linear function of its inputs, and stacking such neurons still gives a linear model. The activation function adds non-linearity, which lets the network learn curved decision boundaries and complex patterns.


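A quick NumPy check of that claim (the layer sizes here are arbitrary): composing two purely linear layers gives exactly one linear layer, so depth alone adds no expressive power without an activation in between.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # "layer 1" weights, no activation
W2 = rng.normal(size=(2, 4))   # "layer 2" weights, no activation
x = rng.normal(size=3)         # some input vector

two_layers = W2 @ (W1 @ x)     # pass x through both linear layers
one_layer = (W2 @ W1) @ x      # a single equivalent linear layer

print(np.allclose(two_layers, one_layer))  # True: depth added nothing
```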

Why So Many Neurons in One Hidden Layer?

1. Each Neuron Learns a Pattern (Feature Extractor)

More neurons → More patterns/features can be learned.

2. Complex Data Requires More Capacity

More neurons → Better ability to capture complex patterns.

3. Universal Approximation Theorem

A single hidden layer with enough neurons can, in theory, approximate any continuous function. In practice, typical layer sizes look like this:

| Layer | Typical Neurons (Example) | Why? |
|---|---|---|
| Input Layer | Matches number of features (e.g., 8 for diabetes dataset) | 1 neuron per feature |
| Hidden Layer 1 | 2x-3x of input neurons (e.g., 16) | Capture various combinations of features |
| Hidden Layer 2 | Smaller than first hidden layer (e.g., 8) | Condense and refine patterns |
| Output Layer | Depends on task (1 neuron for binary classification) | Gives the final prediction |
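
As a sketch, the sizes in that table could be written in Keras like this; the ReLU and sigmoid activations are common defaults that I am assuming here, not something the table prescribes.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),                      # input layer: 1 neuron per feature
    tf.keras.layers.Dense(16, activation="relu"),    # hidden layer 1: 2x the inputs
    tf.keras.layers.Dense(8, activation="relu"),     # hidden layer 2: condense patterns
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output: binary classification
])
model.summary()
```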

Analogy

Be careful

| Problem | Cause | Solution |
|---|---|---|
| Underfitting | Too few neurons, model too simple | Add more neurons/layers |
| Overfitting | Too many neurons, model memorizes data | Reduce neurons, add regularization methods |

Thumb Rules for Choosing Number of Neurons in Hidden Layers

1. Rule of Pyramid Shape

Make each hidden layer smaller than the one before it (e.g., 16 → 8 → 4), so the network gradually condenses features toward the output.

2. Between Input & Output Size

A simple formula to start:

\[\large \text{Neurons in Hidden Layer} = \frac{\text{Input Neurons} + \text{Output Neurons}}{2}\]
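
For example, with the 8-feature diabetes dataset and a single binary output, the formula suggests (8 + 1) / 2 ≈ 4 or 5 neurons. A trivial, purely illustrative helper:

```python
def starting_hidden_neurons(n_inputs: int, n_outputs: int) -> int:
    """Starting-point heuristic from the formula above."""
    return round((n_inputs + n_outputs) / 2)

print(starting_hidden_neurons(8, 1))  # 4 (or 5; it is only a starting point)
```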

3. Heuristic: Power of 2

Pick layer sizes that are powers of 2 (8, 16, 32, 64, …); it is a common convention that keeps experiments easy to compare.

4. Don’t Over-Engineer Early

Start with a small network and add neurons or layers only when the model clearly underfits.

5. Empirical Tuning is Key

Treat the number of neurons as a hyperparameter: try a few sizes and keep the one with the best validation performance.

6. Regularization is Important

If you give a layer lots of neurons, pair it with regularization (dropout, L2 weight penalties, early stopping) so the extra capacity does not just memorize the training data.
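
A sketch of what that can look like in Keras, reusing the assumed 8-16-8-1 network from above; the dropout rate (0.3) and L2 strength (1e-4) are arbitrary starting values, not recommendations.

```python
import tensorflow as tf

l2 = tf.keras.regularizers.l2(1e-4)          # penalize large weights

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu", kernel_regularizer=l2),
    tf.keras.layers.Dropout(0.3),            # drop 30% of activations during training
    tf.keras.layers.Dense(8, activation="relu", kernel_regularizer=l2),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```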

| Dataset Size | Recommended Neurons (Hidden Layer 1) | Why? |
|---|---|---|
| Small (<1k samples) | 1-2x input features | Avoid overfitting |
| Medium (~10k samples) | 2-3x input features | Balance complexity & generalization |
| Large (>100k samples) | Start higher (32-64 neurons) | Can afford more capacity without overfitting |

Why Do We Need Many Hidden Layers in Neural Networks?

1. Shallow vs Deep Neural Networks

A shallow network has a single hidden layer; a deep network stacks several, letting each layer build on the features learned by the one before it.

2. Simple Problems → Few Layers

For simple, roughly linear problems, one hidden layer (or even none) is usually enough.

3. Why More Hidden Layers?

Each layer builds on the output of the previous one:

| Hidden Layer | Role / Purpose |
|---|---|
| 1st hidden layer | Learns basic features (e.g., edges in images, basic relations in data) |
| 2nd hidden layer | Learns more abstract features (e.g., shapes, combinations of previous features) |
| 3rd & beyond | Learns complex patterns (e.g., object parts, interactions, high-level concepts) |

4. Real-World Example

In image recognition, early layers detect edges, middle layers combine them into shapes and object parts, and later layers recognize whole objects.

5. Mathematical Justification

Some functions that a deep network represents with relatively few neurons would require exponentially many neurons in a single hidden layer; depth buys representational efficiency.

6. Practical Reasons for Many Layers

| Reason | Why It Helps |
|---|---|
| Hierarchical Learning | Learns from simple to complex patterns step-by-step. |
| Reusability of Features | Lower layers learn features reused by higher layers. |
| Better Representations | Can model very complex non-linear relationships. |
| Scalability | Easier to handle large datasets & tasks. |

7. But Beware of Overfitting!

More layers mean more parameters. Without enough data and regularization, a deep network will memorize the training set instead of generalizing, so grow the depth only as long as validation performance keeps improving.


Key Takeaways:

  • A neuron computes a weighted sum of its inputs, adds a bias, and applies an activation function.
  • Activation functions supply the non-linearity that lets networks learn complex patterns.
  • Neuron and layer counts are hyperparameters: start small, size hidden layers between the input and output dimensions, and tune empirically.
  • Too few neurons cause underfitting; too many cause overfitting, so match capacity with regularization.