The aim of this question is to check whether you understand why a neural network needs activation functions. You can start with a simple explanation of how each layer of a network computes its output:
Step 1: Calculate the weighted sum of the inputs (X) and add the bias term:
Z = (weights * X) + bias
Step 2: Apply an activation function to calculate the expected output:
Y = Activation(Z)
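The two steps above can be sketched in a few lines of NumPy. This is only an illustration: the function name dense_forward, the choice of ReLU as the activation, and the example numbers are assumptions made here, not part of the original explanation.

```python
import numpy as np

def relu(z):
    # ReLU, one common non-linear activation (chosen here only as an example)
    return np.maximum(0.0, z)

def dense_forward(X, weights, bias, activation=relu):
    # Step 1: weighted sum of the inputs plus the bias term
    Z = X @ weights + bias
    # Step 2: apply the activation function to get the layer output
    return activation(Z)

# Hypothetical values: one sample with two features, a layer with two units
X = np.array([[1.0, -2.0]])
weights = np.array([[0.5, -1.0],
                    [0.25, 2.0]])
bias = np.array([0.1, 0.2])

Y = dense_forward(X, weights, bias)
print(Y)  # Z is [[0.1, -4.8]], so ReLU gives [[0.1, 0.0]]
```

Note that the negative entry of Z is clipped to zero. That clipping is exactly the non-linear part of the layer.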
Steps 1 and 2 are performed at each layer. If you recall, this is nothing but forward propagation! Now, what if there is no activation function?
Our equation for Y essentially becomes:
Y = Z = (weights * X) + bias
Wait – isn't this just a simple linear equation? Yes, and that is why we need activation functions. A linear equation cannot capture complex, non-linear patterns in the data, and adding more layers does not help: a stack of linear layers is still a single linear function of the input, no matter how deep the network is.
To capture non-linear relationships, we apply a non-linear activation function at each layer. Without one, a neural network is effectively just a linear regression model.
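A small numerical check can make this concrete. The sketch below, with made-up weights and a row-vector convention (X @ W), assumes two layers with no activation and shows that they collapse into one linear layer, since (X W1 + b1) W2 + b2 = X (W1 W2) + (b1 W2 + b2). Adding ReLU between the layers (again, just an example activation) breaks that equivalence.

```python
import numpy as np

# Hypothetical inputs and weights, chosen only for illustration
X = np.array([[1.0, 2.0],
              [-1.0, 0.5]])
W1 = np.array([[1.0, -1.0],
               [0.5, 2.0]])
b1 = np.array([0.0, -1.0])
W2 = np.array([[2.0],
               [1.0]])
b2 = np.array([0.5])

# Two layers stacked with no activation function
Y_stacked = (X @ W1 + b1) @ W2 + b2

# One linear layer with combined weights and bias
W_combined = W1 @ W2
b_combined = b1 @ W2 + b2
Y_single = X @ W_combined + b_combined

# The two-layer linear network is exactly one linear layer
same_without_activation = np.allclose(Y_stacked, Y_single)

# With ReLU between the layers, no single linear layer reproduces the output
Y_relu = np.maximum(0.0, X @ W1 + b1) @ W2 + b2
same_with_relu = np.allclose(Y_relu, Y_single)

print(same_without_activation)  # True
print(same_with_relu)           # False
```

With these numbers, the linear stack and the single layer both output 6.5 and 0.0 for the two samples, while the ReLU version outputs 6.5 and 1.5, because one hidden value was negative and got clipped.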