This is a pretty intuitive answer. As we saw above, we perform the convolution on X one step at a time, moving to the right, and in the end we got Z with dimensions 2 X 2 for X with dimensions 3 X 3.
Thus, to make the input at least as large as the filter, we use padding: adding 0s around the input matrix so that its new size becomes at least 7 X 7. The output size then follows from the formula:
Dimension of image = (n, n) = 5 X 5
Dimension of filter = (f, f) = 7 X 7
Padding = 1 (adding 1 pixel with value 0 all around the edges)
Dimension of output (with a stride of 1) will be (n+2p-f+1) X (n+2p-f+1) = (5+2-7+1) X (5+2-7+1) = 1 X 1
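
To check the arithmetic above, here is a minimal sketch in plain Python. The helper names (conv_output_size, zero_pad, convolve2d) are made up for illustration and do not come from any library. The sketch assumes square inputs, a stride of 1, and the CNN-style "convolution" that slides the filter without flipping it. The all-ones image and filter are placeholder values chosen only to keep the result easy to verify by hand.

```python
def conv_output_size(n, f, p=0):
    """Side length of the output for an n x n input, f x f filter,
    padding p, and stride 1 (assumed): n + 2p - f + 1."""
    size = n + 2 * p - f + 1
    if size < 1:
        raise ValueError("filter is larger than the padded input")
    return size


def zero_pad(image, p):
    """Surround a square image (list of lists) with p rings of zeros."""
    n = len(image)
    width = n + 2 * p
    padded = [[0] * width for _ in range(width)]
    for i in range(n):
        for j in range(n):
            padded[i + p][j + p] = image[i][j]
    return padded


def convolve2d(image, kernel):
    """Slide the kernel over the image one step at a time (no flipping)."""
    n, f = len(image), len(kernel)
    out = conv_output_size(n, f)
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(f) for b in range(f))
            for j in range(out)
        ]
        for i in range(out)
    ]


image = [[1] * 5 for _ in range(5)]    # 5 x 5 input, illustrative values
kernel = [[1] * 7 for _ in range(7)]   # 7 x 7 filter, illustrative values

padded = zero_pad(image, 1)            # padding of 1 gives a 7 x 7 input
result = convolve2d(padded, kernel)

print(len(padded), len(padded[0]))     # 7 7
print(len(result), len(result[0]))     # 1 1
print(result)                          # [[25]]
```

Without the padding, conv_output_size(5, 7) raises an error because the 7 X 7 filter does not fit inside the 5 X 5 image. The same helper also reproduces the earlier case: a 3 X 3 input that produces a 2 X 2 output corresponds to conv_output_size(3, 2) == 2.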