This is one of the most frequently asked Machine Learning interview questions.
In the real world, we deal with multi-dimensional data, so visualization and computation become more challenging as the number of dimensions grows. In such a scenario, we may need to reduce the dimensions to analyze and visualize the data easily. We do this by:
Removing irrelevant dimensions
Keeping only the most relevant dimensions
This is where we use Principal Component Analysis (PCA).
The goal of Principal Component Analysis is to find a new set of uncorrelated (orthogonal) dimensions and rank them by the amount of variance they capture.
The Mechanism of PCA:
Center the data by subtracting the mean of each dimension
Compute the covariance matrix of the data objects
Compute the eigenvectors and eigenvalues of this matrix, and sort them in descending order of eigenvalue
Select the first N eigenvectors (those with the largest eigenvalues) as the new dimensions
Finally, project the original n-dimensional data objects onto these N dimensions
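The steps above can be sketched in Python with NumPy. This is a minimal illustration under simple assumptions (the function name `pca` and the random data are just for the example, not part of any library API):

```python
import numpy as np

def pca(X, n_components):
    """Minimal PCA sketch following the steps above (illustrative, not production code)."""
    # Center the data so the covariance matrix reflects spread around the mean
    X_centered = X - X.mean(axis=0)
    # Compute the covariance matrix of the features
    cov = np.cov(X_centered, rowvar=False)
    # Eigenvalues/eigenvectors; eigh is suited to symmetric matrices like cov
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Sort eigenvectors by descending eigenvalue (descending variance)
    order = np.argsort(eigenvalues)[::-1]
    eigenvectors = eigenvectors[:, order]
    # Keep the first N eigenvectors as the new dimensions
    components = eigenvectors[:, :n_components]
    # Project the original data onto the N new dimensions
    return X_centered @ components

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X_reduced = pca(X, 2)
print(X_reduced.shape)  # (100, 2)
```

Note that the projected coordinates are uncorrelated with each other, which is exactly the property PCA is designed to produce.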
Example: Below are two graphs showing the data points (objects) and two directions: one ‘green’ and the other ‘yellow.’ Graph 2 is obtained by rotating Graph 1 so that the x-axis and y-axis represent the ‘green’ and ‘yellow’ directions, respectively.
(Figure: Output from PCA)
After rotating the data points, we can see that the green direction (x-axis) gives us the line that best fits the data points.
Here, we are representing 2-dimensional data, but real-life data is multi-dimensional and complex. So, after recognizing how much variance each direction carries, we can reduce the number of dimensions by cutting off the less significant ‘directions.’
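The idea of ranking directions by variance and discarding the weak ones can be shown on a small synthetic example. The data below is hypothetical, constructed so that most of the variance lies along a single direction; the eigenvalue ratios then tell us which directions are worth keeping:

```python
import numpy as np

# Hypothetical correlated 2-D data: the second feature mostly follows the first,
# so almost all of the variance lies along one direction.
rng = np.random.default_rng(1)
base = rng.normal(size=500)
X = np.column_stack([base, base * 0.9 + rng.normal(scale=0.2, size=500)])

# Covariance of the centered data, then eigenvalues in descending order
cov = np.cov(X - X.mean(axis=0), rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(cov))[::-1]

# Fraction of total variance captured by each direction
explained_ratio = eigenvalues / eigenvalues.sum()
print(explained_ratio)  # the first direction dominates
```

A common rule of thumb is to keep enough directions to cover some threshold (say, 95%) of the total variance and cut off the rest.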