The purpose of this article is to explain PCA (Principal Component Analysis).




What is PCA in Data Science Toolkit?



Purpose: PCA converts a set of possibly correlated columns into a set of uncorrelated variables called principal components. Combining multiple columns into principal components can speed up a model's run time, reduce noise, or otherwise optimize the model. PCA is one step in the predictive modeling process.
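To make "correlated columns in, uncorrelated components out" concrete, here is a minimal sketch in Python with NumPy. It is not the Data Science Toolkit's own implementation. The function name pca_scores and the synthetic columns a and b are made up for illustration, and the sketch only centers the data (it does not scale it).

```python
import numpy as np

def pca_scores(X):
    """Project the rows of X onto its principal components (illustrative helper)."""
    # Center each column so the components describe variation around the mean.
    Xc = X - X.mean(axis=0)
    # The rows of Vt are the principal directions, ordered by variance captured.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T

# Synthetic data (assumption): column b is almost a multiple of column a.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = 2.0 * a + rng.normal(scale=0.1, size=200)
X = np.column_stack([a, b])

scores = pca_scores(X)

# Correlation between the two original columns vs. between the two components.
corr_before = np.corrcoef(X, rowvar=False)[0, 1]
corr_after = np.corrcoef(scores, rowvar=False)[0, 1]
print("correlation between original columns:", corr_before)
print("correlation between principal components:", corr_after)
```

The original columns are strongly correlated, while the correlation between the two principal components is zero up to floating-point rounding.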



Additional Information on PCA:



Outputs (4 visualizations):

  • PC Values on Original Data - Scatter Plot
  • PCA Rotation Matrix - Table
  • Variance explained by each PC - Bar Chart, with one bar for each principal component
  • PCA Plot for top components - Scatter Plot


- Note: A column for each principal component (PC) is also appended to the original data.
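The quantities behind these outputs can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions, not the toolkit's code: the function pca_outputs, the dict-of-columns input, the sample column names (height, weight, age), and the PC1, PC2, ... naming of the appended columns are all hypothetical, and the data is only centered, not scaled.

```python
import numpy as np

def pca_outputs(columns):
    """Return (rotation, explained, appended) for a dict of numeric columns.

    columns: dict mapping column name -> list of numbers (hypothetical input shape).
    """
    names = list(columns)
    X = np.column_stack([np.asarray(columns[n], dtype=float) for n in names])
    Xc = X - X.mean(axis=0)  # center each column
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Rotation matrix: one row per original column, one column per PC.
    rotation = Vt.T
    # Share of total variance explained by each PC (one bar per PC in a chart).
    variance = s**2 / (len(X) - 1)
    explained = variance / variance.sum()
    # PC values for each row of the original data.
    scores = Xc @ rotation

    # Append the PC columns to a copy of the original data.
    appended = dict(columns)
    for i in range(scores.shape[1]):
        appended[f"PC{i + 1}"] = scores[:, i]
    return rotation, explained, appended

# Made-up sample data for illustration only.
data = {
    "height": [1.0, 2.0, 3.0, 4.0, 5.0],
    "weight": [2.1, 3.9, 6.2, 8.0, 9.8],
    "age": [5.0, 3.0, 4.0, 1.0, 2.0],
}
rotation, explained, appended = pca_outputs(data)
print(rotation)
print(explained)
print(list(appended))
```

Plotting PC1 against PC2 from the appended columns gives the kind of scatter plot listed above, and the explained values are what a variance bar chart would show.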


Example:



To set up PCA, see the user guide: Data Science Toolkit PCA User Guide: How to setup PCA (Principal Components Analysis)



For additional information, watch the video:


Data Science Toolkit: PCA from Ruths.ai on Vimeo.