Data Science Toolkit PCA User Guide: How to set up PCA (Principal Components Analysis) : Ruths.ai Product Support

This article explains how to set up PCA (Principal Components Analysis) in the Data Science Toolkit.

Purpose

PCA converts a set of possibly correlated columns into a set of uncorrelated variables called principal components (PCs). Combining multiple columns into principal components can speed up a model's run time, reduce noise, or otherwise optimize the model. PCA is one step in the predictive modeling process.
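To make the idea of "uncorrelated principal components" concrete, here is a minimal NumPy sketch. It is not the toolkit's implementation: the function name pca_scores, the synthetic data, and the choice to center the columns without scaling them are all assumptions made for illustration.

```python
import numpy as np

def pca_scores(X):
    """Return PC scores and the rotation matrix for X (rows = observations)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)      # center each column (no scaling, by assumption)
    # SVD of the centered data: the rows of vt are the principal directions
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    rotation = vt.T                    # columns hold the loadings of each PC
    scores = centered @ rotation       # the data expressed in PC coordinates
    return scores, rotation

# Synthetic example: two strongly correlated input columns
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2])

scores, rotation = pca_scores(X)
input_corr = np.corrcoef(X, rowvar=False)[0, 1]
pc_corr = np.corrcoef(scores, rowvar=False)[0, 1]
print(f"correlation between input columns: {input_corr:.3f}")
print(f"correlation between PC1 and PC2:   {pc_corr:.1e}")
```

The input columns are highly correlated, while the correlation between the two resulting components is zero up to floating-point error.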

Prep

Convert categorical variables to binary columns

Limitations

Runs only on complete-case data, so it is very sensitive to missing values
Runs only on numeric data, so categorical variables must first be converted to binary columns
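If the data is prepared outside the toolkit, the two limitations above can be handled along these lines. This is a sketch using pandas, not a toolkit feature. The table and its column names (depth, rate, formation) are made up for illustration.

```python
import pandas as pd

# Hypothetical table: two numeric columns (one with a missing value)
# and one categorical column
df = pd.DataFrame({
    "depth": [1200.0, 1350.0, None, 1500.0],
    "rate": [35.0, 42.0, 38.0, 40.0],
    "formation": ["A", "B", "A", "C"],
})

# Replace the categorical column with one binary (0/1) column per category
encoded = pd.get_dummies(df, columns=["formation"], dtype=int)

# Keep only complete cases: rows with no missing value in any column
complete = encoded.dropna()

print(complete)
```

Dropping rows is the simplest way to get complete cases, but it discards data. Whether to drop or impute missing values depends on the dataset.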

Steps to Run:

Go to the Tools menu
Select the Data Science option
Select the PCA option
Enter the data table, the columns to include, and, if desired, a tolerance

PCA Input

Data table
Columns - the columns that would be put into a predictive model (remember that PCA will attempt to combine them)
Use Add All or Clear All to check or uncheck every column at once.
Tolerance - a cutoff from 0 to 1, expressed relative to the variance of the first PC. Setting it reduces the number of PCs returned. Leave it blank to enforce no cutoff.
Click OK when you have finished your selection.
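This guide does not spell out the exact rule behind the tolerance input. One plausible reading, assumed here and not confirmed by the toolkit, is that a component is kept only when its variance is at least the tolerance times the variance of the first PC. The function name apply_tolerance and the variance values are hypothetical.

```python
import numpy as np

def apply_tolerance(variances, tolerance=None):
    """Keep PCs whose variance is at least `tolerance` times the first PC's variance.

    Assumed rule for illustration only. `variances` must be sorted from
    largest to smallest, as PCA produces them. A tolerance of None mimics
    leaving the field blank: nothing is dropped.
    """
    variances = np.asarray(variances, dtype=float)
    if tolerance is None:
        return variances
    keep = variances >= tolerance * variances[0]
    return variances[keep]

# Hypothetical PC variances, largest first
variances = np.array([4.0, 1.0, 0.3, 0.02])

print(apply_tolerance(variances))        # blank tolerance: all four PCs kept
print(apply_tolerance(variances, 0.1))   # cutoff at 0.1 * 4.0 = 0.4
```

Under this assumed rule, a tolerance of 0.1 keeps only the first two components, because the third and fourth variances fall below 0.4.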

Outputs (4 Visualizations):

PC Values on Original Data - Scatter Plot
PCA Rotation Matrix table
Variance explained by each PC - Bar Chart, with one bar per principal component
PCA Plot for top components - Scatter Plot

Note: The PC values are also appended as new columns to the original data table.
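For readers who want to see what sits behind these outputs, the sketch below computes rough equivalents with NumPy and pandas: a rotation matrix, the share of variance explained by each PC, and the PC values appended to the original table. It is an illustration under assumptions, not the toolkit's code. The data is synthetic, the columns are centered but not scaled, and the names PC1, PC2, PC3 are assumed (the toolkit may name its columns differently).

```python
import numpy as np
import pandas as pd

# Synthetic table: columns a and b are correlated, c is independent
rng = np.random.default_rng(1)
base = rng.normal(size=100)
df = pd.DataFrame({
    "a": base + 0.1 * rng.normal(size=100),
    "b": 2.0 * base + 0.1 * rng.normal(size=100),
    "c": rng.normal(size=100),
})

values = df.to_numpy()
centered = values - values.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

pc_names = [f"PC{i + 1}" for i in range(vt.shape[0])]

# Rotation matrix: rows are the original columns, columns are the PCs
rotation = pd.DataFrame(vt.T, index=df.columns, columns=pc_names)

# Share of total variance explained by each PC (the bar chart's values)
variances = singular_values ** 2 / (len(df) - 1)
explained = pd.Series(variances / variances.sum(), index=pc_names)

# PC values for each row, appended to the original table
scores = pd.DataFrame(centered @ vt.T, index=df.index, columns=pc_names)
result = pd.concat([df, scores], axis=1)

print(rotation.round(3))
print(explained.round(3))
print(result.head())
```

Plotting PC1 against PC2 from the result table corresponds to the scatter plot for the top components.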

Example

How to filter a subset of the data

Open the filter panel by clicking the filter icon on the top bar
Choose the correct filtering scheme
Click the "Refresh data table" icon on the PC Values on Original Data visualization.
- Note: This also syncs all the other PCA visualizations.

See PCA in action in the video below

Data Science Toolkit: PCA from Ruths.ai on Vimeo.

For additional information on RAI Data Science Toolkit documentation, click here.
