*The purpose of this article is to explain how to set up PCA (Principal Component Analysis)*

### Purpose

The purpose of PCA is to convert a set of possibly correlated columns into a set of uncorrelated variables called principal components. Combining multiple columns into principal components can speed up a model's run time, reduce noise, or otherwise optimize the model. PCA is one step in the predictive modeling process.
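To make the idea of "uncorrelated" concrete, here is a minimal sketch in Python with NumPy. It is not the toolkit's implementation: the data is synthetic, the variable names are illustrative, and the columns are only centered (not scaled), which is an assumption. It builds two strongly correlated columns, projects them onto their principal axes, and compares the correlation before and after.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two strongly correlated columns (synthetic data, for illustration only).
x = rng.normal(size=500)
data = np.column_stack([x, 2.0 * x + rng.normal(scale=0.5, size=500)])

# Center each column, then take the SVD; the rows of vt are the principal axes.
centered = data - data.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ vt.T  # principal component values for each row

input_corr = np.corrcoef(data, rowvar=False)[0, 1]
pc_corr = np.corrcoef(scores, rowvar=False)[0, 1]
print(f"correlation between input columns: {input_corr:.3f}")
print(f"correlation between PC1 and PC2:   {pc_corr:.3e}")
```

The input columns are highly correlated, while the correlation between the two principal components is zero up to floating-point error.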

**Prep**

- Convert categorical variables to binary columns

**Limitations**

- Runs only on complete-case data (rows with no missing values), so it is very sensitive to missing data
- Runs only on numeric data, so categorical variables need to be converted to binary columns
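If you prepare the data outside the tool, the prep and limitations above can be sketched with pandas as follows. The table and its column names (`depth`, `rate`, `formation`) are hypothetical; the sketch only shows one way to produce numeric, complete-case data.

```python
import pandas as pd

# Hypothetical table: one categorical column and one missing value.
df = pd.DataFrame({
    "depth": [1200.0, 1350.0, None, 1500.0],
    "rate": [310.0, 280.0, 295.0, 260.0],
    "formation": ["A", "B", "A", "C"],
})

# Convert the categorical column to binary (0/1) indicator columns.
encoded = pd.get_dummies(df, columns=["formation"], dtype=float)

# Keep complete cases only: any row with a missing value is dropped.
complete = encoded.dropna()

print(list(complete.columns))
print(complete.shape)
```

Note that dropping incomplete rows discards the whole row, including its valid values, which is why a handful of missing cells can noticeably shrink the data PCA actually sees.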

**Steps to Run:**

- Go to the **Tools** menu
- Select the **Data Science** option
- Select the **PCA** option
- Enter the data table, the columns to be included, and a tolerance if desired

**PCA Input**

- Data table
- Columns that would be put into a predictive model (remember that PCA will attempt to combine them)
- Option to Add All columns or Clear All checked columns
- Tolerance: a cutoff (from 0 to 1) on variance, relative to the first PC, that reduces the number of PCs returned. Leave blank to not enforce a tolerance cutoff.
- Click OK when you have finished your selection.
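The exact rule behind the Tolerance input is not spelled out here. One plausible reading, offered purely as an assumption, is the rule used by the `tol` argument of R's `prcomp`: a component is dropped when its standard deviation is less than or equal to the tolerance times the standard deviation of the first component. The helper `apply_tolerance` and the numbers below are hypothetical and only illustrate that reading.

```python
import numpy as np

def apply_tolerance(std_devs, tol=None):
    """Return the indices of the components kept under a relative cutoff.

    Assumed rule (not confirmed for this tool): drop a component when its
    standard deviation is <= tol times that of the first component.
    """
    std_devs = np.asarray(std_devs, dtype=float)
    if tol is None:  # blank tolerance: keep every component
        return np.arange(std_devs.size)
    return np.flatnonzero(std_devs > tol * std_devs[0])

sd = [3.0, 1.2, 0.4, 0.05]  # made-up component standard deviations

kept_all = apply_tolerance(sd)           # no cutoff enforced
kept_tol = apply_tolerance(sd, tol=0.2)  # cutoff at 0.2 * 3.0 = 0.6

print(kept_all.tolist())
print(kept_tol.tolist())
```

Under this reading, a larger tolerance returns fewer PCs, and leaving it blank returns all of them.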

**Outputs (4 Visualizations):**

- PC Values on Original Data - Scatter Plot
- PCA Rotation Matrix table
- Variance explained by each PC bar chart. There will be one bar for each principal component.
- PCA Plot for top components - Scatter Plot

**Note**: The PC columns are also appended to the original data table.
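For readers who want to see where these outputs come from, the following NumPy sketch computes rough equivalents of the rotation matrix, the variance explained by each PC, and the PC values appended to the table. It runs on synthetic data, uses illustrative variable names, and assumes centering without scaling, so the tool's actual numbers may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 200-row, 3-column table with correlated columns.
mixing = np.array([[1.0, 0.8, 0.0],
                   [0.0, 0.6, 0.3],
                   [0.0, 0.0, 0.2]])
table = rng.normal(size=(200, 3)) @ mixing

centered = table - table.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

rotation = vt.T                      # rotation matrix: one column per PC
pc_values = centered @ rotation      # PC values for each original row
variances = singular_values**2 / (len(table) - 1)
explained = variances / variances.sum()  # one bar per PC in the bar chart

# Mirror the note above: append the PC columns to the original table.
augmented = np.hstack([table, pc_values])

print(np.round(explained, 3))
print(augmented.shape)
```

The variance shares sum to 1 and are sorted from largest to smallest, which is why the first few PCs are the ones plotted in the "top components" scatter plot.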

**Example**

#### How to filter a subset of the data

- Open the filter panel by clicking the filter icon on the top bar
- Choose the correct filtering scheme
- Click the "Refresh data table" icon on the PC Values on Original Data visualization.

- Note: This also syncs with all the other PCA visualizations.

#### See PCA in action in the video below

Data Science Toolkit: PCA from Ruths.ai on Vimeo.

For additional information on RAI Data Science Toolkit documentation, click here.