This article explains how to set up and run a QQ Plot.


Purpose

The QQ plot (quantile-quantile plot) checks whether a column of data is normally distributed. Many predictive models assume that the data is normally distributed and may perform poorly when that assumption does not hold. This plot helps a user judge whether their data is approximately normal and whether it should be fed into a given predictive model.

Limitation

  • Can only run one column at a time

Steps to Run:

  1. Go to the Tools menu
  2. Scroll down to Data Science 
  3. Select QQ Plot
  4. Configure the inputs (see below)

QQ Plot Inputs:

  1. Data Table - Choose the data table for this analysis.
  2. Value Column - Select the value column from the data table chosen in step 1.
  3. Group Columns - Optionally enable the group column option, then select the group column from the drop-down.
  4. Log Scale - Optionally use a log scale.
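
To make the role of each input concrete, here is a minimal Python sketch that mirrors the inputs above as function parameters. It is not the tool's implementation. The function name `qq_by_group`, the row format (a list of dictionaries), and the treatment of Log Scale as a log10 transform of the values are all assumptions made for illustration; the tool may apply its log scale differently, for example to the axis only.

```python
import math
from collections import defaultdict
from statistics import NormalDist


def qq_by_group(rows, value_col, group_col=None, log_scale=False):
    """Return {group: [(normal_quantile, sample_quantile), ...]}.

    rows      -- the data table, here a list of dicts (assumed format)
    value_col -- name of the single value column to test
    group_col -- optional column used to split rows into groups
    log_scale -- if True, take log10 of each value first (an assumption
                 about what "Log Scale" means; values must be positive)
    """
    groups = defaultdict(list)
    for row in rows:
        key = row[group_col] if group_col is not None else "all"
        value = row[value_col]
        if log_scale:
            value = math.log10(value)
        groups[key].append(value)

    std_normal = NormalDist()  # mean 0, standard deviation 1
    result = {}
    for key, values in groups.items():
        values.sort()
        n = len(values)
        # Pair the i-th smallest value with the normal quantile at
        # plotting position (i - 0.5) / n.
        result[key] = [
            (std_normal.inv_cdf((i - 0.5) / n), v)
            for i, v in enumerate(values, start=1)
        ]
    return result


# Hypothetical table with one value column and one group column.
table = [
    {"site": "A", "reading": 10.0},
    {"site": "A", "reading": 1000.0},
    {"site": "B", "reading": 100.0},
]
per_site = qq_by_group(table, "reading", group_col="site", log_scale=True)
```

Each key of `per_site` holds the points for one group, which corresponds to drawing one series per group on the same scatter plot.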

Outputs:

  • The output is a single scatter plot visualization.  
  • Normal Quantiles are on the x-axis and the Sample Quantiles are on the y-axis.  
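
As a rough illustration of what the two axes represent, the points of a normal QQ plot can be computed as sketched below. This uses only the Python standard library and is not the tool's implementation; the function name `qq_points` and the plotting positions (i - 0.5) / n are assumptions chosen for the example, and other conventions exist.

```python
from statistics import NormalDist


def qq_points(values):
    """Return (normal_quantile, sample_quantile) pairs, one per value."""
    sample = sorted(values)          # sample quantiles (y-axis)
    n = len(sample)
    std_normal = NormalDist()        # standard normal: mean 0, sd 1
    # Normal quantiles (x-axis) at plotting positions (i - 0.5) / n.
    normal = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(normal, sample))


# Hypothetical column of data.
column = [4.1, 5.3, 4.8, 6.0, 5.1, 4.6, 5.5]
for x, y in qq_points(column):
    print(f"{x:8.3f}  {y:6.2f}")
```

Plotting the first element of each pair on the x-axis and the second on the y-axis gives the scatter plot described above.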

Example:

Interpretation

  • If the data is normally distributed, the points will fall approximately along the line in the plot.
  • If the data is not normal, you can apply a transformation (for example, a log transform), select a subset of the data, or remove outliers.
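
One way to put a number on "falls along the line" is the correlation between the normal quantiles and the sample quantiles: values close to 1 indicate a nearly straight QQ plot. The sketch below is an illustration under stated assumptions, not something the tool reports. The function name `qq_correlation` is hypothetical, and the right-skewed data is synthetic, built so that its logarithm is exactly normal in shape.

```python
import math
from statistics import NormalDist


def qq_correlation(values):
    """Pearson correlation between normal and sample quantiles."""
    sample = sorted(values)
    n = len(sample)
    std_normal = NormalDist()
    normal = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mean_x = sum(normal) / n
    mean_y = sum(sample) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(normal, sample))
    var_x = sum((x - mean_x) ** 2 for x in normal)
    var_y = sum((y - mean_y) ** 2 for y in sample)
    return cov / math.sqrt(var_x * var_y)


# Synthetic right-skewed data: exponentials of evenly spaced normal quantiles.
skewed = [math.exp(NormalDist().inv_cdf((i - 0.5) / 50)) for i in range(1, 51)]

raw_r = qq_correlation(skewed)                         # before transforming
log_r = qq_correlation([math.log(v) for v in skewed])  # after a log transform
print(f"raw: {raw_r:.3f}  log-transformed: {log_r:.3f}")
```

In this constructed case the log transform straightens the plot, so the second correlation is higher than the first. With real data the improvement depends on how the data is actually distributed, and a log transform only applies to positive values.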

How to filter a subset of the data:

  1. Open the filter panel by clicking the filter icon on the top bar.
  2. Choose the appropriate filtering scheme.
  3. Click the Refresh Data Table icon on the QQ Plot visualization.

See RAI QQ Plot video below:

For additional information on RAI Data Science Toolkit documentation, click here.