This article explains how to set up and run the QQ Plot.
Purpose
The QQ (quantile-quantile) plot tests whether a column of data is normally distributed. Many predictive models assume that the data is normally distributed and may perform poorly when it is not. This plot helps you determine whether your data is normally distributed and whether it should be fed into a given predictive model.
Limitation
- Can only run one column at a time
Steps to Run:
- Go to the Tools menu
- Scroll down to Data Science
- Select QQ Plot
- Configure the Inputs (See below)
QQ Plot Inputs:
- Data table - Choose your data table for this analysis.
- Value Column - Select the column in the data table that contains the values to test.
- Group Columns - Check the option to enable a group column.
- In the drop-down, select the group column.
- Log Scale - Option to use Log Scale
Outputs:
- The output is a single scatter plot visualization.
- Normal Quantiles are on the x-axis and the Sample Quantiles are on the y-axis.
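The points behind such a plot can be sketched in a few lines of Python. This is only an illustration of the general technique, not the tool's own implementation: the function name `qq_points`, the sample values, and the plotting positions `(i - 0.5) / n` are assumptions made for the example.

```python
from statistics import NormalDist

def qq_points(values):
    """Return (normal_quantile, sample_quantile) pairs for a QQ plot."""
    sample = sorted(values)      # sample quantiles are the ordered data (y-axis)
    n = len(sample)
    std_normal = NormalDist()    # standard normal: mean 0, standard deviation 1
    # Assumed plotting positions (i - 0.5) / n for i = 1..n; other
    # conventions exist and the tool may use a different one.
    normal = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(normal, sample))

# Hypothetical column of values
points = qq_points([4.1, 5.0, 3.2, 6.3, 4.8])
for x, y in points:
    print(f"{x:7.3f}  {y:5.1f}")
```

Each pair gives one point of the scatter plot: the normal quantile is the x-coordinate and the matching sorted data value is the y-coordinate.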
Example:
Interpretation
- If the data is normally distributed, the points will fall along the line in the plot.
- If the data is not normal, then you can apply transformations, select a subset of data, or remove outliers.
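As a rough illustration of the transformation option, the sketch below compares skewed data before and after a log transform. It uses the correlation between normal and sample quantiles as a numeric stand-in for "how closely the points follow a straight line". This measure is an assumption of the example and is not something the tool reports; the function name `qq_correlation`, the plotting positions, and the data are hypothetical as well.

```python
import math
from statistics import NormalDist

def qq_correlation(values):
    """Pearson correlation between normal and sample quantiles.

    Values close to 1 mean the QQ points lie close to a straight line.
    """
    sample = sorted(values)
    n = len(sample)
    # Assumed plotting positions (i - 0.5) / n, as a common convention
    normal = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mean_x = sum(normal) / n
    mean_y = sum(sample) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(normal, sample))
    var_x = sum((x - mean_x) ** 2 for x in normal)
    var_y = sum((y - mean_y) ** 2 for y in sample)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical right-skewed data: each value doubles the previous one
raw = [1, 2, 4, 8, 16, 32, 64, 128]
logged = [math.log(v) for v in raw]   # log transform needs positive values

r_raw = qq_correlation(raw)
r_log = qq_correlation(logged)
print(f"raw: {r_raw:.3f}  log-transformed: {r_log:.3f}")
```

For this made-up data the log-transformed values give a correlation closer to 1, which corresponds to points lying closer to the line in the plot.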
How to filter a subset of the data:
- Open the filter panel by clicking the filter icon on the top bar.
- Choose the correct filtering scheme
- Click the Refresh Data Table icon on the QQ Plot visualization.
See RAI QQ Plot video below:
Data Science Toolkit - QQ Plot from Ruths.ai on Vimeo.