This article explains how to set up a QQ plot.
Purpose: A QQ (quantile-quantile) plot checks whether a column of data is normally distributed. Many predictive models assume that the data is normally distributed, and they may perform poorly when that assumption does not hold. The plot helps a user decide whether their data is approximately normal and whether it should be fed into a given predictive model.
- The plot can only be run on one column at a time.
Steps to Run:
- Go to the Tools menu
- Select the Data Science option
- Select the QQ plot option
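- Enter the settings for the analysis: the data table, the value column name, and the optional group column and log scale settings.
- Data table: select the table that contains the data.
- Value column name: select the column to test.
- Group column (optional): check the group column option, then select the group column in the drop-down.
- Log scale (optional): check this option to use a log scale.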
- The output is a single scatter plot visualization.
- Normal Quantiles are on the x-axis and the Sample Quantiles are on the y-axis.
- If the data is normally distributed, the points will fall along the line in the plot.
- If the data is not normal, you can apply a transformation, select a subset of the data, or remove outliers.
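To make the axes concrete, here is a minimal Python sketch of how the points of a normal QQ plot can be computed for one column. It is an illustration only, not the tool's own implementation: the function names `qq_points` and `reference_line`, the plotting positions `(i - 0.5) / n`, the use of the natural log for the log scale option, and the sample values are all assumptions.

```python
import math
from statistics import NormalDist, mean, stdev

def qq_points(values, log_scale=False):
    """Return (normal quantile, sample quantile) pairs for one column."""
    if log_scale:
        # Assumption: natural log; it is only defined for positive values.
        values = [math.log(v) for v in values]
    sample = sorted(values)
    n = len(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n keep each probability inside (0, 1).
    normal = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(normal, sample))

def reference_line(values):
    """Slope and intercept of the line that normal data would follow."""
    return stdev(values), mean(values)

# Hypothetical column of measurements.
column = [4.1, 5.0, 5.2, 5.9, 6.3, 7.0, 7.4]
points = qq_points(column)
slope, intercept = reference_line(column)
for x, y in points:
    print(f"normal={x:+.3f}  sample={y:.1f}  line={intercept + slope * x:.3f}")
```

Each pair is one point of the scatter plot: the normal quantile is the x value and the sample quantile is the y value. The closer the sample values stay to the line column, the closer the data is to a normal distribution.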
How to filter a subset of the data:
- Open the filter panel by clicking the filter icon on the top bar.
- Choose the filter settings that select the subset of data you want to analyze.
- Click the "Refresh Data Table" icon on the QQ Plot visualization.
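The effect of filtering before refreshing the plot can be sketched in Python as follows. This is a rough illustration under stated assumptions, not a description of how the filter panel works internally: the helper `filter_rows`, the column names `site` and `reading`, and the data are hypothetical.

```python
def filter_rows(rows, keep):
    """Return only the rows for which keep(row) is true."""
    return [row for row in rows if keep(row)]

# Hypothetical table with a group column and a value column.
table = [
    {"site": "north", "reading": 4.1},
    {"site": "north", "reading": 5.0},
    {"site": "south", "reading": 48.0},
    {"site": "north", "reading": 5.9},
    {"site": "south", "reading": 6.3},
]

# Keep one group and drop an implausibly large reading.
subset = filter_rows(table, lambda r: r["site"] == "north" and r["reading"] < 40)

# These sorted values are what a refreshed QQ plot would be built from.
values = sorted(r["reading"] for r in subset)
print(values)
```

Only the rows that pass the filter contribute sample quantiles, so the shape of the plot can change noticeably once a group or an outlier is excluded.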