The purpose of this article is to explain how to set up a QQ plot.



Purpose: A QQ (quantile-quantile) plot checks whether a column of data is normally distributed. Many predictive models assume that the data is normally distributed and perform poorly when it is not. The plot helps you judge whether a column is close enough to normal to be fed into a given predictive model.




Limitations: 

  • Only one column can be analyzed at a time




Steps to Run:

  1. Go to the Tools menu
  2. Select the Data Science option
  3. Select the QQ plot option
  4. Enter the data table and the value column name the analysis should be performed on, then optionally choose a group column and enable log scale









Inputs: 

  1. Data table
  2. Value column name
  3. Group columns checkbox
  4. Group column (select it from the drop-down after checking the box)
  5. Log scale
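
The inputs above can be pictured as parameters to a small function. The sketch below is only an illustration in Python, not the tool's implementation: the names grouped_qq, value_col, group_col and log_scale are hypothetical, the sample rows are made up, and it assumes that log scale means taking log10 of each value before the quantiles are computed.

```python
import math
from collections import defaultdict
from statistics import NormalDist


def grouped_qq(rows, value_col, group_col=None, log_scale=False):
    """Return {group: [(normal_quantile, sample_quantile), ...]}."""
    groups = defaultdict(list)
    for row in rows:
        # Without a group column, every row goes into one group named "all".
        key = row[group_col] if group_col else "all"
        value = row[value_col]
        # Assumption: log scale means log10 of the value (requires value > 0).
        groups[key].append(math.log10(value) if log_scale else value)

    result = {}
    for key, values in groups.items():
        sample = sorted(values)
        n = len(sample)
        # Assumed plotting position (i + 0.5) / n, always strictly inside (0, 1).
        theo = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
        result[key] = list(zip(theo, sample))
    return result


# Hypothetical table with one value column ("dose") and one group column ("site").
rows = [
    {"site": "A", "dose": 10.0}, {"site": "A", "dose": 100.0},
    {"site": "A", "dose": 1000.0}, {"site": "B", "dose": 1.0},
    {"site": "B", "dose": 10.0},
]
plots = grouped_qq(rows, "dose", group_col="site", log_scale=True)
```

Each key in plots would correspond to one series of points on the scatter plot, one per value of the group column.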






Outputs: 

  • The output is a single scatter plot visualization.
  • Normal quantiles are on the x-axis and sample quantiles are on the y-axis.
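
To make the two axes concrete, here is a minimal Python sketch of how the points of such a plot can be computed. It is an illustration under stated assumptions, not the tool's own code: the function name qq_points is hypothetical, and the plotting position (i + 0.5) / n is one common convention that the tool may or may not use.

```python
from statistics import NormalDist


def qq_points(values):
    """Return (normal_quantile, sample_quantile) pairs for a QQ plot."""
    sample = sorted(values)          # sample quantiles: the sorted data (y-axis)
    n = len(sample)
    std_normal = NormalDist()        # mean 0, standard deviation 1
    # Assumed plotting position (i + 0.5) / n, strictly inside (0, 1).
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sample))


points = qq_points([4.1, 5.0, 3.2, 6.3, 4.8])
for x, y in points:
    print(f"{x:7.3f}  {y:4.1f}")
```

The smallest value in the column is paired with the lowest normal quantile and the largest with the highest, so the points always rise from left to right.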



Example:




Interpretation: 

  • If the data is normally distributed, the points will fall approximately along the line in the plot.
  • If the data is not normal, you can apply a transformation, select a subset of the data, or remove outliers.
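
As a rough numeric companion to reading the plot by eye, the sketch below measures how closely the points follow a straight line and shows the effect of one possible transformation. It is only an illustration: the helper qq_correlation is hypothetical, the data is randomly generated skewed data rather than real measurements, and using the Pearson correlation of the QQ points as a straightness score is a choice made here, not something the tool reports.

```python
import math
import random
from statistics import NormalDist, fmean


def qq_correlation(values):
    """Pearson correlation between normal quantiles and sorted sample values."""
    sample = sorted(values)
    n = len(sample)
    theo = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mx, my = fmean(theo), fmean(sample)
    cov = sum((x - mx) * (y - my) for x, y in zip(theo, sample))
    sx = math.sqrt(sum((x - mx) ** 2 for x in theo))
    sy = math.sqrt(sum((y - my) ** 2 for y in sample))
    return cov / (sx * sy)


rng = random.Random(0)                                   # fixed seed for repeatability
skewed = [rng.lognormvariate(0, 1) for _ in range(200)]  # synthetic right-skewed data

r_raw = qq_correlation(skewed)
r_log = qq_correlation([math.log(v) for v in skewed])    # log transformation
print(f"raw: {r_raw:.3f}  log-transformed: {r_log:.3f}")
```

A value closer to 1 means the points lie closer to a straight line. For this synthetic data the log-transformed column scores higher than the raw column, which is the kind of improvement a transformation is meant to produce.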




How to filter a subset of the data:

  1. Open the filter panel by clicking the filter icon on the top bar.
  2. Choose the filter settings that select the subset you want.
  3. Click the "Refresh Data Table" icon on the QQ Plot visualization.






For additional information see video: