This article explains the Random Forest model in Data Science Toolkit.




What is Random Forest in Data Science Toolkit?



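Purpose: The Random Forest model is a form of multivariate analysis. It uses an ensemble learning method for classification and regression. The model constructs a "forest" of decision trees, each trained on a random sample of the data, and combines their predictions. Because the result is averaged over many trees, it is generally less prone to overfitting than a single decision tree.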
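As an illustration of how such a forest is built, the following minimal Python sketch trains many small trees on bootstrap samples and averages their predictions. It is not the Data Science Toolkit implementation: the function names, the sample data, and the use of one-split trees ("stumps") on a single randomly chosen variable are all assumptions made to keep the example short.

```python
import random
from statistics import mean


def fit_stump(rows, targets, feature):
    """Fit a one-split regression tree (a "stump") on one feature."""
    best = None
    values = sorted({row[feature] for row in rows})
    # Try a threshold halfway between each pair of adjacent distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
        right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
        left_mean, right_mean = mean(left), mean(right)
        # Sum of squared errors if each side predicts its own mean.
        sse = sum((t - left_mean) ** 2 for t in left)
        sse += sum((t - right_mean) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # The sample holds a single distinct value: predict the overall mean.
        overall = mean(targets)
        return (feature, float("inf"), overall, overall)
    _, threshold, left_mean, right_mean = best
    return (feature, threshold, left_mean, right_mean)


def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    return left_mean if row[feature] <= threshold else right_mean


def fit_forest(rows, targets, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    n_rows, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw n_rows row indices with replacement.
        sample = [rng.randrange(n_rows) for _ in range(n_rows)]
        feature = rng.randrange(n_features)
        forest.append(
            fit_stump([rows[i] for i in sample], [targets[i] for i in sample], feature)
        )
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees (regression)."""
    return mean([predict_stump(stump, row) for stump in forest])


# Made-up data: column 0 drives the response, column 1 is unrelated to it.
rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
targets = [0.0 if i < 10 else 10.0 for i in range(20)]

forest = fit_forest(rows, targets)
print(predict_forest(forest, [2.0, 0.0]))
print(predict_forest(forest, [17.0, 0.0]))
```

Stumps that happen to split on the unrelated column pull both predictions toward the middle, which is why averaging over many trees matters. A real Random Forest grows much deeper trees and considers a random subset of variables at every split.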




Outputs:

There are 6 outputs:

  • The OOB Error Rate - Number of Trees line chart shows the reduction in out-of-bag (OOB) error as more trees are added to the ensemble.
  • The Definition text area provides an explanation of error rate and % variance.
  • The Training Set Stats table shows a printout of summary information on the model parameters and goodness of fit.
  • The Importance per Variable bar chart shows which variables in the model were the most important.
  • The Actual vs Predicted scatter plot shows the predicted values of the response variable against the actual values of the response variable in the data set.
  • The Predicted Response vs Predictor Column scatter plot shows how closely the model predicted the response column across the unique values of all the variables in the model.


Note: The new prediction data is added as a column in the original data table.
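The error and % variance figures can be made concrete with a small calculation on an actual column and a predicted column. The Python sketch below assumes the common regression definitions: mean squared error (MSE), and % variance explained as 100 * (1 - MSE / variance of the actual values). The toolkit may compute its figures differently, and the column values here are made up.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def percent_variance_explained(actual, predicted):
    """100 * (1 - MSE / variance), using the population variance of actual."""
    n = len(actual)
    average = sum(actual) / n
    variance = sum((a - average) ** 2 for a in actual) / n
    return 100.0 * (1.0 - mean_squared_error(actual, predicted) / variance)


# Hypothetical response column and the prediction column added by the model.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 6.5, 9.5]

mse = mean_squared_error(actual, predicted)
pct = percent_variance_explained(actual, predicted)
print(f"MSE: {mse:.2f}")
print(f"% variance explained: {pct:.1f}")
```

For these values every prediction is off by 0.5, so the MSE is 0.25; the actual values have a variance of 5, giving about 95% of variance explained.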




Example:




To set up Random Forest, see the user guide: Data Science Toolkit Random Forest User Guide: How to setup Random Forest



For additional information, watch the video:

Data Science Toolkit: Random Forest from Ruths.ai on Vimeo.