The purpose of this article is to explain the Random Forest model.
What is the Random Forest in Data Science Toolkit?
Purpose
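The Random Forest model is a form of multivariate analysis. It is an ensemble learning method for classification and regression. The model constructs a "forest" of decision trees, each trained on a bootstrap sample of the data, and combines their predictions. Because the result is averaged (or voted on) across many trees, the model is generally less prone to overfitting than a single decision tree.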
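As an illustration of the idea only, the following minimal sketch fits a single decision tree and a forest of 100 trees on synthetic data and compares their accuracy on held-out rows. It assumes Python with scikit-learn installed; it is not the Data Science Toolkit's own implementation, and the dataset, sizes, and variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for a real data table (hypothetical example).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# A single, fully grown decision tree.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# A forest of 100 trees, each fit on a bootstrap sample of the training rows.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Accuracy on rows that neither model saw during training.
tree_accuracy = tree.score(X_test, y_test)
forest_accuracy = forest.score(X_test, y_test)
print(f"single tree: {tree_accuracy:.3f}, forest: {forest_accuracy:.3f}")
```

On data like this the forest usually scores at least as well as the single tree on held-out rows, but the exact numbers depend on the data and the random seed.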
Outputs
- There are 6 outputs
- The OOB Error Rate - Number of Treelines chart shows the reduction in out-of-bag (OOB) error as more trees are added to the ensemble.
- The Definition text area provides an explanation of error rate and % variance.
- The Training Set Stats table shows a printout of summary information on the model parameters and goodness of fit.
- The Importance per Variable bar chart shows the relative importance of each variable in the model, including which variable was the most important.
- The Actual vs Predicted scatter plot shows the predicted values of the response variable versus the actual values of the response variable in the dataset.
- The Predicted Response vs Predictor Column scatter plot shows how closely the response column is predicted across the unique values of each variable in the model.
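The quantities behind these outputs can be approximated outside the toolkit. The sketch below assumes Python with scikit-learn and NumPy and uses synthetic regression data; the variable names are hypothetical, and the toolkit may compute its own figures differently. It tracks the out-of-bag error as trees are added, then derives the % variance explained, the per-variable importance, and the actual vs predicted pairs.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Synthetic data; with shuffle=False the first 3 columns are the informative ones.
X, y = make_regression(
    n_samples=300, n_features=8, n_informative=3,
    noise=10.0, shuffle=False, random_state=0,
)

# warm_start=True keeps the trees already grown and only adds new ones on refit.
forest = RandomForestRegressor(warm_start=True, oob_score=True, random_state=0)

tree_counts = list(range(25, 151, 25))
oob_errors = []
for n_trees in tree_counts:
    forest.set_params(n_estimators=n_trees)
    forest.fit(X, y)
    # Each row is predicted only by trees whose bootstrap sample left it out.
    oob_errors.append(mean_squared_error(y, forest.oob_prediction_))

# For a regressor, oob_score_ is R^2 on the out-of-bag predictions.
pct_variance_explained = 100.0 * forest.oob_score_

# Impurity-based importance per variable (values sum to 1).
importances = forest.feature_importances_

# Actual response next to predicted response, one row per observation.
actual_vs_predicted = np.column_stack([y, forest.predict(X)])

for n_trees, err in zip(tree_counts, oob_errors):
    print(f"{n_trees:4d} trees  OOB MSE = {err:.1f}")
print(f"% variance explained (OOB): {pct_variance_explained:.1f}")
print("importance per variable:", np.round(importances, 3))
```

Plotting `oob_errors` against `tree_counts` gives a curve comparable to the OOB error chart, and `importances` corresponds to the bars of an importance chart.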
Note: The new prediction data will be added as a column in the original data table.
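To show what adding predictions as a column can look like, here is a small sketch using a pandas DataFrame as a stand-in for the data table. It assumes Python with pandas and scikit-learn; the column names (`x1` to `x4`, `response`, `predicted_response`) are hypothetical and are not the names the toolkit uses.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Build a small data table with four predictor columns and one response column.
X, y = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=0)
predictor_columns = ["x1", "x2", "x3", "x4"]
table = pd.DataFrame(X, columns=predictor_columns)
table["response"] = y

# Fit the forest on the predictor columns.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(table[predictor_columns], table["response"])

# Write the predictions back into the same table as a new column.
table["predicted_response"] = forest.predict(table[predictor_columns])
print(table.head())
```

The original columns are left unchanged; only the new prediction column is appended.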
Example:
Data Science Toolkit Random Forest User Guide: How to set up Random Forest
See Random Forest in action below
Data Science Toolkit: Random Forest from Ruths.ai on Vimeo.
For additional information on RAI Data Science Toolkit documentation, click here.