This article explains how to set up and run Data Description.
Purpose:
This extension provides a statistical summary of your data that helps you determine which data preparation tasks are needed before further analysis.
Inputs:
- Data table
Steps to Run:
- Go to the Tools menu
- Select the Data Science option
- Select the Data Description option
- Enter the data table the analysis should be performed on.
Data Description Inputs:
- Data Table - Choose your data table
- Check boxes for transformation: [*, 1/x, log10, sqrt]
- Click OK.
Outputs:
Without Transformation:
- The output is a single table.
- The first column (column) specifies the column being analyzed.
- The second column (count) gives the number of records. Because nulls are not excluded, this number should be the same for every column.
- The third column (type) lists the data type of the specified column.
- The fourth column (num.na) gives the number of null cells in the column.
- The remaining columns describe the data in terms of min, max, mean, median and other statistical measures.
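To make the layout of the output table concrete, the following is a minimal Python sketch of a comparable summary. It is not the extension's implementation: the `describe` function, the dictionary-based table, and the sample data are all hypothetical, and only a few of the statistical measures (min, max, mean, median) are shown.

```python
import statistics

def describe(table):
    """Build one summary row per column.

    `table` is a hypothetical stand-in for a data table:
    a dict mapping column name -> list of values, with None as null.
    """
    rows = []
    for name, values in table.items():
        present = [v for v in values if v is not None]
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        row = {
            "column": name,
            "count": len(values),  # nulls are NOT excluded from the count
            "type": type(present[0]).__name__ if present else "unknown",
            "num.na": len(values) - len(present),
        }
        # Only add statistics when every non-null value is numeric.
        if numeric and len(numeric) == len(present):
            row["min"] = min(numeric)
            row["max"] = max(numeric)
            row["mean"] = statistics.mean(numeric)
            row["median"] = statistics.median(numeric)
        rows.append(row)
    return rows

# Illustrative data only.
sample = {
    "age": [34, 51, None, 29],
    "city": ["Oslo", None, "Lima", "Pune"],
}
summary = describe(sample)
for row in summary:
    print(row)
```

Note that both columns report the same count (4) even though each contains a null, which mirrors the behavior of the count column described above.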
With Transformation:
- One table visualization (17 columns) with the statistical summary and distribution characteristics of each numeric column (the same table described above).
- One table visualization (5 columns) with the transformation results for normality.
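As a rough illustration of why transformations are checked for normality, the sketch below applies the 1/x, log10 and sqrt transformations to a column and compares skewness before and after. This is an assumption-laden example: the source does not state which normality measure the extension reports, so skewness is used here only as one simple indicator, and the names `skewness`, `TRANSFORMS` and `compare_transforms` are hypothetical.

```python
import math
import statistics

def skewness(values):
    """Population skewness; 0 for a perfectly symmetric sample."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# The three named transformations from the input dialog.
# All of them assume strictly positive input values.
TRANSFORMS = {
    "1/x": lambda x: 1 / x,
    "log10": math.log10,
    "sqrt": math.sqrt,
}

def compare_transforms(values):
    """Return {label: skewness} for the raw data and each transformation."""
    result = {"none": skewness(values)}
    for label, fn in TRANSFORMS.items():
        result[label] = skewness([fn(v) for v in values])
    return result

# Illustrative, strongly right-skewed data.
right_skewed = [1, 10, 100, 1000, 10000]
scores = compare_transforms(right_skewed)
for label, value in scores.items():
    print(f"{label}: skewness = {value:.3f}")
```

For this sample, the log10 transformation yields evenly spaced values, so its skewness is close to zero while the raw data is clearly right-skewed. A transformation whose result is closer to symmetric is a candidate for making the column more nearly normal.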
How to filter a subset of the data:
- Open the filter panel by clicking the filter icon on the top bar.
- Choose the appropriate filtering scheme.
- Click the "Refresh Data Table" icon on the Data Description table.
See Data Description Video below
For additional information on RAI Data Science Toolkit documentation, click here.