The purpose of this article is to explain how to set up and run the Data Description extension.



Purpose:

 The purpose of this extension is to provide the user with a statistical summary of their data that can be used to determine what data preparation tasks are necessary for further analysis.

Inputs: 

  1. Data table

Steps to Run:

  1. Go to the Tools menu
  2. Select the Data Science option
  3. Select the Data Description option
  4. Select the data table on which the analysis should be performed.

Data Description Inputs:

  1. Data Table - Choose your data table.
  2. Transformation - Check the box for any transformation to apply:
    [*, 1/x, log10, sqrt]
  3. Click OK.

Outputs Without Transformation:

  • The output is a single table.
  • The first column (column) names the column being analyzed.
  • The second column (count) gives the number of records. Because nulls are not excluded, this number should be the same for every column.
  • The third column (type) lists the data type of the specified column.
  • The fourth column (num.na) counts the null cells in the specified column.
  • The remaining columns describe the data in terms of min, max, mean, median and other statistical measures.
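
To make the layout of this table concrete, here is a minimal Python sketch that builds a comparable summary. It is not the extension's implementation. The function name describe, the sample columns ("depth", "well"), and the type labels "numeric" and "other" are assumptions made for illustration, and only four of the statistical measures are computed.

```python
import statistics

def describe(table):
    """Summarize a table given as {column_name: list_of_values}; None marks a null."""
    rows = []
    for name, values in table.items():
        present = [v for v in values if v is not None]
        # Treat a column as numeric only if every non-null value is a number.
        numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        row = {
            "column": name,
            "count": len(values),                    # all records, nulls included
            "type": "numeric" if numeric else "other",
            "num.na": len(values) - len(present),    # number of null cells
        }
        if numeric:
            row.update(
                min=min(present),
                max=max(present),
                mean=statistics.mean(present),
                median=statistics.median(present),
            )
        rows.append(row)
    return rows

# Hypothetical sample data, not taken from the extension.
table = {
    "depth": [10.0, 12.5, None, 15.0],
    "well": ["A", "B", "C", None],
}
summary = describe(table)
for row in summary:
    print(row)
```

Note that count is the same for both columns even though each contains a null, while num.na reports the nulls separately.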

Outputs With Transformation:

  • 1 table visualization (17 columns) with the statistical summary and distribution characteristics of each numeric column (same as above).
  • 1 table visualization (5 columns) with the transformation results for normality.
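
As a rough illustration of what comparing transformations for normality can look like, the Python sketch below applies 1/x, log10, and sqrt to a column and reports the skewness of each result. This is an assumption-laden sketch: the article does not state which normality measure the extension uses or what its five output columns contain, so skewness is a stand-in chosen here, and the names TRANSFORMS, skewness, and compare_transforms are hypothetical. The "*" option is omitted because its meaning is not given.

```python
import math
import statistics

# Transformations named in the dialog. "*" is left out because the
# article does not say what it stands for.
TRANSFORMS = {
    "1/x": lambda x: 1.0 / x,
    "log10": math.log10,
    "sqrt": math.sqrt,
}

def skewness(values):
    """Population skewness (third standardized moment); near 0 for symmetric data."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def compare_transforms(values):
    """Return {name: skewness} for the raw values and each transformation.

    Assumes strictly positive inputs so that every transformation is defined.
    """
    result = {"none": skewness(values)}
    for name, fn in TRANSFORMS.items():
        result[name] = skewness([fn(v) for v in values])
    return result

# Made-up, right-skewed sample.
values = [1.0, 10.0, 100.0, 1000.0, 10000.0]
scores = compare_transforms(values)
best = min(scores, key=lambda name: abs(scores[name]))
print(scores)
print("closest to symmetric:", best)
```

Skewness close to zero only indicates symmetry, so a result like this is a hint about which transformation to try, not a formal normality test.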

How to filter a subset of the data:

  1. Open the filter panel by clicking the filter icon on the top bar.
  2. Choose the appropriate filtering scheme.
  3. Click the "Refresh Data Table" icon on the Data Description table.
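
The effect of filtering and then refreshing can be pictured with the short Python sketch below: the summary is recomputed on only the rows that pass the filter. This is an illustration under assumptions, not the extension's code. The summarize function and the "region" and "rate" columns are hypothetical.

```python
import statistics

def summarize(values):
    """Count, null count, and mean for one column; None marks a null."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "num.na": len(values) - len(present),
        "mean": statistics.mean(present) if present else None,
    }

# Made-up rows; "region" and "rate" are hypothetical column names.
rows = [
    {"region": "north", "rate": 4.0},
    {"region": "north", "rate": None},
    {"region": "south", "rate": 10.0},
    {"region": "south", "rate": 6.0},
]

# Summary of the unfiltered table.
full = summarize([r["rate"] for r in rows])

# Equivalent of applying a filter and then refreshing the summary table.
filtered_rows = [r for r in rows if r["region"] == "south"]
filtered = summarize([r["rate"] for r in filtered_rows])

print(full)
print(filtered)
```

After filtering, count, num.na, and mean all reflect the subset only, which is why the table needs a refresh once the filter changes.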

See the Data Description video below.

For additional information, see the RAI Data Science Toolkit documentation.