This article explains the Data Description extension.


What is Data Description in Data Science Toolkit?

Purpose

The purpose of this extension is to provide the user with a statistical summary of their data that can be used to determine what data preparation tasks are necessary for further analysis.

Outputs (No Transformation):

  • The output is a single table with 17 columns that give a statistical summary and the distribution characteristics of each numeric column.
  • The first column (column) specifies the column being analyzed.
  • The second column (count) gives the number of records. Because nulls are not excluded, this number should be the same for all columns.
  • The third column (type) lists the data type of the specified column.
  • The fourth column (num.na) counts how many cells have nulls.
  • The remaining columns describe the data in terms of min, max, median, and other statistical measures. 
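The table layout described above can be illustrated with a minimal Python sketch. It is not the extension's implementation: it reproduces only the columns named in this article (column, count, type, num.na, min, max, median), and the function name `describe_columns`, the type labels "Integer" and "Real", and the sample data are all assumptions made for illustration.

```python
from statistics import median


def describe_columns(table):
    """Summarize each numeric column of a column-oriented table.

    `table` maps column names to lists of values; None stands for a null.
    (Hypothetical helper, not part of the Data Science Toolkit.)
    """
    summary = []
    for name, values in table.items():
        present = [v for v in values if v is not None]
        # Skip empty and non-numeric columns (bool is excluded on purpose).
        if not present or not all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        ):
            continue
        is_integer = all(isinstance(v, int) for v in present)
        summary.append({
            "column": name,
            "count": len(values),  # nulls are included in the count
            "type": "Integer" if is_integer else "Real",  # assumed labels
            "num.na": len(values) - len(present),
            "min": min(present),
            "max": max(present),
            "median": median(present),
        })
    return summary


# Made-up sample data: two numeric columns and one text column.
sample = {
    "depth": [1200, 1350, None, 1500],
    "rate": [10.5, 12.0, 9.5, 11.0],
    "name": ["A", "B", "C", "D"],
}
for row in describe_columns(sample):
    print(row)
```

Note that count is 4 for both numeric columns even though depth contains a null, which matches the behavior described for the count column; the text column is skipped because only numeric columns are summarized.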

Outputs (With Transformation):

  • The output is a single table with 17 columns that give a statistical summary and the distribution characteristics of each numeric column (same as above).
  • One table visualization (5 columns) with the transformation results for normality.
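This article does not state which transformations the extension applies or what the five result columns contain. As a rough illustration of what comparing transformations for normality can look like, the sketch below applies two common candidates (square root and natural log) and ranks them by skewness, where a value near zero suggests a more symmetric distribution. The choice of transformations, the use of skewness as the criterion, and all names are assumptions.

```python
import math
from statistics import mean


def skewness(values):
    """Population skewness: third central moment divided by variance ** 1.5.

    Assumes the values are not all identical (variance must be non-zero).
    """
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m3 = mean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


def compare_transformations(values):
    """Return one row per candidate transformation, most symmetric first.

    Assumes every value is strictly positive so sqrt and log are defined.
    """
    candidates = {
        "none": lambda v: v,
        "sqrt": math.sqrt,
        "log": math.log,
    }
    rows = []
    for name, fn in candidates.items():
        transformed = [fn(v) for v in values]
        rows.append({"transformation": name, "skewness": skewness(transformed)})
    # Skewness closest to zero is listed first.
    return sorted(rows, key=lambda r: abs(r["skewness"]))


# Made-up, right-skewed sample: powers of two are evenly spaced on a log scale.
data = [1, 2, 4, 8, 16, 32, 64]
for row in compare_transformations(data):
    print(row)
```

For this sample the log transformation ranks first, because the logged values are evenly spaced and therefore symmetric around their mean.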

Note: The output tables are added as new data tables that can be used for further evaluation.

Example: (No Transformation)

Example: (With Transformation)

Data Science Toolkit Data Description User Guide: How to set up Data Description

See the Data Description video below.

Data Science Toolkit: Data Description from Ruths.ai on Vimeo.

For additional information on RAI Data Science Toolkit documentation, click here.