This article explains the Data Description extension in the Data Science Toolkit.




What is Data Description in the Data Science Toolkit?


Purpose: This extension gives the user a statistical summary of their data, which can be used to decide what data preparation tasks are needed before further analysis.



Outputs (no transformation):


  • The output is a single table with 17 columns of summary statistics and distribution characteristics for each numeric column.
  • The first column (column) names the column being analyzed.
  • The second column (count) gives the number of records. Because it does not exclude nulls, it should be the same for every column.
  • The third column (type) lists the data type of the column.
  • The fourth column (num.na) counts how many cells in the column are null.
  • The remaining columns describe the data in terms of minimum, maximum, median, and other statistical measures.
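To illustrate how count and num.na relate, here is a minimal Python sketch that builds one summary row per numeric column. It is not the extension's implementation: the input format (a dict of column name to list of values, with None marking a null), the function name `describe`, and the measures after num.na are all assumptions, and the sketch computes only a subset of statistics because the article does not list all 17 output columns. The type values are Python type names, not the extension's.

```python
import statistics


def describe(table):
    """Return one summary row (a dict) per numeric column of `table`.

    `table` is a hypothetical input: a dict mapping column names to lists
    of values, where None marks a null cell.
    """
    rows = []
    for name, values in table.items():
        present = [v for v in values if v is not None]
        # Skip columns with no non-null values or any non-numeric value
        # (bool is excluded because it is a subclass of int in Python).
        if not present or not all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        ):
            continue
        rows.append({
            "column": name,
            "count": len(values),                # all records, nulls included
            "type": "float" if any(isinstance(v, float) for v in present) else "int",
            "num.na": len(values) - len(present),  # number of null cells
            "min": min(present),
            "max": max(present),
            "median": statistics.median(present),
            "mean": statistics.mean(present),
            # Sample standard deviation needs at least two values.
            "sd": statistics.stdev(present) if len(present) > 1 else float("nan"),
        })
    return rows


if __name__ == "__main__":
    data = {"depth": [100, 250, None, 175], "site": ["A", "B", "C", "D"]}
    for row in describe(data):
        print(row)
```

In this example the text column "site" is skipped, and "depth" reports a count of 4 with one null, which is why count is the same for every column while num.na varies.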


Outputs (with transformation):

  • The same single table with 17 columns of summary statistics and distribution characteristics for each numeric column, as described above.
  • One table visualization (5 columns) showing the results of the transformations for normality.
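The article does not say which transformations the extension tries, how it judges normality, or what the 5 columns of the transformation table contain. Purely as an illustration of the idea, the Python sketch below assumes log and square-root candidates and compares them by sample skewness, where a value closer to 0 means a more symmetric distribution. Skewness is only a rough proxy for normality, and the names `TRANSFORMS`, `skewness`, and `best_transformation` are hypothetical.

```python
import math
import statistics


def skewness(xs):
    """Sample skewness g1 = m3 / m2**1.5 (population-moment form)."""
    n = len(xs)
    m = statistics.fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


# Hypothetical candidate transformations (an assumption, not the extension's list).
TRANSFORMS = {
    "none": lambda x: x,
    "log": math.log,
    "sqrt": math.sqrt,
}


def best_transformation(values):
    """Return (name, skewness) for the candidate whose result is least skewed."""
    present = [v for v in values if v is not None]  # ignore nulls
    if not present:
        raise ValueError("column has no non-null values")
    results = {}
    for name, f in TRANSFORMS.items():
        # In this sketch, log and sqrt are only tried on strictly positive data.
        if name != "none" and min(present) <= 0:
            continue
        results[name] = skewness([f(v) for v in present])
    name = min(results, key=lambda k: abs(results[k]))
    return name, results[name]


if __name__ == "__main__":
    # Right-skewed data: each value doubles, so a log transform spaces them evenly.
    print(best_transformation([1, 2, 4, 8, 16, 32, 64]))
```

For right-skewed data such as values that double at each step, the log transform produces evenly spaced, symmetric values, so it is the candidate this sketch picks.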


- Note: The output tables are added as new data tables that can be used for further evaluation.



Example (no transformation):





Example (with transformation):




See the user guide for how to set up Data Description: Data Science Toolkit Data Description User Guide: How to setup Data Description



Additional information: watch the video.


Data Science Toolkit: Data Description from Ruths.ai on Vimeo.