Exploratory Data Analysis (EDA)

EDA is an approach to analyzing datasets that summarizes their main characteristics, often using statistical graphics and other data visualization methods.

Think of this as telling a story through data, or presenting data in a way that actually conveys a point. This does not come easily: some people have a knack for it, while most learn it over time.

Why EDA?

  • Explore - get a feel for the data: null or missing values, data size, what can be plotted, etc.
  • Inform - let your findings guide downstream decisions and establish some baselines. For example, if the data is mostly categorical, tree-based models tend to work well with it.
  • Communicate - present results in a way that suits the audience, as business stakeholders may not share your background in machine learning metrics like AUC or RMSE.
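
As a rough illustration of the "Explore" step, the sketch below reports the size of a dataset and the number of missing values per column. It uses only the Python standard library; the toy records, the column names, and the helper `missing_counts` are all made up for this example, and in practice a dataframe library would usually do the same job in one or two calls.

```python
from collections import Counter

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 34, "city": "Pune", "income": 52000.0},
    {"age": None, "city": "Delhi", "income": 61000.0},
    {"age": 29, "city": None, "income": None},
    {"age": 41, "city": "Pune", "income": 48000.0},
]


def missing_counts(records):
    """Count missing (None) values per column."""
    counts = Counter()
    for record in records:
        for column, value in record.items():
            # True adds 1, False adds 0, so every column gets an entry.
            counts[column] += value is None
    return dict(counts)


n_rows = len(rows)
n_cols = len(rows[0])
missing = missing_counts(rows)

print(f"shape: {n_rows} rows x {n_cols} columns")
for column, count in missing.items():
    print(f"{column}: {count} missing ({count / n_rows:.0%})")
```

Even a summary this small helps decide what to do next, for example whether a column has too many gaps to be worth plotting.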

Data Types

The kinds of data you'll come across:

  • Quantitative / numerical continuous - 1, 3.5, 10^10
  • Quantitative / numerical discrete - 1, 2, 3, 4
  • Qualitative / categorical unordered - cat, dog, whale
  • Qualitative / categorical ordered - good, better, best
  • Date or time
  • Text
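
The categories above can be sketched as a crude type check on each column. This is only a heuristic under stated assumptions: the function `guess_kind`, the column names, and the rule "a string containing a space is free text" are invented for illustration. Note that the order in an ordered categorical (good < better < best) cannot be inferred from the values alone; it has to come from domain knowledge.

```python
from datetime import date


def guess_kind(values):
    """Roughly guess the kind of a column from its Python value types."""
    present = [v for v in values if v is not None]
    if not present:
        return "unknown"
    # bool is a subclass of int, so exclude it from the numeric checks.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in present):
        return "numerical discrete"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numerical continuous"
    if all(isinstance(v, date) for v in present):
        return "date or time"
    if all(isinstance(v, str) for v in present):
        # Crude assumption: multi-word strings are free text, single words are categories.
        if any(" " in v.strip() for v in present):
            return "text"
        return "categorical"
    return "mixed"


# Hypothetical columns mirroring the examples in the list above.
columns = {
    "price": [1.0, 3.5, 1e10],
    "rooms": [1, 2, 3, 4],
    "animal": ["cat", "dog", "whale"],
    "rating": ["good", "better", "best"],
    "sold_on": [date(2021, 1, 5), date(2021, 2, 9)],
    "review": ["great place to stay", "too noisy at night"],
}

kinds = {name: guess_kind(values) for name, values in columns.items()}
for name, kind in kinds.items():
    print(f"{name}: {kind}")
```

Here `rating` is reported as plain categorical, which shows why ordered categories need to be declared by hand rather than detected.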