Basic Statistical Concepts for Data Science

Basic statistics

A solid understanding of statistics is important for any data scientist. Here, I introduce basic statistical concepts and quantities.

Types of measurements and variables

Important statistical concepts include the following:

  • Types of measurement scales
  • Nomenclature for variables: dependent vs independent variables

Statistical quantities

You should know the following frequently used statistical quantities:

  • Measures of central tendency: mean, median, and mode
  • Measures of dispersion: standard deviation, variance, interquartile range, and covariance (the joint variability of two variables)
  • Interval estimates: confidence intervals
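The quantities listed above can be computed in a few lines. The following is a minimal sketch using only Python's standard library; the `heights` and `weights` values are made up for illustration, and the confidence interval uses a normal approximation, which is only rough for a sample this small (a t-based interval would usually be preferred).

```python
import math
import statistics

# Hypothetical sample data (made-up values, for illustration only).
heights = [160, 165, 165, 170, 172, 175, 180, 185]  # cm
weights = [55, 60, 62, 68, 70, 74, 80, 88]          # kg

# Measures of central tendency
mean = statistics.mean(heights)      # 171.5
median = statistics.median(heights)  # 171.0
mode = statistics.mode(heights)      # 165

# Measures of dispersion (sample versions, dividing by n - 1)
variance = statistics.variance(heights)
std_dev = statistics.stdev(heights)

# Interquartile range: distance between the first and third quartiles
q1, _, q3 = statistics.quantiles(heights, n=4)
iqr = q3 - q1

# Sample covariance between heights and weights, computed by hand
mean_w = statistics.mean(weights)
covariance = sum(
    (h - mean) * (w - mean_w) for h, w in zip(heights, weights)
) / (len(heights) - 1)

# Approximate 95% confidence interval for the mean (normal approximation)
z = statistics.NormalDist().inv_cdf(0.975)
half_width = z * std_dev / math.sqrt(len(heights))
confidence_interval = (mean - half_width, mean + half_width)

print(mean, median, mode)
print(variance, std_dev, iqr, covariance)
print(confidence_interval)
```

A positive covariance here only says that taller people in this made-up sample tend to be heavier; its magnitude depends on the units of both variables.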

Probability distributions

Commonly occurring probability distributions are:

  • Uniform distribution: all values are equally likely
  • Normal distribution: a bell-shaped curve, typical for many population characteristics (e.g. IQs, heights)
  • Poisson distribution: a discrete distribution over the non-negative integers, well suited to count data
  • Exponential distribution: a right-skewed continuous distribution, often used for waiting times between events
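To get a feel for these four distributions, it can help to draw samples or evaluate their probability functions directly. The sketch below uses only Python's standard library. The helper names `poisson_pmf` and `exponential_tail`, the parameter values (an IQ-like scale with mean 100 and standard deviation 15, rates 3.0 and 2.0), and the sample size are all assumptions chosen for illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the samples are repeatable


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def exponential_tail(t, rate):
    """P(X > t) for an exponential distribution with the given rate."""
    return math.exp(-rate * t)


n = 10_000

# Uniform on [0, 1]: all values equally likely, so the mean is near 0.5
uniform_sample = [random.uniform(0, 1) for _ in range(n)]
uniform_mean = sum(uniform_sample) / n

# Normal with mean 100 and standard deviation 15 (an IQ-like scale)
normal_sample = [random.gauss(100, 15) for _ in range(n)]
normal_mean = sum(normal_sample) / n

# Exponential with rate 2.0: the mean waiting time is 1 / rate = 0.5
exponential_sample = [random.expovariate(2.0) for _ in range(n)]
exponential_mean = sum(exponential_sample) / n

# Poisson probabilities are defined on 0, 1, 2, ... and sum to 1
poisson_total = sum(poisson_pmf(k, 3.0) for k in range(50))

print(uniform_mean, normal_mean, exponential_mean, poisson_total)
```

The sample means land close to, but not exactly on, the theoretical values, and the gap shrinks as `n` grows.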

Posts on basic statistics

You can find explanations of basic statistical concepts and their use in R in the following posts.

Statistical Nomenclature for Variables

Variables can be characterized by the values they take as well as by their role. Depending on their values, they are categorized as quantitative, categorical, or ordinal. When variables are used in statistical models, additional terms such as dependent, independent, and confounding variable indicate their role.