Data Analysis

Although every post in this blog is concerned with analyzing data in some way, not all of them lead to new insights. Posts in the analysis series exhibit at least one of the following two properties:

  • The analysis of the data is comprehensive (i.e., involving multiple approaches)
  • The analysis leads to new insights

Posts in the analysis series

The following posts are concerned with the analysis of individual data sets.

Dimensionality Reduction for Visualization and Prediction

Dimensionality reduction is primarily used for exploring data and for reducing the feature space in machine learning applications. In this post, I investigate techniques such as principal component analysis (PCA) to obtain insights from a whiskey data set and show how PCA can be used to improve supervised approaches. Finally, I introduce the notion of the whiskey twilight zone.

Finding a Suitable Linear Model for Ozone Prediction

Although ordinary least-squares regression is often used, it is not appropriate for all types of data. Using the airquality data set, I try to find a generalized linear model that fits the data better. For this purpose, I use the following methods: weighted regression, Poisson regression, and imputation.