Dimensionality reduction is primarily used for exploring data and for reducing the feature space in machine learning applications. In this post, I investigate techniques such as PCA to obtain insights from a whiskey data set and show how PCA can be used to improve supervised approaches. Finally, I introduce the notion of the whiskey twilight zone.
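As a rough illustration of how PCA reduces a feature space, here is a minimal sketch that computes principal components from the SVD of the centered data matrix. It assumes numpy and uses a small synthetic matrix in place of the whiskey data set, which is not reproduced here. The function name `pca` is hypothetical, and this is not the code from the post.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)              # center each feature
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T           # coordinates in the reduced space
    explained = S**2 / np.sum(S**2)      # fraction of variance per component
    return scores, components, explained[:n_components]

# Hypothetical stand-in for a feature table: 8 samples, 4 features,
# where features 0 and 1 are strongly correlated
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 1))
X = np.hstack([base,
               2 * base + 0.01 * rng.normal(size=(8, 1)),
               rng.normal(size=(8, 2))])
scores, components, explained = pca(X, n_components=2)
print(scores.shape)   # (8, 2)
print(explained)      # variance fractions of the two retained components
```

Because two of the four synthetic features are nearly collinear, most of their shared variance ends up in a single component. This redundancy is what PCA removes when it is used to shrink the input of a supervised model.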
Although all posts in this blog are in some way concerned with analyzing data, not all of them lead to new insights. Posts in the analysis series have at least one of the following two properties:
- The analysis of the data is comprehensive (i.e., it involves multiple approaches).
- The analysis leads to new insights.
Posts in the analysis series
The following posts are concerned with the analysis of individual data sets.
Although ordinary least-squares (OLS) regression is widely used, it is not appropriate for all types of data. Using the airquality data set, I try to find a generalized linear model that fits the data better than OLS. For this purpose, I use the following methods: weighted regression, Poisson regression, and imputation.
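To make the idea of a generalized linear model concrete, here is a minimal sketch of Poisson regression (log link) fitted by iteratively reweighted least squares (IRLS). It assumes numpy and uses simulated counts rather than the airquality data. The function name `fit_poisson_irls` is hypothetical and is not the method used in the post. In practice, a library GLM routine would normally be used instead.

```python
import numpy as np

def fit_poisson_irls(x, y, n_iter=50, tol=1e-10):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares."""
    X = np.column_stack([np.ones(len(y)), x])   # design matrix with intercept
    mu = y + 0.5                                # starting means avoid log(0)
    eta = np.log(mu)                            # linear predictor
    beta = None
    for _ in range(n_iter):
        z = eta + (y - mu) / mu                 # working response
        W = mu                                  # IRLS weights for Poisson/log link
        XtW = X.T * W                           # X^T W without forming diag(W)
        # Weighted least-squares step: solve (X^T W X) beta = X^T W z
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if beta is not None and np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
        eta = X @ beta
        mu = np.exp(eta)
    return beta

# Hypothetical simulated counts with a known log-linear relationship
rng = np.random.default_rng(1)
x = rng.uniform(0, 2, size=2000)
y = rng.poisson(np.exp(0.5 + 0.8 * x))
print(fit_poisson_irls(x, y))   # estimates should be near [0.5, 0.8]
```

Each iteration is itself a weighted least-squares fit. This is one way the weighted regression and Poisson regression approaches mentioned above are related.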