Dimensionality reduction is primarily used for exploring data and for reducing the feature space in machine learning applications. In this post, I investigate techniques such as PCA to obtain insights from a whiskey data set and show how PCA can be used to improve supervised approaches. Finally, I introduce the notion of the whiskey twilight zone.
In this post, I clean up and augment a data set that describes the taste characteristics of whiskeys. The improved data set adds the regions where the distilleries are situated, as well as their geographical locations in terms of latitude and longitude.
Radar plots are well suited for visualizing the properties of individual objects. Here, I demonstrate how to draw radar plots in R by plotting the properties of whiskeys from several distilleries.
Generalized linear models (GLMs) are related to conventional linear models, but there are some important differences. For example, GLMs are based on the deviance rather than the conventional residuals, and they enable the use of different distributions and link functions. This post investigates how these aspects influence the interpretation of GLMs.
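As a rough illustration of these two ideas, here is a minimal sketch in plain Python (the post itself works in R). It compares the deviance of a Gaussian model, which equals the residual sum of squares, with the deviance of a Poisson model, and shows a log link mapping a linear predictor to a positive mean. The function names, the counts in `y`, and the linear predictor values in `eta` are all made up for illustration and are not taken from the post.

```python
import math


def gaussian_deviance(y, mu):
    """Deviance of a Gaussian model: the residual sum of squares."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, mu))


def poisson_deviance(y, mu):
    """Deviance of a Poisson model: 2 * sum(y * log(y / mu) - (y - mu))."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # By convention, y * log(y / mu) is taken as 0 when y == 0.
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


def log_link(mu):
    """Link function: maps the mean to the scale of the linear predictor."""
    return math.log(mu)


def inverse_log_link(eta):
    """Inverse link: maps the linear predictor back to a positive mean."""
    return math.exp(eta)


# Hypothetical observed counts and linear predictor values (assumptions).
y = [0, 1, 3, 7]
eta = [-0.5, 0.2, 1.0, 1.8]
mu = [inverse_log_link(e) for e in eta]

print("fitted means:     ", [round(m, 3) for m in mu])
print("Gaussian deviance:", round(gaussian_deviance(y, mu), 3))
print("Poisson deviance: ", round(poisson_deviance(y, mu), 3))
```

Both deviances are zero for a perfect fit, but they penalize the same discrepancy differently. This is one reason why residuals from a GLM cannot be read in the same way as those of an ordinary linear model.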
Although ordinary least-squares regression is often used, it is not appropriate for all types of data. Using the airquality data set, I try to find a generalized linear model that fits the data better. For this purpose, I use the following methods: weighted regression, Poisson regression, and imputation.
Linear machine learning models are very convenient for interpretation. This post discusses the following aspects: residuals, coefficients, standard errors, p-values, the F-statistic, and much more.
People without technical backgrounds can have a hard time understanding plots. A less formal means for conveying information is provided by infographics, which are easily understandable. This post compares several free tools for creating engaging infographics.
Box plots are limited since they only show summary statistics such as the quartiles Q1, Q2, and Q3. Alternatives such as beeswarm and violin plots, however, provide more information about the overall distribution of the data.
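To illustrate how the summary statistics behind a box plot can hide the shape of a distribution, here is a minimal sketch in plain Python (the post itself works in R). The two samples are made up for illustration: they share the same minimum, quartiles, and maximum, yet one is evenly spread while the other is clustered around two values. The helper `five_number_summary` is a hypothetical name, and the quartiles use the "inclusive" method of `statistics.quantiles`, which is only one of several common conventions.

```python
from collections import Counter
from statistics import quantiles


def five_number_summary(data):
    """Return (min, Q1, Q2, Q3, max), the statistics a box plot is built on."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return (min(data), q1, q2, q3, max(data))


# Two hypothetical samples (assumptions, not data from the post).
evenly_spread = [1, 2, 3, 4, 5, 6, 7, 8, 9]
two_clusters = [1, 3, 3, 3, 5, 7, 7, 7, 9]

for name, sample in [("evenly spread", evenly_spread), ("two clusters", two_clusters)]:
    print(name, five_number_summary(sample))
    # A crude text histogram shows the shape that the summary leaves out.
    counts = Counter(sample)
    for value in range(min(sample), max(sample) + 1):
        print(f"  {value}: {'*' * counts[value]}")
```

Both samples yield the same five-number summary, so their box plots would look identical. The per-value counts differ, and that is the kind of detail a beeswarm or violin plot makes visible.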
Line plots are ideally suited for visualizing time series data. Using some stock market data, I demonstrate how line plots can be generated using native R, the MTS package, and ggplot.
Staticman is an API that can be used to implement a commenting system for static websites. Here, I discuss how I managed to set up my own instance of the Staticman API and how it can be integrated into a Hugo site.