## Posts

### Interpreting Generalized Linear Models

Interpreting generalized linear models (GLMs) fitted with the glm function is similar to interpreting conventional linear models. Here, we will discuss the differences that need to be considered.

#### Basics of GLMs

GLMs enable the use of linear models in cases where the response variable has a non-normal error distribution. Each distribution is associated with a specific canonical link function. A link function $$g(\mu)$$ fulfills $$X \beta = g(\mu)$$, where $$\mu$$ is the expected value of the response. For example, for a Poisson distribution, the canonical link function is $$g(\mu) = \ln(\mu)$$.
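As a rough illustration of the link function, the following sketch fits a Poisson GLM and checks that the logarithm of the fitted mean equals the linear predictor. The data are simulated purely for this example (the coefficients 0.5 and 1.2 and the seed are arbitrary assumptions, not values from the post).

```r
# Simulated count data (hypothetical, for illustration only)
set.seed(1)
x <- runif(200)
y <- rpois(200, lambda = exp(0.5 + 1.2 * x))

# Poisson GLM; the log link is the canonical link for the Poisson family
fit <- glm(y ~ x, family = poisson(link = "log"))

# Linear predictor (X beta) and fitted mean (mu) as returned by predict()
eta <- predict(fit, type = "link")
mu  <- predict(fit, type = "response")

# The same linear predictor computed by hand from the design matrix
eta_manual <- drop(model.matrix(fit) %*% coef(fit))

# g(mu) = ln(mu) should agree with X beta
all.equal(log(mu), eta)
all.equal(eta_manual, eta)
```

Because the model is linear on the log scale, a coefficient is interpreted multiplicatively: `exp(coef(fit))` gives the factor by which the expected count changes per unit increase of the predictor.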

### Finding a Suitable Linear Model for Ozone Prediction

In a previous post, I introduced the airquality data set to demonstrate how linear models are interpreted. In this post, I will start with a basic linear model and, from there, try to find a linear model with a better fit.

#### Data preprocessing

Since the airquality data set contains some missing values, we will remove those before we begin to fit models. We then select 70% of the samples for training and use the remainder for testing:
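One possible way to carry out this preprocessing is sketched below. It assumes that every row with any missing value is dropped via `na.omit` and that the split is a simple random sample; the seed and the variable names `ozone`, `train`, and `test` are illustrative choices, not necessarily those used in the post.

```r
data(airquality)

# Drop all rows that contain at least one missing value (assumption:
# the post may instead have removed only rows with missing Ozone)
ozone <- na.omit(airquality)

# Randomly assign 70% of the remaining rows to the training set
set.seed(123)  # arbitrary seed, only for reproducibility
n <- nrow(ozone)
train_idx <- sample(n, size = floor(0.7 * n))

train <- ozone[train_idx, ]
test  <- ozone[-train_idx, ]
```

Fixing the seed keeps the split reproducible, which matters when several candidate models are compared on the same test set.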

### Interpreting Linear Prediction Models

Although linear models are one of the simplest machine learning techniques, they are still a powerful tool for prediction, in large part because they are especially easy to interpret. Here, I discuss the most important aspects of interpreting linear models, using ordinary least-squares regression on the airquality data set as an example.

#### The airquality data set

The airquality data set contains 153 measurements of the following four air quality metrics as obtained in New York:

### Getting Your Point Across with Infographics

Nowadays, infographics are everywhere. Fortunately, you do not have to be a professional designer to create them because there are several free platforms that assist you in creating engaging infographics. In this post, I compare three freely available tools for creating static infographics: Venngage, easelly, and Infogram. Each of the tools is reviewed according to three criteria: Customizability: number of available templates, graphics, fonts and so on. User experience: how easy is it to design/deploy infographics?

### Box Plot Alternatives: Beeswarm and Violin Plots

Box plots are great because they not only indicate the median but also show the variation of the measurements in terms of the first and third quartiles. There are, however, plots that provide additional information. Here, we take a closer look at two alternatives to the box plot: the beeswarm plot and the violin plot.

#### The beeswarm plot

An implementation of the beeswarm plot is available via the beeswarm package.

### Visualizing Time-Series Data with Line Plots

The line plot is the go-to plot for visualizing time-series data (i.e. measurements for several points in time) as it allows for showing trends along time. Here, we’ll use stock market data to show how line plots can be created using native R, the MTS package, and ggplot.
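A minimal base R sketch of such a line plot follows. It uses the `EuStockMarkets` data set that ships with R as a stand-in, since the stock data used in the post is not shown here; the choice of the DAX and FTSE indices and the colors are arbitrary.

```r
# Daily closing prices of European stock indices, bundled with R
dax  <- as.numeric(EuStockMarkets[, "DAX"])
ftse <- as.numeric(EuStockMarkets[, "FTSE"])
years <- as.numeric(time(EuStockMarkets))

# Draw the first series as a line, with a y-range that fits both series
plot(years, dax, type = "l", col = "steelblue",
     ylim = range(c(dax, ftse)),
     xlab = "Year", ylab = "Closing price")

# Add the second series to the same plot
lines(years, ftse, col = "firebrick")
legend("topleft", legend = c("DAX", "FTSE"),
       col = c("steelblue", "firebrick"), lty = 1)
```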

### Staticman: An Alternative to Disqus for Comments on Static Sites

Comments are an important aspect of many websites, particularly blogs, whose success depends on their ability to create communities. However, including comments is inherently more difficult for static websites than for dynamic websites (e.g. those managed through WordPress). With Hugo, comments can be easily integrated via Disqus. The disadvantage, however, is that third-party JavaScript code needs to be executed and that the comments are not part of the page itself. Here, I will explain how comments can be integrated into a web page using Staticman.

### Bar Plots and Error Bars

Bar plots display quantities according to the height of bars. Since standard bar plots do not indicate the level of variation in the data, they are most appropriate for showing individual values (e.g. count data) rather than aggregates of several values (e.g. arithmetic means). Although variation can be shown through error bars, this is only appropriate if the data are normally distributed.
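To make the idea of error bars concrete, here is a small base R sketch that plots group means with bars of one standard deviation in each direction. The use of the airquality temperature readings grouped by month is an assumption made for illustration only, and whether such bars are appropriate still depends on the distribution of the data.

```r
# Mean and standard deviation of temperature per month
temp_mean <- tapply(airquality$Temp, airquality$Month, mean)
temp_sd   <- tapply(airquality$Temp, airquality$Month, sd)

# barplot() returns the x positions of the bar midpoints
mids <- barplot(temp_mean,
                ylim = c(0, max(temp_mean + temp_sd) * 1.1),
                xlab = "Month", ylab = "Mean temperature (degrees F)")

# Draw error bars as two-headed arrows with flat (90 degree) heads
arrows(x0 = mids, y0 = temp_mean - temp_sd,
       x1 = mids, y1 = temp_mean + temp_sd,
       angle = 90, code = 3, length = 0.05)
```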

### Comparing Medians and Inter-Quartile Ranges Using the Box Plot

The box plot is useful for comparing the quartiles of quantitative variables. More specifically, the lower and upper ends of a box (the hinges) are defined by the first quartile (Q1) and the third quartile (Q3). The median (Q2) is shown as a horizontal line within the box. Additionally, the whiskers, whose definition is implementation-dependent, mark the range beyond which values are drawn as outliers. For example, in geom_boxplot of ggplot2, each whisker extends from a hinge to the most extreme value that lies no further than 1.5 * IQR from that hinge, where IQR = Q3 - Q1 is the inter-quartile range.
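The 1.5 * IQR rule can be reproduced by hand, as in the following sketch. It uses the wind speed column of the airquality data set as an arbitrary example and the default quantile definition of `quantile`; plotting functions may compute the hinges slightly differently.

```r
x <- airquality$Wind

# Quartiles and inter-quartile range
q <- quantile(x, probs = c(0.25, 0.5, 0.75))
iqr <- unname(q[3] - q[1])

# Furthest positions a whisker may reach
lower_limit <- unname(q[1]) - 1.5 * iqr
upper_limit <- unname(q[3]) + 1.5 * iqr

# Whiskers end at the most extreme observations inside the limits
whisker_low  <- min(x[x >= lower_limit])
whisker_high <- max(x[x <= upper_limit])

# Everything outside the limits would be drawn as an individual outlier
outliers <- x[x < lower_limit | x > upper_limit]
```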

### Using probability distributions in R: dnorm, pnorm, qnorm, and rnorm

R is a great tool for working with distributions. However, one has to know which specific function is the right one. Here, I’ll discuss which functions are available for dealing with the normal distribution: dnorm, pnorm, qnorm, and rnorm.
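As a quick orientation, the four functions can be tried on the standard normal distribution as follows. The specific input values and the seed are arbitrary choices for illustration.

```r
# d: density (height of the bell curve) at a given value
density_at_zero <- dnorm(0, mean = 0, sd = 1)

# p: cumulative probability P(X <= q)
prob_below <- pnorm(1.96)

# q: quantile function, the inverse of pnorm
quantile_975 <- qnorm(0.975)

# r: random draws from the distribution
set.seed(42)  # arbitrary seed, only for reproducibility
draws <- rnorm(5, mean = 0, sd = 1)
```

Here `pnorm(1.96)` is approximately 0.975 and `qnorm(0.975)` is approximately 1.96, which shows that the two functions are inverses of each other.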