Data Visualization

Humans are visual creatures. Visualization is therefore one of the most important tools for conveying information, and data scientists should be adept at selecting appropriate visualizations.

Which plot is appropriate?

Choosing an appropriate plot for a given data set can be hard because there are so many types of plots, such as scatter plots, box plots, and histograms. To help with this, I have created an overview of the most important plots, when they are appropriate, and how they can be used in R.

Posts about data visualization

The following posts cover topics in data visualization.

Linear, Quadratic, and Regularized Discriminant Analysis

Discriminant analysis encompasses methods that can be used for both classification and dimensionality reduction. Linear discriminant analysis (LDA) is particularly popular because it is both a classifier and a dimensionality reduction technique. Quadratic discriminant analysis (QDA) is a variant of LDA that allows for non-linear separation of data. Finally, regularized discriminant analysis (RDA) is a compromise between LDA and QDA. This post focuses mostly on LDA and explores its use as a classification and visualization technique, both in theory and in practice.

Radar plots

Radar plots visualize several variables using a radial layout. This plot is most suitable for visualizing and comparing the properties of individual objects. In the following, we will use a radar plot to compare the characteristics of whiskeys from different distilleries. Some of you may already know that radar plots are well suited for visualizing whiskey flavors. I first saw this type of visualization when I visited the Talisker distillery, the only whiskey distillery on the Isle of Skye.

Getting Your Point Across with Infographics

Nowadays, infographics are everywhere. Fortunately, you do not have to be a professional designer to create them because several free platforms assist you in creating engaging infographics. In this post, I compare three freely available tools for creating static infographics: Venngage, easelly, and Infogram. Each tool is reviewed according to three criteria, including customizability (the number of available templates, graphics, fonts, and so on) and user experience (how easy it is to design and deploy infographics).

Box Plot Alternatives: Beeswarm and Violin Plots

Box plots are great because they indicate not only the median but also the variation of the measurements in terms of the first and third quartiles. Some plots, however, provide a bit of additional information. Here, we take a closer look at two alternatives to the box plot: the beeswarm plot and the violin plot. An implementation of the beeswarm plot is available via the beeswarm package.

Visualizing Time-Series Data with Line Plots

The line plot is the go-to plot for visualizing time-series data (i.e., measurements taken at several points in time) because it shows trends over time. Here, we’ll use stock market data to show how line plots can be created using native R, the MTS package, and ggplot.

Bar Plots and Error Bars

Bar plots display quantities according to the height of bars. Since standard bar plots do not indicate the level of variation in the data, they are most appropriate for showing individual values (e.g. count data) rather than aggregates of several values (e.g. arithmetic means). Although variation can be shown through error bars, this is only appropriate if the data are normally distributed.
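As a rough sketch of the error-bar idea mentioned above, the following base R snippet draws a bar plot of group means with error bars of one standard deviation in each direction. The data and all variable names are made up for illustration; they are not taken from the post.

```r
# Hypothetical measurements for two groups (illustrative data only)
values <- list(a = c(4, 5, 6), b = c(7, 9, 11))

# Aggregate each group: the bar height is the mean, the error bar is +/- 1 SD
means <- sapply(values, mean)
sds <- sapply(values, sd)

# barplot() returns the x positions of the bar midpoints
mids <- barplot(means,
                ylim = c(0, max(means + sds) * 1.1),
                ylab = "Mean value")

# Draw the error bars as double-headed "arrows" with flat (90 degree) heads
arrows(mids, means - sds, mids, means + sds,
       angle = 90, code = 3, length = 0.1)
```

Using the standard deviation is one choice among several; standard errors or confidence intervals are drawn the same way by swapping out `sds`.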

Comparing Medians and Inter-Quartile Ranges Using the Box Plot

The box plot is useful for comparing the quartiles of quantitative variables. More specifically, the lower and upper ends of a box (the hinges) are defined by the first (Q1) and third (Q3) quartiles. The median (Q2) is shown as a horizontal line within the box. Whiskers extend from the box, and their definition is implementation-dependent. For example, in geom_boxplot of ggplot2, each whisker extends to the most extreme data point that lies no further than 1.5 * IQR from the hinge, where IQR = Q3 - Q1 is the inter-quartile range. Points beyond the whiskers are drawn individually as outliers.
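To make the 1.5 * IQR rule concrete, here is a minimal sketch in base R that computes the quartiles, the whisker ends, and the outliers by hand. The sample values and variable names are hypothetical. The quartiles come from R's default `quantile()` method, so this illustrates the rule rather than reproducing ggplot2's internal code.

```r
# Hypothetical sample; the value 40 is deliberately extreme
x <- c(2, 4, 4, 5, 6, 7, 8, 9, 10, 40)

# Quartiles Q1, Q2 (median), Q3 using quantile()'s default method
q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
iqr <- q[3] - q[1]                      # IQR = Q3 - Q1

# Whiskers may reach at most 1.5 * IQR beyond the hinges
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[3] + 1.5 * iqr

# Each whisker ends at the most extreme observation inside the fences
inside <- x[x >= lower_fence & x <= upper_fence]
whiskers <- range(inside)

# Everything outside the fences is drawn as an individual outlier point
outliers <- x[x < lower_fence | x > upper_fence]
```

For this sample, the whiskers end at 2 and 10, and 40 is the only point flagged as an outlier.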

Determining the Distribution of Data Using Histograms

It is always useful to spend some time exploring a new data set before processing and analyzing it further. One of the most convenient ways to get a feel for the data is to plot a histogram. A histogram visualizes the frequency of measurements as bars, with each bar covering a range (bin) of values. Here we’ll take a closer look at how the histogram can be used in R.
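As a small illustration of what a histogram computes, the sketch below bins a made-up vector with base R's `hist()` and inspects the counts before plotting. The data, the bin width of 2, and the axis labels are assumptions chosen for the example.

```r
# Hypothetical measurements (illustrative data only)
x <- c(1.2, 1.9, 2.3, 2.8, 3.1, 3.4, 3.6, 4.7, 5.5, 7.9)

# Bin the data into intervals of width 2 without drawing anything yet;
# by default the bins are right-closed: (0, 2], (2, 4], (4, 6], (6, 8]
h <- hist(x, breaks = seq(0, 8, by = 2), plot = FALSE)

h$counts  # number of observations falling into each bin
h$mids    # midpoint of each bin

# Draw the precomputed histogram
plot(h, main = "Histogram of x", xlab = "Measurement")
```

Changing `breaks` changes the picture considerably, so it is worth trying a few bin widths when exploring a new data set.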