ROC and precision-recall curves are a staple for the interpretation of binary classifiers. This post builds intuition for how these curves are constructed and how their associated AUCs are interpreted.
Inference is concerned with learning about the data generation process, while prediction is concerned with estimating the outcome for new observations. These contrasting principles are associated with the generative modeling and machine learning communities. Here, I showcase the differences and similarities between the two concepts and offer insights about what practitioners from both fields can learn from each other.
For multi-class prediction scenarios, we can use similar performance measures as for binary classification. Here, I explain how we can obtain the (weighted) accuracy, micro- and macro-averaged F1-scores, and a generalization of the AUC to the multi-class setting.
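As a minimal sketch of how micro- and macro-averaged F1-scores differ, the following plain-Python example computes both from per-class counts. The function names and the toy labels are illustrative assumptions, not taken from the post itself.

```python
from collections import Counter

def f1(tp, fp, fn):
    """F1-score from raw counts; defined as 0 when there are no counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(y_true, y_pred):
    """Return (micro-F1, macro-F1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    # Macro: average the per-class F1-scores, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes first, so every observation counts equally.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro

# Toy example (hypothetical data): an imbalanced problem where class "b" is never predicted.
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 10
micro, macro = micro_macro_f1(y_true, y_pred)
print(round(micro, 3), round(macro, 3))
```

In this toy case the micro-average equals the overall accuracy, while the macro-average is pulled down by the class that is never predicted.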
datascienceblog.net has now existed for more than one month. In this post, I offer a look behind the scenes of the blog and show the progress that has been made with respect to content, blog features, SEO, and search traffic.
Linear discriminant analysis (LDA) is a classification and dimensionality reduction technique that is particularly useful for multi-class prediction problems. In this post I investigate the properties of LDA and the related methods of quadratic discriminant analysis and regularized discriminant analysis.
Bayesian modeling does not have to be tedious. Using probabilistic programming it is relatively easy to implement statistical models that make use of MCMC sampling. In this post, I explore probabilistic programming using Stan.
Performance measures for feature selection should consider the complexity of the model in addition to the fit of the model. Popular feature selection criteria are the adjusted R squared, the Cp statistic, and the AIC.
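To illustrate how such criteria penalize complexity, here is a small sketch using the standard textbook definitions of the adjusted R squared and the AIC. The function and parameter names, as well as the numbers in the usage example, are assumptions made for illustration.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R squared for a model with p predictors fit on n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than parameters")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 * (maximized log-likelihood)."""
    return 2 * k - 2 * log_likelihood

# Hypothetical numbers: the same raw fit (R^2 = 0.8) looks worse with more predictors.
print(round(adjusted_r2(0.8, n=50, p=5), 4))
print(round(adjusted_r2(0.8, n=50, p=20), 4))
```

A higher adjusted R squared and a lower AIC both indicate a better trade-off between fit and complexity.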
Precision and recall are frequently used for model selection. However, compared to sensitivity and specificity, these performance metrics are not generally valid and should only be used in certain settings.
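One reason for caution, sketched here under the assumption of a classifier with fixed sensitivity and specificity, is that precision depends on how common the positive class is. The function name and the example values are hypothetical.

```python
def precision_from_rates(sensitivity, specificity, prevalence):
    """Precision implied by Bayes' rule for a given positive-class prevalence."""
    true_pos = sensitivity * prevalence               # P(predicted +, actually +)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(predicted +, actually -)
    return true_pos / (true_pos + false_pos)

# Same classifier (90% sensitivity, 90% specificity), different class balance.
for prevalence in (0.5, 0.1, 0.01):
    print(prevalence, round(precision_from_rates(0.9, 0.9, prevalence), 3))
```

With balanced classes the precision is 0.9, but at a prevalence of 1% it drops to roughly 0.083, although sensitivity and specificity are unchanged.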
One of the main criteria indicating the quality of a machine learning model is its predictive performance. However, suitable performance measures differ depending on the prediction task. This post investigates the quantities most commonly used for selecting regression and classification models.
Variables can be identified by their value as well as their role. Variables are categorized into quantitative, categorical, and ordinal variables, depending on their values. Moreover, when variables are used in statistical models, additional terms indicate their role, such as dependent, independent, and confounding variables.