Need a holiday from data science? Then this page is for you, because this category encompasses all the posts that are not directly associated with data science. Until now, these posts have mostly dealt with blogging with Hugo, but let’s see what the future brings. Anyway, I don’t plan to stray too far from the intended focus of the blog, so there should never be too many posts under this category.
Machine learning is a field of artificial intelligence (AI) that is concerned with learning from data. Machine learning is commonly divided into three branches:
Supervised learning: Fitting predictive models using data for which outcomes are available.
Unsupervised learning: Transforming and partitioning data for which outcomes are not available.
Reinforcement learning: Learning, often online, which actions to take in an environment from the rewards those actions produce, where not all events may be observable. Reinforcement learning is frequently applied in robotics.

Posts on machine learning
In the following posts, machine learning is applied to solve problems using R.
Humans are visual creatures. Thus, visualization is one of the most important tools for conveying information, and data scientists should be adept at selecting appropriate visualizations.
Which plot is appropriate?
Choosing an appropriate plot for a given set of data can be hard because there are so many types of plots, such as scatter plots, box plots, and histograms. Fortunately, I have created an overview of the most important plots, when they are appropriate, and how they can be used in R.
As a data scientist, it is important to have a deep understanding of statistics. Here, I introduce basic statistical concepts and quantities.
Types of measurements and variables
Important statistical concepts include the following:
Types of measurement scales
Nomenclature for variables: dependent vs independent variables

Statistical quantities
You should definitely know about the following frequently used statistical quantities:
Centrality measures: mean, median, and mode
Measures of dispersion: standard deviation, variance, interquartile range, and covariance (the joint variability of two variables)
Interval estimates: confidence intervals

Probability distributions
Commonly occurring probability distributions are:
Using statistical tests, it is possible to make a statement about the significance of a set of measurements by calculating a test statistic. If, assuming the null hypothesis is true, it is unlikely to obtain a test statistic at least as extreme as the observed value (that is, the p-value does not exceed the chosen significance level), then the result is significant. For example, at a significance level of 5%, the probability of a false positive (rejecting a null hypothesis that is actually true) is at most 5%, provided that the assumptions of the test hold.
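To make the idea of "a test statistic at least as extreme as the observed value" concrete, here is a minimal Python sketch of a two-sided permutation test for a difference in means. The permutation test is one non-parametric test, chosen here only because it needs nothing beyond the standard library; the function name `permutation_test`, its parameters `n_permutations` and `seed`, and the data in `group_a` and `group_b` are hypothetical and made up for illustration. The p-value is estimated as the fraction of random relabellings of the pooled data whose difference in means is at least as large in absolute value as the observed difference.

```python
import math
import random


def mean_of(values):
    # math.fsum is exactly rounded, so the result does not depend on element order
    return math.fsum(values) / len(values)


def permutation_test(x, y, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative sketch).

    Returns the observed test statistic and a Monte Carlo estimate of the p-value.
    """
    rng = random.Random(seed)
    observed = mean_of(x) - mean_of(y)  # observed test statistic
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        stat = mean_of(pooled[:n_x]) - mean_of(pooled[n_x:])
        # Count statistics at least as extreme as the observed one;
        # the tiny tolerance guards against floating-point round-off.
        if abs(stat) >= abs(observed) - 1e-12:
            extreme += 1
    # +1 in numerator and denominator: the observed labelling counts as one permutation
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical measurements from two groups (made-up illustrative data)
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.3]
group_b = [4.2, 4.5, 4.8, 4.1, 4.6, 4.4, 4.9, 4.3]

alpha = 0.05  # significance level
observed, p = permutation_test(group_a, group_b)
print(f"difference in means = {observed:.3f}, estimated p = {p:.4f}")
print("significant" if p <= alpha else "not significant")
```

With these made-up values the two groups barely overlap, so only a handful of relabellings reach the observed difference of about 1.0 and the estimated p-value falls well below 5%. Increasing `n_permutations` reduces the Monte Carlo error of the estimate.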
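Parametric vs non-parametric tests
There is a multitude of tests for determining statistical significance. Parametric tests assume that the data follow a particular distribution (for example, a normal distribution), whereas non-parametric tests make weaker assumptions about the underlying distribution.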