Introduction

Machine learning is a branch of artificial intelligence concerned with learning from data, i.e., extracting knowledge from data. It is an area of active research that, roughly speaking, stands at the intersection of statistics and artificial intelligence.

Arthur Samuel (1959): "Machine learning gives computers the ability to learn without being explicitly programmed."

Daily Life Examples

Machine learning algorithms are used in many aspects of everyday life: face detection (your cellphone's camera), voice recognition (Amazon Alexa, Siri), spam detection (your email app), fraud detection (your credit card company), weather prediction (your weather app), and music recommendation (Spotify), to name only a few.

Categories of Machine Learning

In a broad perspective, machine learning algorithms fall into two main categories: supervised learning and unsupervised learning.

Supervised Learning

Machine learning algorithms that make predictions by generalizing what they have learned from a dataset of both inputs and outcomes (labels) are called supervised learning algorithms. The $i$th record in the dataset has features $x_{i,1},\,x_{i,2},\, \dots,\,x_{i,n}$, which form the input, and a corresponding outcome $y_i$. The relationship between the input data and the outcome can be described by a function $f$ where $$ f([x_{i,1},\,x_{i,2},\, \dots,\,x_{i,n}])=f(X_i)=y_i $$ For example, when building a house price prediction model, your dataset with $n$ records might have five feature columns (age, zipcode, crime rate, city, and state) and a column $y$ denoting the outcome, the house price.

$$ X=\left[ \begin{array}{c} X_{1}\\[1ex] X_{2}\\ \vdots\\ X_{n}\\[1ex] \end{array}\right]=\left[ \begin{array}{c|c|c|c|c} \strut\smash{\overbrace{x_{1,1}}^{\text{age}}} & \strut\smash{\overbrace{x_{1,2}}^{\text{zipcode}}} & \strut\smash{\overbrace{x_{1,3}}^{\text{crime}}} & \strut\smash{\overbrace{x_{1,4}}^{\text{city}}} & \strut\smash{\overbrace{x_{1,5}}^{\text{state}}} \\ x_{2,1}&x_{2,2} &x_{2,3} &x_{2,4} &x_{2,5} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ x_{n,1}&x_{n,2} &x_{n,3} &x_{n,4} &x_{n,5} \end{array}\right] \begin{matrix} \leftarrow \text{record } 1 \\[1ex] \leftarrow \text{record } 2 \\ \\[0.5ex] \leftarrow \text{record } n \end{matrix} $$ $$ f(X) = y. $$ The goal is to train a model $\hat{f}$ using these features and outcomes so that $$ \hat{f}(X) = \hat{y} \approx y. $$

In other words, we are trying to find a model/function/map that best maps the input variables to the outcome. The difference between machine learning algorithms lies in the strategy each uses to find this function.
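As a minimal sketch of finding such an $\hat{f}$, the snippet below fits a linear model to a tiny made-up dataset (the feature values and prices are hypothetical) by ordinary least squares. This is only one possible strategy; other algorithms search for $\hat{f}$ very differently.

```python
import numpy as np

# Hypothetical toy dataset: each row X_i holds two numeric features
# (say, age and crime rate), and y_i is the house price.
X = np.array([[10.0, 2.0],
              [5.0,  1.0],
              [30.0, 4.0],
              [2.0,  0.5]])
y = np.array([300.0, 400.0, 150.0, 450.0])

# Append a column of ones so the model can also learn an intercept.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Least-squares fit: find weights w so that f_hat(X) = X_aug @ w ~ y.
w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

y_hat = X_aug @ w  # the model's predictions, f_hat(X) = y_hat
print(np.round(y_hat, 1))
```

Here `np.linalg.lstsq` chooses the weights that minimize the squared difference between $\hat{y}$ and $y$, which is the sense in which $\hat{f}(X) \approx y$ for this particular model family.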

Unsupervised Learning

Unsupervised learning algorithms differ from supervised algorithms in that the outcomes are no longer available, i.e., the data is not labeled. Our goal is no longer to make predictions from the inputs, simply because there is no label to predict! Instead, we focus on tasks such as

  • Clustering the input into groups that share similar features
  • Reducing the data to a small set of directions that capture most of its variation, as in Principal Component Analysis (PCA)
  • Finding those records in the dataset that significantly differ in one or more features from the rest of the dataset (anomaly detection)
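To make the clustering task concrete, here is a minimal k-means sketch on hypothetical unlabeled 2-D data: two loose groups of points and no outcome column $y$. The group locations and the deterministic initialization are choices made for illustration, not part of any standard recipe.

```python
import numpy as np

# Hypothetical unlabeled data: two loose groups of 2-D points.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(20, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(20, 2))
X = np.vstack([group_a, group_b])

k = 2
# Initialize one center near each region (real implementations use
# smarter schemes such as k-means++; this is deterministic for clarity).
centers = X[[0, 20]].copy()

for _ in range(10):
    # Assign each record to its nearest center.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Move each center to the mean of its assigned records.
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(centers)
```

Notice that the algorithm never sees a label; the grouping emerges purely from similarity between the feature vectors.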

Goal

As we saw earlier, the goal of a machine learning algorithm is to generalize to new data $X$ that the model has not seen before; for supervised algorithms, this means accurately predicting the outcome $y$. The difference between these models lies in the assumptions they make about the mapping function $\hat{f}$: is it linear or nonlinear, continuous or discrete, differentiable or not, and so on.
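The standard way to check this kind of generalization is to fit the model on one split of the data and measure its error on a held-out split it has never seen. The sketch below does this with synthetic data generated from a known noisy linear rule (all numbers are made up for illustration).

```python
import numpy as np

# Synthetic data from a known rule: y = 3*x + 2 + noise.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.5, size=100)

# Hold out the last 20 records; the model never sees them during fitting.
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

# Fit a linear model (with intercept) on the training split only.
A_train = np.hstack([X_train, np.ones((len(X_train), 1))])
w, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

# Evaluate on the unseen split: low error here means f_hat generalizes.
A_test = np.hstack([X_test, np.ones((len(X_test), 1))])
test_mse = np.mean((A_test @ w - y_test) ** 2)
print(f"held-out MSE: {test_mse:.3f}")
```

Because the test records never influenced the fit, the held-out error is an honest estimate of how $\hat{f}$ will behave on genuinely new data.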

Machine Learning vs. Statistical Inference

Both machine learning and statistical models can be used to make predictions. The main purpose of a statistical model, however, is to understand the relationship between the variables; even though statistical models can make predictions, that is not why they were developed. The reverse holds for machine learning algorithms: they are built primarily for prediction. We can still gain some insight into the relationship between the input and output variables if the algorithm is not too complex, e.g., simple linear regression, but as the complexity of a model increases, interpreting the relationship between the inputs and the outcome becomes harder and harder. In other words, these algorithms sacrifice interpretability for predictive power.
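The interpretability of a simple model can be seen directly: in a linear regression, each fitted coefficient reads as that feature's marginal effect on the outcome. The sketch below recovers the coefficients of a known rule from synthetic data (the feature names and true effects are hypothetical).

```python
import numpy as np

# Synthetic data with known effects: each extra year of age lowers the
# price by 2, each unit of crime rate lowers it by 10 (made-up numbers).
rng = np.random.default_rng(2)
n = 200
age = rng.uniform(0, 50, n)
crime = rng.uniform(0, 10, n)
price = 500.0 - 2.0 * age - 10.0 * crime + rng.normal(scale=5.0, size=n)

# Fit a linear model; the column of ones carries the intercept.
A = np.column_stack([age, crime, np.ones(n)])
w, *_ = np.linalg.lstsq(A, price, rcond=None)

# Each coefficient is directly interpretable as a marginal effect.
for name, coef in zip(["age", "crime", "intercept"], w):
    print(f"{name}: {coef:+.2f}")
```

A more complex model, such as a deep neural network, might predict `price` just as well or better, but it offers no such per-feature coefficients to read off, which is exactly the interpretability trade-off described above.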