Supervised Machine Learning — An Introduction To Support Vector Machines

Common use cases for support vector machines, and a walk through the steps of applying this model for data forecasting.

In my last post, I walked you through Logistic Regression as a binary classification model. In this blog, I’ll share an overview of Support Vector Machines (SVM). I’ll cover common use cases for applying this model, its advantages, and the steps of applying this classification algorithm for data forecasting. Let’s continue the overview of supervised machine learning models!

Support Vector Machines: The basics

SVM is one of the most popular models for classification. It can also be used for regression or ranking, but classification is its most common use case. SVM is often used for image or text classification, face or speech recognition, and document categorization. You can read more about support vector machines here.

Imagine we have 100 pictures of dogs and cats, and our task is to train our machine learning model to identify whether each of the next 50 unseen pictures shows a cat or a dog. In this scenario, the first 100 pictures are our training set, where we “teach” the model to recognize a cat or a dog, and the next set is the testing data on which the model will run a prediction (classification).

In the implementation, SVM looks for a line (or lines) that correctly separates the input data points (the dog and cat features from our training set). From those lines, it chooses the one with the longest distance to the closest points, which are called support vectors:

[Figure: a maximum-margin boundary separating two classes, with the support vectors highlighted. Original image source: Rohith Gandhi, Medium]
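As a quick illustration of that idea, here is a minimal scikit-learn sketch (not taken from the original post) that fits a linear SVM on a few toy 2D points; the fitted model exposes the support vectors that define the margin:

```python
# Minimal sketch: fit a linear SVM on toy 2D data and inspect the
# support vectors that define the maximum-margin boundary.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]])  # two small clusters
y = np.array([0, 0, 0, 1, 1, 1])                                # their class labels

model = SVC(kernel="linear")   # a nonlinear kernel such as "rbf" also works
model.fit(X, y)

print(model.support_vectors_)            # the closest points to the boundary
print(model.predict([[3, 2], [7, 6]]))   # classify new, unseen points
```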

Advantages of support vector machines

Choosing the right classification model depends on many factors: memory size, over-fitting tendency, parameterization, number of features, etc. SVM is popular for its high accuracy and relatively low computational cost. The model is less prone to over-fitting and generalizes well. Finally, it works well for both linear and nonlinear data, and for both structured and unstructured data, with the help of kernels.

Predictive Modeling With SVM

1. Problem Definition And Tools Overview

I’ll use the Titanic challenge again (as I did in my previous article here) to walk through the steps of predictive modeling. As a quick refresher: there are two datasets that include real Titanic passenger information such as name, age, gender, social class, and fare. One dataset, “Training,” contains 891 passengers along with a binary (yes or no) label indicating whether each survived. The other, “Testing,” contains the passengers whose survival we have to predict. We can use the training set to teach our SVM model the patterns in the data, and then use the testing set to evaluate its performance. SVM doesn’t return a probability; it directly gives us the binary value of Survived or Didn’t Survive (1 or 0).

I’m using Python Pandas, the Seaborn statistical graphics library, and the Scikit-Learn ML package for analysis and modeling.
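A minimal setup for this workflow might look like the sketch below; the file names train.csv and test.csv are assumptions based on the standard Kaggle Titanic layout rather than code from the original notebook:

```python
# Assumed setup: load the Kaggle Titanic files with Pandas.
import pandas as pd
import seaborn as sns

train = pd.read_csv("train.csv")   # 891 labeled passengers (assumed file name)
test = pd.read_csv("test.csv")     # passengers whose survival we predict (assumed)

print(train.head())
```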

2. Exploratory Data Analysis

We can conduct an exploratory data analysis to get a feel for the dataset, its patterns, and its features.

To start, below is a chart that illustrates the survival rate in the training dataset:

[Chart: count of survived vs. not survived passengers in the training set]

We can see that the rate of passengers who did not survive is higher. Our goal is to predict this outcome for the rest of the passengers in our testing set using the data we have about them.
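A chart like this can be produced with a single Seaborn call; this assumes the train DataFrame from the setup sketch above and is only one way to draw it:

```python
# Count of survived vs. not survived passengers in the training set.
sns.countplot(x="Survived", data=train)
```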

The graph below shows the breakdown of survival rate by social class:

[Chart: survival rate by passenger class]

As we see, the first class has the highest survival rate, and the third class has the lowest.
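Again assuming the train DataFrame, a bar plot of the mean survival rate per class is one way to get this view (not necessarily the original code):

```python
# Mean survival rate per passenger class; sns.barplot aggregates Survived by Pclass.
sns.barplot(x="Pclass", y="Survived", data=train)
```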

We have to run more analysis for other input features such as age, social class, and family size. The full exploratory analysis is available here.

3. Feature Engineering

There is one more step before modeling: feature engineering. We have to clean and prepare our data for prediction. To understand which features have to be transformed, we can build a correlation plot to see the connections between the given features:

[Heatmap: correlations between the input features]

We don’t have high correlations that might affect our prediction model. Besides age, we also might need to look into the parents/children (Parch) and fare values.
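A correlation heatmap like the one above can be sketched as follows, assuming the train DataFrame and restricting to numeric columns:

```python
# Pairwise correlations between numeric features, drawn as a heatmap.
import matplotlib.pyplot as plt

corr = train.select_dtypes(include="number").corr()
sns.heatmap(corr, annot=True, cmap="coolwarm")
plt.show()
```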

There are a few things we can do with the missing age values, for example imputing them based on a related feature such as passenger class.
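A sketch of one such approach (an assumption on my part, not necessarily the original notebook's code) is to fill each missing Age with the median age of that passenger's class:

```python
# Impute missing ages with the median age of each passenger class.
train["Age"] = train.groupby("Pclass")["Age"].transform(
    lambda s: s.fillna(s.median())
)
```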

Similarly, we can fill in the missing fare values.
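For example, a simple option (again an assumption, not the original code) is to use the overall median fare:

```python
# Fill missing fares with the median fare of the dataset.
test["Fare"] = test["Fare"].fillna(test["Fare"].median())
```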

4. Prediction And Model Evaluation

In the same way as last time, we use scikit-learn's train/test split and run the forecast on 20% of our sample.
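A sketch of that split, assuming the engineered features are numeric columns of train and the label is Survived (the feature list here is illustrative):

```python
# Hold out 20% of the labeled data for evaluation.
from sklearn.model_selection import train_test_split

features = ["Pclass", "Age", "SibSp", "Parch", "Fare"]   # illustrative subset
X = train[features]
y = train["Survived"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```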

Then we run our prediction using SVM.
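A minimal sketch of fitting and scoring the classifier on the held-out 20% split from above (an illustration, not the original notebook's exact code):

```python
# Train an SVM classifier and measure accuracy on the held-out split.
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

svm = SVC()                        # default RBF kernel
svm.fit(X_train, y_train)

predictions = svm.predict(X_test)
print(accuracy_score(y_test, predictions))
```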

We got a 58% accuracy score, which is much lower than the 76% we achieved last time using Logistic Regression. That might be due to some noise in our data. We can increase the accuracy score by doing more feature engineering to extract the most value from the input features.

There is no one perfect algorithm that works best for every problem. Given the size and structure of your dataset, you should try several different algorithms, such as Logistic Regression, Support Vector Machines, KNN, Naive Bayes, Decision Trees, and Random Forests, tune them, and then select a winner.