In data science, a confusion matrix is a way to gauge the health of a model. In particular, it summarizes the performance of a supervised classification model by comparing its predictions with the known values. In this article, I will detail how to create a confusion matrix for a binary classification model both manually and using sklearn’s built-in function `metrics.confusion_matrix`. I will show both examples using Python 3 in a Jupyter Notebook.

To calculate a confusion matrix for a binary classification model, you need two lists of the same length, each consisting of 1s and 0s, with one entry for each row in your test data. One list holds the actual known values and the other holds the predicted values.

For example, let’s say your binary classification model predicts who lived (1) or died (0) on the Titanic. We are given a list of details for 1300 passengers (the x data), and for each passenger we are also given a column that indicates whether that passenger survived or perished (the y data). This y data can be translated into 1s and 0s representing survival, and these are our “known” or “actual” values. The list of predictions (“preds”) is what the model outputs when it predicts survival from the x data. For example, let’s say our known y data looks like this:

Actual: [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1]

where each 1 corresponds to a row in the x data for a person who survived. When our model made its predictions, it produced a list of the same length, getting some of the answers correct and some of them wrong:

Preds: [0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1]

Each number in these two lists corresponds to the same row in the x data. The first value in the “actual” list corresponds to the first value in the “preds” list, so we can see that the correct answer for the first passenger is 1, but the predicted value for that passenger in the preds list is 0. This means that our model predicted incorrectly for the first passenger. The first passenger survived (1), but our model predicted that the passenger did not survive (0).
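To make the position-by-position pairing concrete, here is a small sketch using the two example lists. The variable names `actual`, `preds`, and `pairs` are chosen for illustration only.

```python
# The example lists from the text above.
actual = [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
preds = [0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1]

# Both lists describe the same rows, so they must be the same length.
if len(actual) != len(preds):
    raise ValueError("actual and preds must be the same length")

# zip() pairs the value at each position in `actual` with the value
# at the same position in `preds`.
pairs = list(zip(actual, preds))

first_actual, first_pred = pairs[0]
print(first_actual, first_pred)    # 1 0
print(first_actual == first_pred)  # False: the model missed this passenger
```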

With a binary classification system there are four possible outcomes for each pair of actual and prediction values.

- If the actual value is 1 and the prediction is 1, this is considered a True Positive or **TP**.
- If the actual value is 0 and the prediction is 0, this is considered a True Negative or **TN**.
- If the actual value is 1 and the prediction is 0, this is considered a False Negative or **FN**.
- If the actual value is 0 and the prediction is 1, this is considered a False Positive or **FP**.

As can be seen from the list above, wrong predictions begin with the word “False” and correct predictions begin with the word “True”.
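The naming rule can be sketched in a few lines of Python. The helper name `classify_outcome` is hypothetical and is not part of sklearn or of the functions described later in this article; it only illustrates that “True/False” reflects whether the prediction was correct, while “Positive/Negative” reflects what was predicted.

```python
def classify_outcome(actual_value, predicted_value):
    """Return 'TP', 'TN', 'FP', or 'FN' for one actual/prediction pair."""
    # "True"/"False" says whether the prediction matched the actual value.
    correctness = "T" if actual_value == predicted_value else "F"
    # "Positive"/"Negative" names what the model predicted (1 or 0).
    polarity = "P" if predicted_value == 1 else "N"
    return correctness + polarity


print(classify_outcome(1, 1))  # TP
print(classify_outcome(0, 0))  # TN
print(classify_outcome(1, 0))  # FN
print(classify_outcome(0, 1))  # FP
```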

Given the definitions above, it is not difficult to iterate through both lists and determine whether each prediction is a TP, TN, FP, or FN. I created a function that does just that, counting each outcome and placing the results into a matrix array that looks like this:

cm = [[TN, FP],[FN, TP]]

My function is called manually_calculate_tp_tn_fp_fn, and looks like this:
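The original listing is not reproduced in this text. As a rough idea of what such a function could look like, here is a minimal sketch. Only the function name and the `[[TN, FP], [FN, TP]]` layout come from the article; the `(actual, preds)` signature and the error handling are assumptions.

```python
def manually_calculate_tp_tn_fp_fn(actual, preds):
    """Count each outcome and return the matrix [[TN, FP], [FN, TP]]."""
    if len(actual) != len(preds):
        raise ValueError("actual and preds must be the same length")

    tp = tn = fp = fn = 0
    for actual_value, predicted_value in zip(actual, preds):
        if actual_value == 1 and predicted_value == 1:
            tp += 1
        elif actual_value == 0 and predicted_value == 0:
            tn += 1
        elif actual_value == 1 and predicted_value == 0:
            fn += 1
        elif actual_value == 0 and predicted_value == 1:
            fp += 1
        else:
            # Anything other than 0 or 1 is outside the binary case.
            raise ValueError("values must be 0 or 1")

    return [[tn, fp], [fn, tp]]


actual = [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
preds = [0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1]

cm = manually_calculate_tp_tn_fp_fn(actual, preds)
print(cm)  # [[4, 2], [5, 4]]
```

For the example lists this gives 4 true negatives, 2 false positives, 5 false negatives, and 4 true positives, which add up to the 15 passengers in the lists.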

I also wrote a function that would display the confusion matrix in a nice 2 x 2 box:
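The display function’s listing is also missing from this text, so the following is only a plain-text sketch of one way to draw a 2 x 2 box. The name `display_confusion_matrix` comes from the article; the helper `format_confusion_matrix`, the row and column labels, and the ASCII layout are assumptions.

```python
def format_confusion_matrix(cm):
    """Return a text table for a matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    rows = [
        ("", "Pred 0", "Pred 1"),
        ("Actual 0", f"TN = {tn}", f"FP = {fp}"),
        ("Actual 1", f"FN = {fn}", f"TP = {tp}"),
    ]
    # Pad every cell to the width of the longest one so the box lines up.
    width = max(len(cell) for row in rows for cell in row)
    border = "+" + "+".join("-" * (width + 2) for _ in range(3)) + "+"

    lines = [border]
    for row in rows:
        lines.append("| " + " | ".join(cell.ljust(width) for cell in row) + " |")
        lines.append(border)
    return "\n".join(lines)


def display_confusion_matrix(cm):
    """Print the formatted confusion matrix."""
    print(format_confusion_matrix(cm))


display_confusion_matrix([[4, 2], [5, 4]])
```

Separating the formatting from the printing is a design choice made here so the text can be checked or reused without capturing output.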

Here is a demonstration of how to generate a confusion matrix using sklearn’s built-in function `metrics.confusion_matrix`. This also uses my function, display_confusion_matrix, to display it nicely:

Note: In order to use the sklearn function, you must import it first:

from sklearn.metrics import classification_report, confusion_matrix
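As a minimal sketch of the sklearn call on the example lists from earlier (assuming scikit-learn is installed; `classification_report` is not needed for this step, so only `confusion_matrix` is imported):

```python
from sklearn.metrics import confusion_matrix

actual = [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
preds = [0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1]

# Argument order is (y_true, y_pred): actual values first, predictions second.
cm = confusion_matrix(actual, preds)
print(cm)

# With 0/1 labels the rows are actual values and the columns are predictions,
# so flattening the array yields the counts in TN, FP, FN, TP order.
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 4 2 5 4
```

Swapping the two arguments transposes the matrix, which exchanges the FP and FN counts, so the order matters.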

After getting the confusion matrix from sklearn, I used my function to manually confirm the values:
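The confirmation step can be sketched as below. This is not the author’s function: to keep the block self-contained it recounts the pairs with `collections.Counter`, an alternative approach assumed here for illustration, and compares the result with sklearn’s output.

```python
from collections import Counter

from sklearn.metrics import confusion_matrix

actual = [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
preds = [0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1]

# sklearn's result, converted to a plain nested list for comparison.
sk_cm = confusion_matrix(actual, preds).tolist()

# Count how often each (actual, prediction) pair occurs.
counts = Counter(zip(actual, preds))
manual_cm = [
    [counts[(0, 0)], counts[(0, 1)]],  # TN, FP
    [counts[(1, 0)], counts[(1, 1)]],  # FN, TP
]

print(manual_cm == sk_cm)  # True
```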

For more information about confusion matrix terminology see:

http://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/