First, I’d like to explain what a confusion matrix is used for. Classification problems are solved using supervised machine learning algorithms. In these problems, our goal is to categorize an object using its features: for example, identify a fruit by its taste, color, and size, or check whether a patient has a disease based on symptoms. Building a model is not a one-time task; we run many experiments, record the output, and check the performance of the model in each experiment.
Confusion Matrix
The confusion matrix is the technique we use to measure the performance of classification models. This post is dedicated to explaining the confusion matrix using real-life examples; by the end, you’ll be able to construct a confusion matrix and evaluate the performance of a model.
The confusion matrix is a table where each row represents an actual class and each column a predicted class. As the name suggests, it can be genuinely confusing for beginners. Each cell has a specific meaning and counts the correct or incorrect predictions with respect to the actual values.
|                 | PREDICTED POSITIVE | PREDICTED NEGATIVE |
|-----------------|--------------------|--------------------|
| ACTUAL POSITIVE | TRUE POSITIVE      | FALSE NEGATIVE     |
| ACTUAL NEGATIVE | FALSE POSITIVE     | TRUE NEGATIVE      |
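To make the table concrete, here is a minimal sketch in Python (the function name and the 1/0 label encoding are my own choices, not from any library) that tallies the four cells from paired lists of actual and predicted labels:

```python
# A minimal sketch: count the four confusion-matrix cells from paired
# lists of actual and predicted labels (1 = positive, 0 = negative).
def confusion_counts(actual, predicted):
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1  # actual positive, predicted positive
        elif a == 1 and p == 0:
            fn += 1  # actual positive, predicted negative
        elif a == 0 and p == 1:
            fp += 1  # actual negative, predicted positive
        else:
            tn += 1  # actual negative, predicted negative
    return tp, fn, fp, tn
```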
Keep in mind: it is very important that you understand the above four terms, otherwise you won’t be able to go further in evaluating a model.
Understand Confusion Matrix
Problem Statement – Check the accuracy of a model that predicts whether a user is infected with coronavirus based on symptoms.
Experiment 1:
Actual – 10 users were suspected of having coronavirus. After laboratory reports, 3 users are found infected and 7 users are not infected.
Prediction – Our model observes the symptoms and predicts that 2 users are infected and 8 users are not infected.
In our problem, infected users are labeled as the positive class and non-infected users are labeled as the negative class.
TRUE POSITIVE
It’s a correct classification. It tells how many actual positive cases are correctly classified as positive.
Calculation – The lab reports say 3 users are infected, and both users our model flags as infected are truly infected. So TRUE POSITIVE is 2.
The Result – 2 out of 3 infected users are correctly classified.
FALSE NEGATIVE
It’s an incorrect classification. It tells how many actual positive cases are incorrectly classified as negative.
Calculation – The labs say 3 users are infected, but our model flags only 2 of them, so 1 infected user is incorrectly classified as not infected. So FALSE NEGATIVE is 1.
The Result – 1 out of 3 infected users is incorrectly classified.
Please note the difference between TRUE POSITIVE and FALSE NEGATIVE.
FALSE POSITIVE
It’s an incorrect classification. It tells how many actual negative cases are incorrectly classified as positive.
Calculation – There are no wrong predictions for the negative class: all 7 non-infected users are correctly classified. So FALSE POSITIVE is 0.
The Result – 0 out of 7 non-infected users are incorrectly classified.
TRUE NEGATIVE
It’s a correct classification. It tells how many actual negative cases are correctly classified as negative.
Calculation – The labs say 7 users are not infected, while our model predicts 8 users as not infected. Of those 8, the 7 truly non-infected users are correctly classified (the eighth is the missed infected user already counted as a FALSE NEGATIVE). So TRUE NEGATIVE is 7.
The Result – 7 out of 7 non-infected users are correctly classified.
Trick – the first word (TRUE or FALSE) denotes whether the model predicted correctly, and the second word (POSITIVE or NEGATIVE) is the predicted class.
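To tie the four counts together, here is a minimal Python sketch of Experiment 1. The label lists are one possible arrangement consistent with the counts above (1 = infected/positive, 0 = not infected/negative); only the totals matter:

```python
# Experiment 1 as paired label lists (1 = infected, 0 = not infected).
# The ordering within each list is arbitrary; only the counts matter.
actual    = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # 3 infected, 7 not infected
predicted = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # model flags 2 users as infected

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)

print(tp, fn, fp, tn)  # -> 2 1 0 7
```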
This is all confusing, right? Let’s work through two more examples so everything becomes clear.
Experiment 2:
Actual – 10 users were suspected of having coronavirus. After laboratory reports, 3 users are found infected and 7 users are not infected.
Prediction – Our model observes the symptoms and predicts that 4 users are infected and 6 users are not infected.
|                 | PREDICTED POSITIVE | PREDICTED NEGATIVE |
|-----------------|--------------------|--------------------|
| ACTUAL POSITIVE | 3                  | 0                  |
| ACTUAL NEGATIVE | 1                  | 6                  |
Experiment 3:
Actual – 10 users were suspected of having coronavirus. After laboratory reports, 3 users are found infected and 7 users are not infected.
Prediction – Our model observes the symptoms and predicts that 8 users are infected and 2 users are not infected.
|                 | PREDICTED POSITIVE | PREDICTED NEGATIVE |
|-----------------|--------------------|--------------------|
| ACTUAL POSITIVE | 3                  | 0                  |
| ACTUAL NEGATIVE | 5                  | 2                  |
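If scikit-learn is available, its confusion_matrix function can reproduce both tables. This is a sketch: the label lists below are one arrangement consistent with the counts, and labels=[1, 0] puts the positive class first so the layout matches the tables above (rows = actual, columns = predicted):

```python
from sklearn.metrics import confusion_matrix

# 1 = infected (positive), 0 = not infected (negative)
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # 3 infected, 7 not infected

pred_exp2 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # Experiment 2: 4 predicted infected
pred_exp3 = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]  # Experiment 3: 8 predicted infected

# labels=[1, 0] orders rows/columns as [positive, negative],
# so the output matches the tables above.
print(confusion_matrix(actual, pred_exp2, labels=[1, 0]))
# [[3 0]
#  [1 6]]
print(confusion_matrix(actual, pred_exp3, labels=[1, 0]))
# [[3 0]
#  [5 2]]
```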
Conclusion
The first step in evaluating a model is to construct the confusion matrix. The confusion matrix measures the performance of your model, and the goal is to keep TRUE POSITIVE and TRUE NEGATIVE high and FALSE NEGATIVE and FALSE POSITIVE low.
| Term                | Result                   | Meaning                               |
|---------------------|--------------------------|---------------------------------------|
| TRUE POSITIVE (TP)  | Correct classification   | Positive class identified as positive |
| FALSE POSITIVE (FP) | Incorrect classification | Negative class identified as positive |
| TRUE NEGATIVE (TN)  | Correct classification   | Negative class identified as negative |
| FALSE NEGATIVE (FN) | Incorrect classification | Positive class identified as negative |
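As a quick follow-up to the problem statement’s goal of checking accuracy: accuracy falls straight out of these four counts as (TP + TN) / (TP + TN + FP + FN). A small sketch using the counts from the three experiments:

```python
# Accuracy from the four confusion-matrix cells:
# accuracy = (TP + TN) / (TP + TN + FP + FN)
def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + fp + tn)

print(accuracy(tp=2, fn=1, fp=0, tn=7))  # Experiment 1 -> 0.9
print(accuracy(tp=3, fn=0, fp=1, tn=6))  # Experiment 2 -> 0.9
print(accuracy(tp=3, fn=0, fp=5, tn=2))  # Experiment 3 -> 0.5
```

Note how Experiment 3, which flags far too many users as infected, scores much lower because its FALSE POSITIVE count is high.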