Choosing a Performance Metric for an Imbalanced Classification Problem

Class Imbalance in ML
Imbalanced classification refers to a classification problem in which the number of examples in the training dataset differs greatly from one class label to another.
If a dataset consists of 10000 genuine and 10 fraudulent transactions, a classifier trained on it will tend to classify fraudulent transactions as genuine ones. The numbers explain why. Suppose the machine learning algorithm can produce one of two possible models:
Model 1 classified 7 out of 10 fraudulent transactions as genuine transactions and 10 out of 10000 genuine transactions as fraudulent transactions.
Model 2 classified 2 out of 10 fraudulent transactions as genuine transactions and 100 out of 10000 genuine transactions as fraudulent transactions.
If we measure the performance of a model by the number of mistakes it makes, Model 1 has only 17 errors while Model 2 has 102. However, if we want to let as few fraudulent transactions through as possible, we should use Model 2. A learning algorithm that simply minimizes the total number of errors will generally pick Model 1, letting most of the fraudulent transactions pass undetected.
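The comparison above can be checked with a few lines of Python. This is only a minimal sketch: the function and variable names (`error_count`, `accuracy`, `missed_fraud`, `flagged_genuine`) are made up for illustration, and the counts are the ones from the example.

```python
# Counts from the example: 10 fraudulent and 10000 genuine transactions.
N_FRAUD = 10
N_GENUINE = 10_000


def error_count(missed_fraud, flagged_genuine):
    """Total number of misclassified transactions."""
    return missed_fraud + flagged_genuine


def accuracy(missed_fraud, flagged_genuine):
    """Fraction of all transactions that were classified correctly."""
    total = N_FRAUD + N_GENUINE
    return (total - error_count(missed_fraud, flagged_genuine)) / total


# missed_fraud: fraudulent transactions classified as genuine
# flagged_genuine: genuine transactions classified as fraudulent
model_1 = {"missed_fraud": 7, "flagged_genuine": 10}
model_2 = {"missed_fraud": 2, "flagged_genuine": 100}

for name, model in (("Model 1", model_1), ("Model 2", model_2)):
    print(name, error_count(**model), round(accuracy(**model), 4))
```

Both models score close to 99% accuracy or higher, and Model 1 scores higher of the two, even though it misses 7 of the 10 fraudulent transactions.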
Better Metrics
We can use better metrics than just counting the errors. They are built from four kinds of outcome:
True Positive (TP) – An example that is positive and is classified correctly as positive
True Negative (TN) – An example that is negative and is classified correctly as negative
False Positive (FP) – An example that is negative but is classified wrongly as positive
False Negative (FN) – An example that is positive but is classified wrongly as negative
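As a rough illustration, the four outcomes can be counted directly from a list of true labels and a list of predicted labels. The helper `confusion_counts` and the tiny label lists below are hypothetical, invented for this sketch, and it assumes that 1 marks the positive class (fraudulent) and 0 the negative class (genuine).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary classification task."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1  # positive, classified correctly as positive
        elif actual != positive and predicted != positive:
            tn += 1  # negative, classified correctly as negative
        elif actual != positive and predicted == positive:
            fp += 1  # negative, classified wrongly as positive
        else:
            fn += 1  # positive, classified wrongly as negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Made-up toy data: 1 = fraudulent (positive), 0 = genuine (negative).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)
```

The four counts always add up to the number of examples, which is a quick sanity check when writing a helper like this by hand.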
Now let's find the performance of our models with respect to our new metrics.
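In our case, the primary goal is to let as few fraudulent transactions through as possible, i.e. to keep the number of false negatives low (with fraudulent transactions as the positive class). The False Negative rate is FN / (FN + TP), the fraction of actual positives that are wrongly classified as negative. Model 1 catches 3 of the 10 fraudulent transactions and Model 2 catches 8, so the False Negative rates of the two models are: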
Model 1:
FNR_M1 = 7 / (7 + 3)
FNR_M1 = 0.7
Model 2:
FNR_M2 = 2 / (2 + 8)
FNR_M2 = 0.2
Now we see that the False Negative rate of Model 1 is 70%, while that of Model 2 is just 20%. For the goal of catching fraud, this makes Model 2 the better classifier, even though it makes more errors overall.
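The same calculation can be written as a small Python function. This is a sketch under the assumption that fraudulent transactions are the positive class; the name `false_negative_rate` is chosen here for illustration and does not come from any particular library.

```python
def false_negative_rate(fn, tp):
    """Fraction of actual positives that were classified as negative."""
    positives = fn + tp
    if positives == 0:
        raise ValueError("FNR is undefined when there are no positive examples")
    return fn / positives


# Model 1 misses 7 of the 10 fraudulent transactions and catches 3.
fnr_m1 = false_negative_rate(fn=7, tp=3)
# Model 2 misses 2 of the 10 fraudulent transactions and catches 8.
fnr_m2 = false_negative_rate(fn=2, tp=8)

print(f"Model 1 FNR: {fnr_m1:.0%}")
print(f"Model 2 FNR: {fnr_m2:.0%}")
```

The guard against zero positives matters in practice, because a small test split drawn from a heavily imbalanced dataset may contain no fraudulent examples at all.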