Every important decision boils down to a “yes” or “no” choice – should I buy, or not buy? Should I offer a payment plan, or not? Should I agree, or not agree?

The logistic regression model predicts the probability of an event by analyzing past data, where the outcome is coded as 1 (“yes”) or 0 (“no”). To see how practical this method is, here are a few examples:

  • Customer Retention Strategy – Based on historical data, the model can predict the likelihood of a specific customer leaving, allowing us to focus on that segment.
  • Credit/Loan Approval – The model can assess the probability of a customer defaulting on a loan/credit, and recommend whether to grant the loan or not.
  • Fraud Detection – A credit card company can estimate the likelihood that a transaction is fraudulent based on card usage patterns.

The model’s formula looks like this (Excel file at the end of the article):

P(yes) = 1 / (1 + e^-(b0 + b1·x1 + … + bn·xn))

It may look complicated, but the essence is simple: a linear combination of the features is transformed into a probability of “yes” or “no.” An example makes it easy to understand:
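The transformation can be sketched in a few lines of Python. The feature values and coefficients below are made up for illustration; in the article they come from the spreadsheet.

```python
import math

def logistic_probability(features, coefficients, intercept):
    """Map a linear combination of features to a probability in (0, 1)."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical debtor: loan amount (thousands), annual income (thousands),
# credit rating, current delinquency flag, previous-period defaults.
p = logistic_probability(
    features=[50, 120, 700, 0, 1],
    coefficients=[0.02, -0.01, -0.004, 1.5, 0.8],  # invented for the sketch
    intercept=1.0,
)
```

Whatever the linear combination evaluates to, the output is always squeezed into the range 0 to 1, which is what lets us read it as a probability.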

Let’s assume I want to predict whether a debtor will pay their debt or default, based on historical data.

First, I need the history – data about debtors and actual statistics on which ones failed to pay their debts. Below is a table with 5 parameters:

  • Loan Amount
  • Annual Income
  • Credit Rating
  • Current Delinquency
  • Previous Period Defaults

The last column shows whether the customer actually defaulted, i.e., whether they were unable to pay their debt.

The model can receive any data, as long as it is logically related to the outcome whose probability we are predicting.

The next step involves building the formulas for Excel’s Solver add-in (the details are not covered here):

Solver adjusts the coefficients in the yellow column so that the total logistic likelihood reaches its maximum, which ultimately yields predictions in the form of 1’s and 0’s, or “yes” and “no.” The upper right corner shows the cut-off probability, which determines whether a customer falls into the “yes” or “no” category. In simple terms: if the predicted default probability is above the cut-off of 0.6, the cell shows “yes”; otherwise, “no.”
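What Solver does can be sketched in Python: gradient ascent below stands in for the Solver optimization, and the tiny one-feature data set is invented for illustration, not taken from the article’s table.

```python
import math

def log_likelihood(data, b0, b1):
    """The sum Solver maximizes: log-likelihood of the observed 1/0 outcomes."""
    ll = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

# Toy history: (feature, defaulted?) pairs -- illustrative only.
data = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 1), (6, 1)]

# Gradient ascent plays the role of Excel's Solver here.
b0, b1 = 0.0, 0.0
for _ in range(10000):
    g0 = sum(y - 1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x, y in data)
    g1 = sum((y - 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))) * x for x, y in data)
    b0 += 0.05 * g0
    b1 += 0.05 * g1

CUTOFF = 0.6  # the default-probability cut-off from the article

def predict(x):
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return "yes" if p > CUTOFF else "no"
```

After the loop, the fitted coefficients make the log-likelihood larger than any other pair would, exactly as Solver’s “maximize the target cell” instruction does in the spreadsheet.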

The next step is comparing the predictions made by the model to the reality:

We will calculate coefficients such as:

  • Accuracy – the proportion of correct predictions out of all predictions – (30+459)/500;
  • Precision – the proportion of true positives among predicted positives – 459/462;
  • Recall – the proportion of true positives among actual positives – 459/467.
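The three coefficients follow directly from the confusion-matrix counts. The counts below are derived from the article’s ratios (459 true positives out of 462 predicted positives and 467 actual positives, 30 true negatives, 500 customers in total), so the false-positive and false-negative counts are inferred, not quoted.

```python
# Confusion-matrix counts implied by the article's figures.
TP, FP, FN, TN = 459, 3, 8, 30

total = TP + FP + FN + TN            # 500 customers in the sample
accuracy = (TP + TN) / total         # (30 + 459) / 500
precision = TP / (TP + FP)           # 459 / 462
recall = TP / (TP + FN)              # 459 / 467
```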

Based on this data, we generate sensitivity analysis graphs that help us choose the optimal Cut-off probability level:

For example, here the curves intersect at a certain point, indicating a good cut-off level for the model (0.5).

Then, the Receiver Operating Characteristic (ROC) curve is built based on False Positive and True Positive statistics:

This curve also helps us determine the right cut-off level – we need to select a level where the False Positive rate is minimized and the True Positive rate is maximized – in our case, 0.55. This means that if the model shows a default probability above 0.55, the loan should not be given, and vice versa.
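Building the ROC curve amounts to sweeping the cut-off and recording, at each level, the share of actual negatives wrongly flagged (False Positive rate) and the share of actual positives correctly caught (True Positive rate). The scores and outcomes below are invented for the sketch; in the spreadsheet they would be the model’s predicted probabilities and the actual default column.

```python
def roc_points(scores, labels, thresholds):
    """For each cut-off, return (false positive rate, true positive rate)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical predicted probabilities and actual 1/0 outcomes.
scores = [0.10, 0.30, 0.35, 0.50, 0.55, 0.60, 0.80, 0.90]
labels = [0,    0,    1,    0,    1,    1,    1,    1]
curve = roc_points(scores, labels, [0.0, 0.25, 0.5, 0.75, 1.01])
```

Plotting these pairs traces the curve: a cut-off of 0 classifies everyone as “yes” (top-right corner of the plot), a cut-off above every score classifies everyone as “no” (bottom-left), and the best cut-off sits where the curve bends closest to the top-left corner.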

P.S.

Such calculations are usually done in specialized software; Excel is not ideal for complex analysis, but it is often sufficient.

Logistic Regression Excel File

Source of Idea:
Business Analytics: Data Analysis & Decision Making by S. Christian Albright and Wayne L. Winston.