Credit Scoring Fundamentals: From Logistic Regression to Machine Learning

Credit scoring is the quantitative backbone of lending. A score estimates the probability that a borrower will default within a defined time window, typically 12 months.

The Classic Approach: Logistic Regression

Logistic regression maps a set of borrower characteristics (income, debt-to-income ratio, payment history, credit utilization) to a probability of default. It is interpretable, fast to deploy, and regulator-friendly: auditors can inspect the coefficients, and lenders can explain decisions to applicants as required under regulations such as GDPR or ECOA.
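
As a rough illustration of that mapping, the sketch below applies the logistic function to a weighted sum of borrower features. The intercept, the coefficient values, and the feature names are invented for this example; a real model would estimate them from historical loan performance.

```python
import math

# Hypothetical values for illustration only; not taken from any fitted model.
INTERCEPT = -3.0
COEFFICIENTS = {
    "debt_to_income": 2.5,      # higher DTI raises the log-odds of default
    "credit_utilization": 1.8,  # higher utilization raises the log-odds
    "late_payments_12m": 0.6,   # each late payment raises the log-odds
}


def probability_of_default(borrower: dict) -> float:
    """Return PD = sigmoid(intercept + sum of coefficient * feature value)."""
    log_odds = INTERCEPT + sum(
        COEFFICIENTS[name] * borrower[name] for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


applicant = {
    "debt_to_income": 0.35,
    "credit_utilization": 0.50,
    "late_payments_12m": 1,
}
pd_estimate = probability_of_default(applicant)
print(f"Estimated PD: {pd_estimate:.3f}")
```

Because the log-odds are a plain weighted sum, each coefficient can be read directly as the change in log-odds per unit of its feature, which is what makes the model easy to audit.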

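Weight of Evidence (WoE) encoding and Information Value (IV) remain standard preprocessing steps. WoE replaces each bin of a raw variable with the log-ratio of its share of good and bad accounts; when bins are chosen so that WoE moves monotonically across them, the result is a bounded input that stabilizes the model. IV summarizes how strongly a variable separates good from bad accounts and is used to screen candidate variables.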
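
A minimal sketch of both quantities, assuming the common convention WoE = ln(share of goods / share of bads) per bin and IV = sum over bins of (share of goods - share of bads) * WoE. Some texts invert the ratio, which flips the sign of WoE but leaves IV unchanged. The bin labels and counts are made up for illustration.

```python
import math


def woe_iv(bins):
    """Compute WoE per bin and the variable's total IV.

    bins: list of (label, n_good, n_bad) tuples.
    Assumes every bin holds at least one good and one bad account;
    empty cells would need merging or smoothing first.
    """
    total_good = sum(good for _, good, _ in bins)
    total_bad = sum(bad for _, _, bad in bins)
    woe = {}
    iv = 0.0
    for label, good, bad in bins:
        dist_good = good / total_good  # share of all goods in this bin
        dist_bad = bad / total_bad     # share of all bads in this bin
        woe[label] = math.log(dist_good / dist_bad)
        iv += (dist_good - dist_bad) * woe[label]
    return woe, iv


# Hypothetical counts for a binned credit utilization variable.
utilization_bins = [
    ("0-30%", 600, 20),
    ("30-60%", 300, 30),
    ("60-100%", 100, 50),
]
woe_by_bin, information_value = woe_iv(utilization_bins)
for label, value in woe_by_bin.items():
    print(f"{label}: WoE = {value:+.3f}")
print(f"IV = {information_value:.3f}")
```

In this toy data WoE falls as utilization rises, so the encoded variable is monotonic in risk, which is the property the binning step is usually tuned to achieve.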

Scorecard Format

The raw log-odds from a logistic model are typically converted into a points-based scorecard: a lookup table in which each characteristic-value combination contributes a defined number of points. The final score maps to a probability of default (PD) estimate and a risk grade.
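
The lookup and the score-to-PD mapping can be sketched as follows. This assumes the widely used "points to double the odds" (PDO) scaling, Score = Offset + Factor * ln(good:bad odds) with Factor = PDO / ln 2. The anchor (600 points at 50:1 odds, PDO of 20), the base points, and the points table are all hypothetical; in practice the points would be derived from the fitted coefficients and WoE values.

```python
import math

# Assumed scaling anchor: 600 points at 50:1 good:bad odds, 20 points double the odds.
BASE_SCORE = 600
BASE_ODDS = 50
PDO = 20

FACTOR = PDO / math.log(2)
OFFSET = BASE_SCORE - FACTOR * math.log(BASE_ODDS)

# Hypothetical scorecard: points per characteristic and bin.
BASE_POINTS = 520
SCORECARD = {
    "credit_utilization": {"0-30%": 45, "30-60%": 30, "60-100%": 10},
    "late_payments_12m": {"0": 50, "1": 25, "2+": 0},
}


def total_score(applicant_bins: dict) -> int:
    """Sum base points and the points for each characteristic's bin."""
    return BASE_POINTS + sum(
        SCORECARD[characteristic][bin_label]
        for characteristic, bin_label in applicant_bins.items()
    )


def pd_from_score(score: float) -> float:
    """Invert the scaling: score -> good:bad odds -> probability of default."""
    good_bad_odds = math.exp((score - OFFSET) / FACTOR)
    return 1.0 / (1.0 + good_bad_odds)


applicant_bins = {"credit_utilization": "30-60%", "late_payments_12m": "0"}
score = total_score(applicant_bins)
pd_estimate = pd_from_score(score)
print(f"Score: {score}, implied PD: {pd_estimate:.4f}")
```

Under this scaling, every additional 20 points doubles the good:bad odds, so a score difference has the same meaning anywhere on the scale.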

Machine Learning Alternatives

Gradient boosted trees (XGBoost, LightGBM) typically outperform logistic regression on discrimination metrics such as the Gini coefficient. The tradeoff: they are harder to explain, require more governance overhead, and can encode protected attributes indirectly through correlated proxies.
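
For readers unfamiliar with the metric, the Gini coefficient used in credit scoring is a rescaling of the ROC AUC: Gini = 2 * AUC - 1. The sketch below computes it by brute-force pairwise comparison, which is fine for a toy sample but too slow for real portfolios. The predicted PDs and default flags are made up.

```python
def auc(predicted_pd, defaulted):
    """Share of (defaulter, non-defaulter) pairs where the defaulter has the
    higher predicted PD; ties count as half a correct ranking."""
    bads = [p for p, y in zip(predicted_pd, defaulted) if y == 1]
    goods = [p for p, y in zip(predicted_pd, defaulted) if y == 0]
    correct = 0.0
    for bad_pd in bads:
        for good_pd in goods:
            if bad_pd > good_pd:
                correct += 1.0
            elif bad_pd == good_pd:
                correct += 0.5
    return correct / (len(bads) * len(goods))


def gini(predicted_pd, defaulted):
    """Gini coefficient: 0 for random ranking, 1 for perfect ranking."""
    return 2.0 * auc(predicted_pd, defaulted) - 1.0


# Hypothetical model outputs and observed outcomes (1 = defaulted).
predicted_pd = [0.9, 0.8, 0.3, 0.2, 0.1]
defaulted = [1, 0, 1, 0, 0]
model_gini = gini(predicted_pd, defaulted)
print(f"Gini: {model_gini:.3f}")
```

Comparing two models on the same holdout sample with this metric is how a claim such as "the GBM beats the scorecard" would usually be checked.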

SHAP (SHapley Additive exPlanations) values have become the standard tool for post-hoc explainability in ML credit models, enabling feature-level contribution decomposition on individual predictions.
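
To show what that decomposition means, the sketch below computes exact Shapley values by enumerating every feature coalition. It is not the shap library and not how production tools work: the cost is exponential in the number of features, and "missing" features are replaced by a single baseline applicant rather than averaged over a background dataset. The toy risk function, the applicant, and the baseline are assumptions made for this example.

```python
import math
from itertools import combinations


def shapley_values(model, instance, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition take their baseline value; each feature's
    contribution is its weighted average marginal effect over all coalitions.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    contributions = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = (
                math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            )
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {name}) - value(members))
        contributions[name] = total
    return contributions


def toy_risk_score(x):
    """Hypothetical log-odds model with an interaction term."""
    return 2.0 * x["dti"] + 1.5 * x["utilization"] + 1.0 * x["dti"] * x["utilization"]


applicant = {"dti": 0.5, "utilization": 0.8}
baseline = {"dti": 0.3, "utilization": 0.4}
contributions = shapley_values(toy_risk_score, applicant, baseline)
for feature, amount in contributions.items():
    print(f"{feature}: {amount:+.3f}")
```

The contributions sum to the difference between the applicant's score and the baseline score, which is the additivity property that lets a lender attribute an individual decision to specific features.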

Choosing the Right Approach

Criterion	Logistic regression	Gradient boosted trees
Interpretability	High	Medium (with SHAP)
Performance (Gini)	Baseline	+5–15% over baseline (typical)
Regulatory ease	High	Medium
Development time	Low	Medium–High

For most retail lending contexts, a well-tuned logistic scorecard with WoE encoding delivers 90–95% of the discriminatory performance of a gradient boosted model at a fraction of the governance cost.