Credit scoring is the quantitative backbone of lending. A score estimates the probability that a borrower will default within a defined time window, typically 12 months.
The Classic Approach: Logistic Regression
Logistic regression maps a set of borrower characteristics (income, debt-to-income ratio, payment history, credit utilization) to a probability of default. It is interpretable, fast to deploy, and regulator-friendly: auditors can inspect the coefficients, and lenders can explain decisions to applicants under GDPR or ECOA requirements.
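As a minimal sketch of that mapping, the snippet below turns a linear log-odds score into a probability of default. The coefficient values, the intercept, and the feature names (`debt_to_income`, `credit_utilization`, `late_payments_12m`) are hypothetical and chosen only for illustration; a real model would estimate them from historical loan performance.

```python
import math

# Hypothetical coefficients for illustration only; a real model estimates
# these from historical loan performance data.
INTERCEPT = -3.0
COEFFICIENTS = {
    "debt_to_income": 2.5,      # higher DTI raises the log-odds of default
    "credit_utilization": 1.8,  # higher utilization raises the log-odds
    "late_payments_12m": 0.6,   # each late payment raises the log-odds
}

def probability_of_default(borrower):
    """Apply the logistic function to a linear combination of features."""
    log_odds = INTERCEPT + sum(
        COEFFICIENTS[name] * borrower[name] for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-log_odds))

applicant = {
    "debt_to_income": 0.35,
    "credit_utilization": 0.50,
    "late_payments_12m": 1,
}
pd_estimate = probability_of_default(applicant)
print(f"Estimated PD: {pd_estimate:.3f}")
```

Because each coefficient acts additively on the log-odds, its sign and size can be read directly, which is what makes the model easy to audit.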
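Weight of Evidence (WoE) encoding and Information Value (IV) remain standard preprocessing steps. WoE replaces each binned raw variable with the log-odds of its bin, giving the model inputs on a common scale (and monotonic ones when the bins are chosen that way), which stabilizes the fit. IV summarizes each variable's predictive strength and is commonly used to screen candidate variables.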
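A small worked sketch of both quantities for one binned variable follows. The bin labels and the good/bad counts are made up, and the sign convention (good distribution over bad distribution) is one common choice; some teams use the inverse.

```python
import math

# Hypothetical counts of good (repaid) and bad (defaulted) accounts per bin
# of a single variable, e.g. credit utilization.
bins = {
    "0-30%":   {"good": 500, "bad": 10},
    "30-60%":  {"good": 300, "bad": 30},
    "60-100%": {"good": 200, "bad": 60},
}

def woe_and_iv(bins):
    """Return (WoE per bin, total IV) using WoE = ln(%good / %bad).

    Assumes every bin has at least one good and one bad account; real
    pipelines usually merge or smooth bins with zero counts.
    """
    total_good = sum(b["good"] for b in bins.values())
    total_bad = sum(b["bad"] for b in bins.values())
    woe = {}
    iv = 0.0
    for label, b in bins.items():
        dist_good = b["good"] / total_good  # share of all goods in this bin
        dist_bad = b["bad"] / total_bad     # share of all bads in this bin
        woe[label] = math.log(dist_good / dist_bad)
        iv += (dist_good - dist_bad) * woe[label]
    return woe, iv

woe, iv = woe_and_iv(bins)
for label, value in woe.items():
    print(f"{label}: WoE = {value:+.3f}")
print(f"IV = {iv:.3f}")
```

Under this convention a positive WoE marks a bin with proportionally more good accounts than the portfolio average, and a negative WoE marks a riskier bin.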
Scorecard Format
The raw log-odds from a logistic model are typically converted into a points-based scorecard — a lookup table where each characteristic-value combination contributes a defined number of points. The final score maps to a PD estimate and a risk grade.
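The conversion from log-odds to points is usually a linear rescaling. The sketch below uses the common "points to double the odds" (PDO) convention; the base score of 600, base odds of 50:1, and PDO of 20 are assumed values for illustration, not ones taken from any particular scorecard.

```python
import math

# Assumed scaling parameters (illustrative only).
BASE_SCORE = 600.0  # score assigned at the reference odds
BASE_ODDS = 50.0    # good:bad odds at the base score
PDO = 20.0          # points needed to double the good:bad odds

FACTOR = PDO / math.log(2)
OFFSET = BASE_SCORE - FACTOR * math.log(BASE_ODDS)

def score_from_pd(pd):
    """Map a probability of default to a score; lower PD gives a higher score."""
    odds_good = (1.0 - pd) / pd
    return OFFSET + FACTOR * math.log(odds_good)

def pd_from_score(score):
    """Invert the scaling to recover the PD implied by a score."""
    odds_good = math.exp((score - OFFSET) / FACTOR)
    return 1.0 / (1.0 + odds_good)

for pd in (0.20, 0.05, 0.01):
    print(f"PD {pd:.2%} -> score {score_from_pd(pd):.0f}")
```

In a full scorecard the same linear transformation is split across the model's terms, so each attribute's points are its share of the scaled log-odds.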
Machine Learning Alternatives
Gradient boosted trees (XGBoost, LightGBM) typically outperform logistic regression on discriminatory-power benchmarks such as the Gini coefficient. The tradeoff: they are harder to explain, require more governance overhead, and can encode protected attributes indirectly through correlated proxies.
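For reference, the Gini coefficient used in such comparisons is a rescaling of the AUC: Gini = 2 × AUC − 1. The sketch below computes it by brute-force pair counting on a tiny, made-up sample; the labels and predicted PDs are hypothetical, and production code would normally use a rank-based routine from a statistics library instead.

```python
def auc(labels, scores):
    """Share of (bad, good) pairs where the bad account gets the higher
    predicted PD; tied pairs count as half."""
    bad = [s for y, s in zip(labels, scores) if y == 1]
    good = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for b in bad:
        for g in good:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bad) * len(good))

def gini(labels, scores):
    """Gini coefficient as a linear rescaling of AUC."""
    return 2.0 * auc(labels, scores) - 1.0

# Hypothetical outcomes (1 = default) and predicted PDs from one model.
labels = [0, 0, 1, 0, 1, 1, 0, 0]
predicted_pd = [0.05, 0.10, 0.40, 0.20, 0.15, 0.60, 0.30, 0.08]

model_gini = gini(labels, predicted_pd)
print(f"Gini: {model_gini:.3f}")
```

A model that ranks every defaulter above every non-defaulter scores 1, and one that ranks at random scores about 0.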
SHAP values have become the standard tool for post-hoc explainability in ML credit models, enabling feature-level contribution decomposition on individual predictions.
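The additive decomposition is easiest to see in the linear special case. For a linear log-odds model with features treated as independent, the SHAP value of feature i reduces to w_i × (x_i − mean(x_i)). The sketch below illustrates only that special case with hypothetical weights and background data; it is not the tree-specific algorithm that SHAP libraries apply to gradient boosted models.

```python
# Hypothetical linear log-odds model and background (reference) data.
weights = {"debt_to_income": 2.5, "credit_utilization": 1.8}
intercept = -3.0
background = [
    {"debt_to_income": 0.20, "credit_utilization": 0.30},
    {"debt_to_income": 0.40, "credit_utilization": 0.50},
    {"debt_to_income": 0.30, "credit_utilization": 0.70},
]

def log_odds(x):
    """Model output in log-odds space."""
    return intercept + sum(weights[k] * x[k] for k in weights)

# Feature means and the average model output over the background data.
means = {k: sum(row[k] for row in background) / len(background) for k in weights}
base_value = sum(log_odds(row) for row in background) / len(background)

def linear_shap(x):
    """Per-feature contributions relative to the background average."""
    return {k: weights[k] * (x[k] - means[k]) for k in weights}

applicant = {"debt_to_income": 0.45, "credit_utilization": 0.80}
contributions = linear_shap(applicant)

for name, value in contributions.items():
    print(f"{name}: {value:+.3f} log-odds")
print(f"base {base_value:+.3f} + contributions = {log_odds(applicant):+.3f}")
```

The contributions plus the base value add up to the applicant's prediction, and that additivity is what lets a single decision be broken down feature by feature.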
Choosing the Right Approach
| | Logistic | GBM |
|---|---|---|
| Interpretability | High | Medium (with SHAP) |
| Performance (Gini) | Baseline | +5–15% typical |
| Regulatory ease | High | Medium |
| Development time | Low | Medium–High |
For most retail lending contexts, a well-tuned logistic scorecard with WoE encoding delivers 90–95% of a GBM's performance at a fraction of the governance cost.