Credit Model Design - IIM Fintech Abrg


Model Design: Ensuring Business Success of

Models

Copyright © 2020 by Boston Consulting Group. All rights reserved.


What Model to Build and Why?

November 2023
Beware of the biases in decision making: they can creep into the analytical
process via data, process adjustments, and assumptions.

• Most common biases and how they creep into organizations and analytics:
– Availability: what is easily recalled, or the data most easily available, is assumed to be the most relevant.
– Confirmation: looking for data or selective evidence in interpretations; often confused with 'business knowledge'.
– Anchoring: an initial value or interpretation persuades future analytical findings to revolve around the past findings.
– Representativeness: judging by the degree to which characteristics conform to a stereotypical perception of members of a group. Ex: a badly designed application scorecard based on demographic data, or small-sample bias.
– Survivorship: analysis is done on specimens or samples which are successful, neglecting samples which have failed.
– Motivational: incentives, real or perceived, often lead to probability estimates that do not accurately reflect the analyst's true beliefs. Ex: analysts housed in different BUs draw different conclusions from the same data.


Data Driven Decision Making
Getting the Signal From the Noise

What is data?
• "Raw" observations • Structured or unstructured
• Limited meaning of its own • Unprocessed or processed

What is information?
• Connecting the dots • May or may not be realistic
• Unraveling patterns/trends/relationships • Skill required to connect the dots

What is knowledge?
• Knowing well enough to predict (the predicted value)
• Validating understanding from previously observed relationships
Context Setting

Framing
– Essential to define the problem in pure business terms
– Does it require a model to solve the problem?
– Can the problem be solved without building the model?

Converting the business problem into an analytical problem:

Nature of the 'Modeling' Problem
– Where in the observe-act loop are we playing: descriptive / predictive / prescriptive?
– Is it a problem of cognition? ('see', 'hear')
– Is it a problem of categorization? ('people-like-you')
– Is it a problem of ordinal classification? (high-hazard vs low-hazard)
– Is it a problem of future event prediction? (probability of default)

What Modelling Outcome Will Solve the Problem
– Acquiring/accessing the information: profiling the customer
– Information aggregation? Reconstruction? Generation?
– Information-driven insights? Pattern matching?
– Information-based decision-quality improvement

Nature of the Impact
– Outcome: irreversible vs reversible
– Action: repeatable vs non-repeatable
– Low value vs high value
Context Setting

Different Paths, But Same Destination?
One may lie, the other is dumb!

Genesis
• Statistical Learning: 1. Sample studies to understand the population; 2. Belief in an underlying distribution with (usually) defined parameters; 3. Samples are representative of that underlying distribution.
• Machine Learning: 1. Computer science / information theory; 2. Signal processing; 3. Accuracy of capturing and interpreting the signal supersedes other considerations; 4. No assumption of an inherent underlying distribution; statistical significance of predictors considered irrelevant.

Mechanism of 'Prediction'
• Statistical Learning: 1. 'Statistical significance' explicitly strives to differentiate signal from noise (the chance of repeatability of an observation), pushing for higher confidence intervals; 2. Variables tend to have 'weights' associated with them.
• Machine Learning: 1. Computationally intensive bootstrapping/iterations; 2. No explicit concept of weights; 'weights' are incidental and related to importance in segregating events.

Challenges and Holy Grails
• Statistical Learning: 1. Spurious regressions; 2. Correlation obsession; 3. Works best on structured data; 4. Causal inferencing.
• Machine Learning: 1. Overfitting; 2. Data secularity (signal-processing roots: data can be unstructured); 3. Explainability; 4. 'Gini' obsession: accuracy by whatever means.
Context Setting

Mapping the nature of the problem to the algorithm: the choice from pure econometrics to higher-order ANNs is a tradeoff between predictability vs stability vs explainability.

High Value vs Low Value: There is a tendency to go manual/pure-judgmental for very high-value decisions, although human bias impacts these and the decision quality is not rigorously tested; low-value decisions are more likely to be automated; a bionic, insight-informed decision is more optimal.

Decisive vs Interactive: Is the agent interacting with the model to fine-tune results (ex: Google search), or does the model throw its output after running an algorithm (pattern matching)?

Cost of a Wrong Decision: Netflix presenting a movie you do not like, vs giving away a wrong loan or approving a transaction which should have been tagged as 'money laundering' activity.

Epistemological Bias: Does the human agent, correctly or incorrectly, believe that they have been solving the problem (underwriting of a large-ticket loan)? Is there a track record or societal recognition that humans have been deciding on the same (legal judgement)? A new, data-intense problem with no track record of humans deciding on the same has a low epistemological bias.

Repetitive vs 'One-time': Models used in campaigns for product offerings vs actually giving a loan / making an investment.

Data Structure: Structured data vs unstructured data.
Designing a Predictive Risk Model
What Business Problem Does a Risk Model Solve?

1. Reduce Losses  2. Grow Profitably  3. Lend Faster
Elements of a Risk Model Design
Improving the risk decision quality

1. Business Problem: Exactly how will the risk score be used?
• Binary yes-no decision OR risk-based differential underwriting treatment
• Will it be used for pricing?
• Will it support a human underwriter or facilitate automatic decision-making?

2. Sophistication of Approach: What is the data, tech & org ability?
• Data availability: length of historical data, level of data digitization, data capability
• Technical maturity: flexibility of production platforms
• Business maturity: incentive structure, reporting lines, senior management commitment

3. Model Selection: Choosing the model with the highest business relevance
• Meets all statistical requirements
• Multi-dimensional validation: power, calibration, others
• Choice of metrics: GINI vs KS (non-exhaustive)
• Temporal stability

4. Ensuring Adoption
• Conversion of the score to a usable format
• Reason codes
• Intuitiveness & explainability of variables and results
• Optimizing the score-credit policy interaction
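The 'conversion of the score to a usable format' step is usually done by scaling model log-odds onto a points scale. A minimal sketch, assuming a base score of 600 at 50:1 good:bad odds and 20 points to double the odds; all three anchor values are illustrative assumptions, not figures from these slides:

```python
import math

def scale_score(p_default: float,
                base_score: float = 600.0,
                base_odds: float = 50.0,
                pdo: float = 20.0) -> float:
    """Map a default probability onto a points scale where `base_score`
    corresponds to `base_odds` (good:bad) and every `pdo` points doubles
    the odds. Anchor values are illustrative assumptions."""
    odds = (1.0 - p_default) / p_default          # good:bad odds
    factor = pdo / math.log(2.0)
    offset = base_score - factor * math.log(base_odds)
    return offset + factor * math.log(odds)

# Lower default probability -> higher score; halving the PD roughly
# doubles the odds and so adds about `pdo` points.
score = scale_score(0.02)
```

The points-to-double-odds (PDO) parametrization is one common convention; the same log-odds scaling works with any pair of anchor points.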
Variable Selection

The most significant variables are selected through multiple criteria, funneling from roughly 50-5,000 candidate variables down to 10-50.

1. Missing value treatment
• Activity: diagnose the nature of missing values; select an imputation method; impute missing values.
• Statistical tools: imputation (choice depends on the % of missing values); four different imputation methods.

2. Trend inspection & transformation of variables
• Activity: identify the intuitively trending variables; transform variables where required to make them significant.
• Statistical tools: bivariate plots (PD trends plotted for each variable); linear, binary & WOE binning as per the observed trend; univariate logit to test the significance of each transformed variable; p-value (level of significance).

3. Identify similarly behaving variables: treat multicollinearity
• Activity: variables that make the same prediction will be independently sufficient; eliminate 'double counting' in the credit score.
• Statistical tools: correlation matrix (to identify similarly performing binary variables); check for VIF.

4. Variable selection
• Activity: the most significant variables post the first steps are included in the model.
• Statistical tools: predictive power (IV value); the variable with the highest predictive power from each component.
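The WOE binning and IV screen above can be sketched as follows. This assumes the candidate variable has already been binned; the bin counts below are made up for illustration, and the sketch ignores the zero-cell adjustments a production implementation would need:

```python
import math

def woe_iv(bins):
    """Compute weight of evidence per bin and the total information
    value. `bins` is a list of (n_good, n_bad) tuples, one tuple per
    pre-built bin of the candidate variable. Assumes no empty cells."""
    total_good = sum(g for g, b in bins)
    total_bad = sum(b for g, b in bins)
    iv = 0.0
    woes = []
    for g, b in bins:
        pg = g / total_good        # share of all goods in this bin
        pb = b / total_bad         # share of all bads in this bin
        woe = math.log(pg / pb)    # positive = bin is good-heavy
        woes.append(woe)
        iv += (pg - pb) * woe      # each IV term is non-negative
    return woes, iv

# Illustrative three-bin variable, (goods, bads) per bin:
woes, iv = woe_iv([(400, 10), (350, 30), (250, 60)])
```

A common rule of thumb ranks variables by IV (for example, below ~0.02 as unpredictive and above ~0.5 as suspiciously strong), but the cutoffs vary by shop and are not specified in these slides.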
Default Definition

[Figure: monthly cohort curves of %accounts at x+ DPD plotted against month on books (MOB1-MOB30), with the 50% peak-event, >75% peak-event, and peak-event points marked.]

Cohort analysis is used to identify the optimal default definition, one with a high likelihood of capturing lifetime losses:
• Monthly cohort plots (loans originated in the same month) are tracked and evaluated w.r.t. different intensities (30+ DPD, 60+ DPD, 90+ DPD) to finalize the bad-loan definition.
• The month-on-books (MOB) point where the curve starts to stabilize, or where the month-over-month (MoM) bad-loan growth rate starts dropping, is selected as a possible definition option.
• Cohort curves stabilizing late might indicate a very long MOB evaluation window, which needs to be traded off against data availability from a feasibility point of view.
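One way to operationalize the MoM growth-rate criterion above: pick the first MOB at which the incremental growth of the cumulative bad rate falls below a threshold. The cohort numbers and the 5% cutoff below are illustrative assumptions, not values from the slides:

```python
def pick_default_mob(bad_rates, growth_cutoff=0.05):
    """Return the first month-on-books (1-indexed) at which the
    month-over-month growth of the cumulative bad rate drops below
    `growth_cutoff`. `bad_rates` holds the cumulative %accounts at
    x+ DPD by MOB (index 0 = MOB1). The cutoff is an assumption."""
    for mob in range(1, len(bad_rates)):
        prev, cur = bad_rates[mob - 1], bad_rates[mob]
        if prev > 0 and (cur - prev) / prev < growth_cutoff:
            return mob + 1
    return len(bad_rates)  # curve never stabilized in the window

# Illustrative cohort: the curve flattens around MOB 7
curve = [0.5, 1.2, 2.1, 3.0, 3.6, 3.9, 4.0, 4.05, 4.07, 4.08]
mob = pick_default_mob(curve)
```

In practice this would be run per intensity (30+/60+/90+ DPD) and per origination cohort, and the chosen MOB compared against how much seasoned data is actually available.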
Segmentation

Choice of segmentation framework: the need to segment must first be ascertained.

1. Event-density based  2. Event-driver based  3. Information richness
Model Calibration

Appropriate techniques should be adopted for each type of calibration issue.

Calibration challenges: [Figure: predicted vs observed default rate across deciles, showing three patterns - parallel offset, intersection, and entangled - with regions of over-predicted and under-predicted default.]

Techniques to fix:
• Linear regression
• Logit transform
• Model of models
• Kernel regression
• Decision trees
• Testing ODR & predicted probability with goodness-of-fit tests (parametric/non-parametric)
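For the parallel-offset case, the logit-transform fix can amount to shifting every prediction's log-odds by one constant so the average predicted PD matches the observed default rate, which preserves rank ordering. A minimal sketch with made-up numbers, solving for the shift by bisection (the bisection bounds are an assumption):

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def recalibrate(pds, observed_rate, tol=1e-10):
    """Find the constant log-odds shift c such that the mean of
    sigmoid(logit(pd) + c) equals `observed_rate`, then return the
    shifted PDs. Works because the mean is monotone in c."""
    lo, hi = -10.0, 10.0   # assumed bracketing bounds for the shift
    while hi - lo > tol:
        c = (lo + hi) / 2
        mean_pd = sum(sigmoid(logit(p) + c) for p in pds) / len(pds)
        if mean_pd < observed_rate:
            lo = c
        else:
            hi = c
    return [sigmoid(logit(p) + c) for p in pds]

# Model over-predicts: mean predicted PD 6% vs observed rate 3%
raw = [0.02, 0.04, 0.06, 0.08, 0.10]
fixed = recalibrate(raw, 0.03)
```

Intersecting or entangled patterns need more than an intercept shift (slope adjustment, kernel regression, or a model of models, per the list above).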
Validating model performance outside of the training data is critical for stakeholder confidence.

Multi-dimensional model validation:

a. Power: the ability of the model to differentiate between good vs bad. Metrics: Gini (for accuracy across the population), KS, AUC/ROC. Focusing just on the 'power' of the model by measuring Gini / KS / precision-recall is a start, but by itself is often not enough.

b. Rank ordering: the model should rank-order defaults in a monotonic fashion; a high-power model does not necessarily rank-order. Metric: bivariate strength, to be tested on censored defaults.

c. Stability: the power of individual variables and of the overall model should show reasonable stability across seasons & business cycles. Metrics: PSI, CSI. The model's performance should be stable; wide fluctuations in performance reduce the utility of the model.

d. Temporal effect: the model should show reasonable predictability outside the defined model time period. Metric: temporal decay.

e. Calibration: whether the synthetic 'probability score' coming out of the model actually maps closely to the default rate. Metrics: Hosmer-Lemeshow, Brier score (accuracy at a point). Calibration is essential if the probability score of the model is to be used directly in further decisions.

Note: A model with a high Gini need not be well calibrated, and vice versa; the same is true for the rank ordering of defaults across bins.
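The power metric (a) and calibration metric (e) above can be sketched from scored out-of-sample data. A minimal version, ignoring tie-handling refinements a production validator would add:

```python
def ks_statistic(scores, labels):
    """Kolmogorov-Smirnov statistic: maximum gap between the
    cumulative score distributions of bads (label 1) and goods
    (label 0). Higher = more separating power."""
    pairs = sorted(zip(scores, labels))
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    cum_bad = cum_good = 0
    ks = 0.0
    for _, y in pairs:
        if y == 1:
            cum_bad += 1
        else:
            cum_good += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks

def brier_score(pds, labels):
    """Mean squared gap between predicted PD and outcome: a
    calibration-sensitive accuracy measure (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(pds, labels)) / len(labels)
```

A model can score a perfect KS while its PDs are badly miscalibrated (and hence a poor Brier score), which is exactly the point of the note above: power and calibration must be checked separately.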
