Credit Model Design - IIM Fintech Abrg


Model Design: Ensuring Business Success of

Models

Copyright © 2020 by Boston Consulting Group. All rights reserved.


What Model to Build and Why?

November 2023
Beware of the biases in decision making: they can creep into the analytical
process via data, process adjustments, and assumptions.

• Most common biases and how they creep into organizations and analytics:
– Availability: what is easily recalled, or the data most easily available, is assumed to be the most relevant.
– Confirmation: looking for data or selective evidence in interpretations; often confused with 'business knowledge'.
– Anchoring: an initial value or interpretation persuades future analytical findings to revolve around the past findings.
– Representativeness: judging by the degree to which characteristics conform to a stereotypical perception of members of a group. Ex: a badly designed application scorecard based on demographic data, or small-sample bias.
– Survivorship: analysis is done on specimens or samples which are successful, neglecting samples which have failed.
– Motivational: incentives, real or perceived, often lead to probability estimates that do not accurately reflect the analyst's true beliefs. Ex: analysts housed in different BUs draw different conclusions from the same data.


Data Driven Decision Making
Getting the Signal From the Noise

What is data?
• "Raw" observations • Structured or unstructured
• Limited meaning of its own • Unprocessed or processed

What is information?
• Connecting the dots • May or may not be realistic
• Unraveling patterns/trends/relationships • Skill required to connect the dots

What is knowledge?
• Knowing well enough to predict (the predicted value)
• Validating understanding from previously observed relationships
Context Setting

Framing
– Essential to define the problem in pure business terms
– Does it require a model to solve the problem?
– Can the problem be solved without building the model?

Converting the business problem into an analytical problem:

Nature of the 'Modeling' Problem
– Where in the observe-act loop are we playing: descriptive / predictive / prescriptive?
– Is it a problem of cognition? ('see', 'hear')
– Is it a problem of categorization? ('people-like-you')
– Is it a problem of ordinal classification? (high-hazard vs low-hazard)
– Is it a problem of future event prediction? (probability of default)

What Modelling Outcome Will Solve the Problem
– Acquiring/accessing the information: profiling the customer
– Information aggregation? Reconstruction? Generation?
– Information-driven insights? Pattern matching?
– Information-based decision-quality improvement

Nature of the Impact
– Outcome: irreversible vs reversible
– Action: repeatable vs non-repeatable
– Low value vs high value
Context Setting

Different Paths, But Same Destination?
One may lie, the other is dumb!

Genesis
• Statistical Learning: 1. Sample studies to understand the population; 2. Belief in an underlying distribution with (usually) defined parameters; 3. Samples are representative of that underlying distribution.
• Machine Learning: 1. Computer science / information theory; 2. Signal processing; 3. Accuracy of capturing and interpreting the signal supersedes other considerations; 4. No assumption of an inherent underlying distribution; statistical significance of predictors considered irrelevant.

Mechanism of 'Prediction'
• Statistical Learning: 1. 'Statistical significance' explicitly strives to differentiate signal from noise (the chance of repeatability of an observation), pushing for higher confidence intervals; 2. Variables tend to have 'weights' associated with them.
• Machine Learning: 1. Computationally intensive bootstrapping/iterations; 2. No explicit concept of weights; 'weights' are incidental and related to importance in segregating events.

Challenges and Holy Grails
• Statistical Learning: 1. Spurious regressions; 2. Correlation obsession; 3. Works best on structured data; 4. Causal inferencing.
• Machine Learning: 1. Overfitting; 2. Data secularity (signal-processing roots: data can be unstructured); 3. Explainability; 4. 'Gini' obsession: accuracy by whatever means.
Context Setting

Mapping the nature of the problem to the algorithm: the choice from pure econometrics to higher-order ANNs is a tradeoff between predictability vs stability vs explainability.

High Value vs Low Value: There is a tendency to go manual/pure-judgmental for very high-value decisions, although human bias impacts these and the decision quality is not rigorously tested; low-value decisions are more likely to be automated; a bionic, insight-informed decision is more optimal.

Decisive vs Interactive: Is the agent interacting with the model to fine-tune results (ex: Google search), or does the model throw its output after running an algorithm (pattern matching)?

Cost of a Wrong Decision: Netflix presenting a movie you do not like, vs giving away a wrong loan or approving a transaction which should have been tagged as 'money laundering' activity.

Epistemological Bias: Does the human agent, correctly or incorrectly, believe that they have been solving the problem (underwriting of a large-ticket loan)? Is there a track record or societal recognition that humans have been deciding on the same (legal judgement)? A new, data-intense problem with no track record of humans deciding on the same has a low epistemological bias.

Repetitive vs 'One-time': Models used in campaigns for product offerings vs actually giving a loan / making an investment.

Data Structure: Structured data vs unstructured data.
Designing a Predictive Risk Model
What Business Problem Does a Risk Model Solve?

1. Reduce Losses  2. Grow Profitably  3. Lend Faster
Elements of a Risk Model Design
Improving the risk decision quality

1. Business Problem: Exactly how will the risk score be used?
• Binary yes-no decision OR risk-based differential underwriting treatment
• Will it be used for pricing?
• Will it support a human underwriter or facilitate automatic decision-making?

2. Sophistication of Approach: What is the data, tech & org ability?
• Data availability: length of historical data, level of data digitization, data capability
• Technical maturity: flexibility of production platforms
• Business maturity: incentive structure, reporting lines, senior management commitment

3. Model Selection: Choosing the model with the highest business relevance
• Meets all statistical requirements
• Multi-dimensional validation: power, calibration, others
• Choice of metrics: GINI vs KS (non-exhaustive)
• Temporal stability

4. Ensuring Adoption
• Conversion of the score to a usable format
• Reason codes
• Intuitiveness & explainability of variables and results
• Optimizing the score-credit policy interaction
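The 'conversion of the score to a usable format' step is usually done by scaling model log-odds onto a points scale. A minimal sketch, assuming a base score of 600 at 50:1 good:bad odds and 20 points to double the odds; all three anchor values are illustrative assumptions, not figures from these slides:

```python
import math

def scale_score(p_default: float,
                base_score: float = 600.0,
                base_odds: float = 50.0,
                pdo: float = 20.0) -> float:
    """Map a default probability onto a points scale where `base_score`
    corresponds to `base_odds` (good:bad) and every `pdo` points doubles
    the odds. Anchor values are illustrative assumptions."""
    odds = (1.0 - p_default) / p_default          # good:bad odds
    factor = pdo / math.log(2.0)
    offset = base_score - factor * math.log(base_odds)
    return offset + factor * math.log(odds)

# Lower default probability -> higher score; halving the PD roughly
# doubles the odds and so adds about `pdo` points.
score = scale_score(0.02)
```

The points-to-double-odds (PDO) parametrization is one common convention; the same log-odds scaling works with any pair of anchor points.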
Variable Selection

The most significant variables are selected through multiple criteria, funneling from roughly 50-5,000 candidate variables down to 10-50.

1. Missing value treatment
• Activity: diagnose the nature of missing values; select an imputation method; impute missing values.
• Statistical tools: imputation (choice depends on the % of missing values); four different imputation methods.

2. Trend inspection & transformation of variables
• Activity: identify the intuitively trending variables; transform variables where required to make them significant.
• Statistical tools: bivariate plots (PD trends plotted for each variable); linear, binary & WOE binning as per the observed trend; univariate logit to test the significance of each transformed variable; p-value (level of significance).

3. Identify similarly behaving variables: treat multicollinearity
• Activity: variables that make the same prediction will be independently sufficient; eliminate 'double counting' in the credit score.
• Statistical tools: correlation matrix (to identify similarly performing binary variables); check for VIF.

4. Variable selection
• Activity: the most significant variables post the first steps are included in the model.
• Statistical tools: predictive power (IV value); the variable with the highest predictive power from each component.
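The WOE binning and IV screen above can be sketched as follows. This assumes the candidate variable has already been binned; the bin counts below are made up for illustration, and the sketch ignores the zero-cell adjustments a production implementation would need:

```python
import math

def woe_iv(bins):
    """Compute weight of evidence per bin and the total information
    value. `bins` is a list of (n_good, n_bad) tuples, one tuple per
    pre-built bin of the candidate variable. Assumes no empty cells."""
    total_good = sum(g for g, b in bins)
    total_bad = sum(b for g, b in bins)
    iv = 0.0
    woes = []
    for g, b in bins:
        pg = g / total_good        # share of all goods in this bin
        pb = b / total_bad         # share of all bads in this bin
        woe = math.log(pg / pb)    # positive = bin is good-heavy
        woes.append(woe)
        iv += (pg - pb) * woe      # each IV term is non-negative
    return woes, iv

# Illustrative three-bin variable, (goods, bads) per bin:
woes, iv = woe_iv([(400, 10), (350, 30), (250, 60)])
```

A common rule of thumb ranks variables by IV (for example, below ~0.02 as unpredictive and above ~0.5 as suspiciously strong), but the cutoffs vary by shop and are not specified in these slides.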
Default Definition

[Figure: monthly cohort curves of %accounts at x+ DPD plotted against month on books (MOB1-MOB30), with the 50% peak-event, >75% peak-event, and peak-event points marked.]

Cohort analysis is used to identify the optimal default definition, one with a high likelihood of capturing lifetime losses:
• Monthly cohort plots (loans originated in the same month) are tracked and evaluated w.r.t. different intensities (30+ DPD, 60+ DPD, 90+ DPD) to finalize the bad-loan definition.
• The month-on-books (MOB) point where the curve starts to stabilize, or where the month-over-month (MoM) bad-loan growth rate starts dropping, is selected as a possible definition option.
• Cohort curves stabilizing late might indicate a very long MOB evaluation window, which needs to be traded off against data availability from a feasibility point of view.
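One way to operationalize the MoM growth-rate criterion above: pick the first MOB at which the incremental growth of the cumulative bad rate falls below a threshold. The cohort numbers and the 5% cutoff below are illustrative assumptions, not values from the slides:

```python
def pick_default_mob(bad_rates, growth_cutoff=0.05):
    """Return the first month-on-books (1-indexed) at which the
    month-over-month growth of the cumulative bad rate drops below
    `growth_cutoff`. `bad_rates` holds the cumulative %accounts at
    x+ DPD by MOB (index 0 = MOB1). The cutoff is an assumption."""
    for mob in range(1, len(bad_rates)):
        prev, cur = bad_rates[mob - 1], bad_rates[mob]
        if prev > 0 and (cur - prev) / prev < growth_cutoff:
            return mob + 1
    return len(bad_rates)  # curve never stabilized in the window

# Illustrative cohort: the curve flattens around MOB 7
curve = [0.5, 1.2, 2.1, 3.0, 3.6, 3.9, 4.0, 4.05, 4.07, 4.08]
mob = pick_default_mob(curve)
```

In practice this would be run per intensity (30+/60+/90+ DPD) and per origination cohort, and the chosen MOB compared against how much seasoned data is actually available.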
Segmentation

Choice of segmentation framework: the need to segment must first be ascertained.

1. Event-density based  2. Event-driver based  3. Information richness
Model Calibration

Appropriate techniques should be adopted for each type of calibration issue.

Calibration challenges: [Figure: predicted vs observed default rate across deciles, showing three patterns - parallel offset, intersection, and entangled - with regions of over-predicted and under-predicted default.]

Techniques to fix:
• Linear regression
• Logit transform
• Model of models
• Kernel regression
• Decision trees
• Testing ODR & predicted probability with goodness-of-fit tests (parametric/non-parametric)
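For the parallel-offset case, the logit-transform fix can amount to shifting every prediction's log-odds by one constant so the average predicted PD matches the observed default rate, which preserves rank ordering. A minimal sketch with made-up numbers, solving for the shift by bisection (the bisection bounds are an assumption):

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def recalibrate(pds, observed_rate, tol=1e-10):
    """Find the constant log-odds shift c such that the mean of
    sigmoid(logit(pd) + c) equals `observed_rate`, then return the
    shifted PDs. Works because the mean is monotone in c."""
    lo, hi = -10.0, 10.0   # assumed bracketing bounds for the shift
    while hi - lo > tol:
        c = (lo + hi) / 2
        mean_pd = sum(sigmoid(logit(p) + c) for p in pds) / len(pds)
        if mean_pd < observed_rate:
            lo = c
        else:
            hi = c
    return [sigmoid(logit(p) + c) for p in pds]

# Model over-predicts: mean predicted PD 6% vs observed rate 3%
raw = [0.02, 0.04, 0.06, 0.08, 0.10]
fixed = recalibrate(raw, 0.03)
```

Intersecting or entangled patterns need more than an intercept shift (slope adjustment, kernel regression, or a model of models, per the list above).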
Validating model performance outside of the training data is critical for stakeholder confidence.

Multi-dimensional model validation:

a. Power: the ability of the model to differentiate between good vs bad. Metrics: Gini (for accuracy across the population), KS, AUC/ROC. Focusing just on the 'power' of the model by measuring Gini / KS / precision-recall is a start, but by itself is often not enough.

b. Rank ordering: the model should rank-order defaults in a monotonic fashion; a high-power model does not necessarily rank-order. Metric: bivariate strength, to be tested on censored defaults.

c. Stability: the power of individual variables and of the overall model should show reasonable stability across seasons & business cycles. Metrics: PSI, CSI. The model's performance should be stable; wide fluctuations in performance reduce the utility of the model.

d. Temporal effect: the model should show reasonable predictability outside the defined model time period. Metric: temporal decay.

e. Calibration: whether the synthetic 'probability score' coming out of the model actually maps closely to the default rate. Metrics: Hosmer-Lemeshow, Brier score (accuracy at a point). Calibration is essential if the probability score of the model is to be used directly in further decisions.

Note: A model with a high Gini need not be well calibrated, and vice versa; the same is true for the rank ordering of defaults across bins.
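The power metric (a) and calibration metric (e) above can be sketched from scored out-of-sample data. A minimal version, ignoring tie-handling refinements a production validator would add:

```python
def ks_statistic(scores, labels):
    """Kolmogorov-Smirnov statistic: maximum gap between the
    cumulative score distributions of bads (label 1) and goods
    (label 0). Higher = more separating power."""
    pairs = sorted(zip(scores, labels))
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    cum_bad = cum_good = 0
    ks = 0.0
    for _, y in pairs:
        if y == 1:
            cum_bad += 1
        else:
            cum_good += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks

def brier_score(pds, labels):
    """Mean squared gap between predicted PD and outcome: a
    calibration-sensitive accuracy measure (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(pds, labels)) / len(labels)
```

A model can score a perfect KS while its PDs are badly miscalibrated (and hence a poor Brier score), which is exactly the point of the note above: power and calibration must be checked separately.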
