The AI Insurance Pricing Company
2021
Modeling Approaches :
Actuary vs. Data Scientist
The main differences
Source: https://proactuary.com/actuary-vs-data-scientist/
CONFIDENTIAL 3
Two Approaches to Modeling
Data scientists and actuaries typically have different focuses during modeling
Context
● Obsession with adverse-selection risk
● Expert judgement before models “score”
● Models transparency
Data
● Reliance on artificial intelligence and automation
● Obsession with scores
● Models understanding used for sanity checks
Classic ML approach
Global Parameters and Model Parameters
Model creation is automated:
- The user defines global parameters.
- The algorithm fits on the data and produces the model.
The model itself is often less important than the global parameters.
For instance, when building a GBM, a user will look for the global parameters maximizing the back-test results (through a k-fold), not for the best model.
GLOBAL PARAMETERS (GBM)
● Number of trees
● Trees depth
● Learning rate
FIT
MODEL PARAMETERS (GBM)
● Ensemble of trees (split points, split variables, leaves estimates)
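The split between global parameters and model parameters can be sketched with a toy boosting of depth-1 trees on a single variable. Everything here is an assumption made for illustration: the simulated data, the grid values, and the helper names `fit_stump`, `fit_gbm` and `kfold_mse`. It is not the implementation of any particular library.

```python
import random

def fit_stump(xs, residuals):
    """Fit one depth-1 tree: returns (split point, left leaf, right leaf)."""
    best = None
    for split in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < split]
        right = [r for x, r in zip(xs, residuals) if x >= split]
        l_mean, r_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - l_mean) ** 2 for r in left) + sum((r - r_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, split, l_mean, r_mean)
    return best[1:]

def fit_gbm(xs, ys, n_trees, learning_rate):
    """FIT step: global parameters in, model parameters (the trees) out."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        split, left, right = fit_stump(xs, residuals)
        left, right = learning_rate * left, learning_rate * right
        trees.append((split, left, right))
        preds = [p + (left if x < split else right) for x, p in zip(xs, preds)]
    return base, trees

def predict(model, x):
    base, trees = model
    return base + sum(left if x < split else right for split, left, right in trees)

def kfold_mse(xs, ys, n_trees, learning_rate, k=5):
    """Back-test: mean squared error over k held-out folds."""
    errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        model = fit_gbm([xs[i] for i in train], [ys[i] for i in train], n_trees, learning_rate)
        errors += [(ys[i] - predict(model, xs[i])) ** 2 for i in test]
    return sum(errors) / len(errors)

# Simulated data (assumption): a U-shaped signal over driver age plus noise.
random.seed(0)
xs = [random.uniform(18, 80) for _ in range(60)]
ys = [(1.0 if x < 30 else 0.4 if x < 65 else 0.7) + random.gauss(0, 0.05) for x in xs]

# The user only chooses among global parameters; the trees are never edited by hand.
grid = [(n_trees, lr) for n_trees in (1, 10, 30) for lr in (0.1, 0.5)]
scores = {params: kfold_mse(xs, ys, *params) for params in grid}
best_params = min(scores, key=scores.get)
```

The search returns a pair (number of trees, learning rate); the split points and leaf values are a by-product that is refitted for every candidate.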
Indirect Models Explanations
Black-box models can be analysed
Most ML models are black boxes: they can’t be directly understood, but they can be analysed.
For instance, a Gradient Boosting Machine (GBM) generates predictions from an ensemble of decision trees.
Each tree leverages all the dimensions of the data, generating interactions between the variables.
[Figure: prediction = tree + tree + tree + tree + tree + ... (+95 other trees)]
GBMs are popular because they work out of the box: producing good models automatically is straightforward.
As a GBM typically involves hundreds of trees of depth 2 to 6 (generating 2- to 6-way interactions), the model is not directly understandable by a human.
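To see why trees of depth 2 or more generate interactions, consider a minimal sketch with two hand-written trees. The trees, the variable names `age` and `power`, and all leaf values are hypothetical and chosen only for illustration.

```python
# Two hypothetical depth-2 trees, written by hand (not fitted on any data).
def tree_1(age, power):
    if age < 25:
        return 0.30 if power > 100 else 0.10
    return 0.05 if power > 100 else 0.00

def tree_2(age, power):
    if power > 150:
        return 0.20 if age < 30 else 0.05
    return 0.00

def ensemble(age, power):
    """The prediction is the sum of the trees' outputs."""
    return sum(tree(age, power) for tree in (tree_1, tree_2))

# Effect of being 20 rather than 40 years old, for two engine powers.
effect_low_power = ensemble(20, 80) - ensemble(40, 80)
effect_high_power = ensemble(20, 160) - ensemble(40, 160)
```

The age effect is not the same at the two power levels, so it cannot be read as a single curve; with hundreds of such trees, reading the structure directly becomes impractical.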
Example of black-box analysis
PDP: understand the global impact
[Figure: Partial Dependence Plot, x-axis: Driver Age]
For example: a Partial Dependence Plot (PDP) and Individual Conditional Expectation (ICE) curves showing the impact of a driver’s age.
Example of black-box analysis
ICE: visualize the conditional impacts
[Figure: Individual Conditional Expectation curves, x-axis: Driver Age]
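PDP and ICE curves can be computed for any model that exposes a prediction function. The sketch below assumes a hypothetical black-box `predict`, a made-up four-policy portfolio and an arbitrary age grid. An ICE curve replays one policy with the age replaced by each grid value; the PDP is the average of the ICE curves.

```python
def predict(row):
    """Hypothetical black-box model: a claim frequency from age and power."""
    young_load = 0.25 if row["age"] < 25 else 0.0
    return 0.05 + young_load * (2.0 if row["power"] > 100 else 1.0) + 0.0002 * row["power"]

# Made-up portfolio and grid, for illustration only.
portfolio = [
    {"age": 22, "power": 70},
    {"age": 35, "power": 120},
    {"age": 50, "power": 90},
    {"age": 64, "power": 150},
]
age_grid = [20, 30, 40, 50, 60]

def ice_curves(predict, rows, feature, grid):
    """One curve per row: predictions when only `feature` is changed."""
    return [[predict({**row, feature: value}) for value in grid] for row in rows]

def pdp_curve(ice):
    """Average of the ICE curves at each grid value."""
    return [sum(column) / len(column) for column in zip(*ice)]

ice = ice_curves(predict, portfolio, "age", age_grid)
pdp = pdp_curve(ice)
```

In this toy model the ICE curves of high-power and low-power policies drop by different amounts between ages 20 and 30, which the averaged PDP alone would hide.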
Classic Actuarial approach
Direct Models Visualization
Over the past 30 years, actuaries have focused on Generalized Additive Models (GAMs), because they allow the modeler to decompose the model’s effects and:
● Validate them
● “Force” them if no exposure is available
[Figure: prediction = Driver Age effect + Driving Experience effect + Vehicle Speed effect + Contract Mileage effect + Vehicle Age effect + ... (+5 other variables)]
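A minimal sketch of this decomposition, assuming banded effects stored as lookup tables on an additive scale. The bands, the values, the intercept and the helpers `force_effect` and `score` are hypothetical and only illustrate the idea.

```python
# Hypothetical per-variable effects; None marks a band with no exposure.
effects = {
    "driver_age": {"18-25": 0.40, "26-60": 0.00, "61+": 0.15},
    "vehicle_age": {"0-5": 0.00, "6-15": 0.10, "16+": None},
}
intercept = -2.0

def force_effect(effects, variable, level, value):
    """Return a copy of the effects with one level set by expert judgement."""
    updated = {var: dict(levels) for var, levels in effects.items()}
    updated[variable][level] = value
    return updated

def score(effects, intercept, profile):
    """Additive prediction, returned with its per-variable decomposition."""
    contributions = {var: effects[var][level] for var, level in profile.items()}
    return intercept + sum(contributions.values()), contributions

# "Force" the missing effect, then score one profile.
forced = force_effect(effects, "vehicle_age", "16+", 0.20)
total, parts = score(forced, intercept, {"driver_age": "18-25", "vehicle_age": "16+"})
```

Because each variable contributes one readable term, every term can be validated on its own, and a forced value is visible in the decomposition.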
Analysing a GAM
Only a limited number of variables play a role; each variable’s impact is fully known
Mixing ML & Actuarial approaches
Trees Ensembles and GAMs
Strengths and Limits
The strengths of tree-ensemble models come from the way they are created.
Global Parameters and Model Parameters
Applying ML to GAMs
It is possible to design an algorithm fitting GAMs, based on 2 global parameters:
- Level of smoothness: how significant should the selected effects be?
- Level of parsimony: how many variables should be included in the model?
GLOBAL PARAMETERS (GAM)
● Smoothness level
● Parsimony level
FIT
1. Parsimony has a cost (but it is worth it)
Understanding / Accuracy trade-off
[Figure: models placed along two axes, Higher Accuracy and Better Understanding. Labels: Complex GAM, Simple GAM, Linear Models, No Model, Bad models]
The accuracy is measured on a back-test; actual results when moving to production will not be
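One possible way to act on this trade-off is to keep only the models that beat every simpler model, then pick the simplest one within a tolerance of the best back-test score. The results below are made-up numbers, and the tolerance and helper names are assumptions for illustration.

```python
# Made-up grid-search results: (number of variables, back-test Gini).
results = [(3, 0.28), (5, 0.33), (8, 0.36), (12, 0.37), (20, 0.372), (35, 0.371)]

def pareto_front(results):
    """Keep a model only if it scores higher than every simpler model."""
    front, best_so_far = [], float("-inf")
    for n_vars, gini in sorted(results):
        if gini > best_so_far:
            front.append((n_vars, gini))
            best_so_far = gini
    return front

def most_parsimonious(results, tolerance):
    """Simplest model whose score is within `tolerance` of the best score."""
    best = max(gini for _, gini in results)
    return min((n_vars, gini) for n_vars, gini in results if gini >= best - tolerance)

front = pareto_front(results)
choice = most_parsimonious(results, tolerance=0.015)
```

With these made-up numbers, giving up about one Gini point reduces the model from 20 variables to 8, which is the cost of parsimony stated on the slide.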
1. Parsimony has a cost (but it is worth it)
Grid-search result
[Figure: grid-search results, each point represents one model. Axes: Gini Score and Number of Variables]
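The slides do not say how the Gini score is computed. The sketch below uses one common definition, the normalized Gini based on the ordering of the predictions; treat it as an assumption rather than the authors' exact metric. The sample values are made up.

```python
def gini(actual, predicted):
    """Gini of the ordering induced by `predicted`, measured on `actual`."""
    n = len(actual)
    # Sort by decreasing prediction; ties are broken by original position.
    order = sorted(range(n), key=lambda i: (-predicted[i], i))
    total = sum(actual)
    cumulative, area = 0.0, 0.0
    for i in order:
        cumulative += actual[i]
        area += cumulative / total
    return (area - (n + 1) / 2) / n

def normalized_gini(actual, predicted):
    """1.0 for a perfect ordering, about 0 for a random one."""
    return gini(actual, predicted) / gini(actual, actual)

# Made-up claims and two candidate orderings.
claims = [0, 0, 1, 3]
perfect = normalized_gini(claims, claims)
reversed_order = normalized_gini(claims, [3, 2, 1, 0])
```

Only the ranking of the predictions matters for this score, so it measures how well a model segments risks, not whether the predicted levels are correct.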
2. When you start looking, good models are hard to define
What is overfitting?
The model on the left has stronger results on the back-test but does not inspire much trust.
The model on the right might lead to better results once deployed in production.
3. Interact with the models
Spotting the issues is nice...
[Figure: x-axis: Number of Rooms]
... solving the issues is better!
[Figure: x-axis: Number of Rooms]
4. Price Update & Fading Memory
Parsimonious price update is key
[Figure: x-axis: Vehicle Age; regions labeled Good Fit and Mismatch]
A clear model structure allows an easy identification of mismatches between an old model and new data, and of the coefficients causing them.
It is now easy to fix these mismatches in a parsimonious way, leveraging the elements of the model that are still good. This parsimonious update eases the model validation process and provides a fading memory, mixing the information from the old model and the new data.
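The slides do not specify how the old model and the new data are mixed. One possible sketch, under loudly stated assumptions: flag a Vehicle Age band as a mismatch when the observed value departs from the old coefficient by more than a threshold, and update only flagged bands with a credibility-style weight `exposure / (exposure + k)`. All numbers, the threshold and `k` are hypothetical.

```python
# Hypothetical old coefficients, new observations and exposures per band.
old_coef = {"0-5": 1.00, "6-10": 1.10, "11-15": 1.25, "16+": 1.40}
observed = {"0-5": 1.02, "6-10": 1.08, "11-15": 1.55, "16+": 1.45}
exposure = {"0-5": 5000, "6-10": 4000, "11-15": 1500, "16+": 300}

def parsimonious_update(old_coef, observed, exposure, threshold=0.10, k=1000):
    """Update only the mismatching bands, blending old and new information."""
    new_coef, changed = {}, []
    for band, old in old_coef.items():
        gap = observed[band] / old - 1.0
        if abs(gap) <= threshold:
            new_coef[band] = old  # good fit: keep the old coefficient
            continue
        # Assumed blending rule: more exposure gives more weight to new data.
        weight = exposure[band] / (exposure[band] + k)
        new_coef[band] = (1 - weight) * old + weight * observed[band]
        changed.append(band)
    return new_coef, changed

new_coef, changed = parsimonious_update(old_coef, observed, exposure)
```

Only one coefficient moves in this example, so the validation effort concentrates on that band, and the old value still contributes to the new one.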
5. When you start looking, good models are hard to define
What should we do with time-consistency?
● Allows automated models creation
● Based on statistical criteria
● Easy to measure & reproduce
● Data-driven
● Pushes toward complexity over understanding
● Minimizing the back-test error is not enough
● Performance can’t be measured before deployment (and sometimes not even after)
● Direct interaction with the model itself is key to include all the operational constraints.
Models must allow the inclusion of expertise, safety and provide extrapolation capabilities.
Transparent modeling can and should be combined with machine-learning techniques.
Transparency is not “under-sophistication” or “primitiveness” but realism and efficiency.
Thank you !