The AI Insurance Pricing Company

2021
Modeling Approaches:
Data Scientists & Pricing Actuaries
Actuary vs. Data Scientist
The main differences
...a belief that actuaries of the future would increasingly incorporate a lot more data science and thinking into their day to day jobs and actuaries would also possibly move beyond the traditional realms of insurance.
Source: https://proactuary.com/actuary-vs-data-scientist/
CONFIDENTIAL 3
Actuary vs. Data Scientist
The main differences
Going forward, the skill set collectively known as data science will be borne by a new generation of data savvy business specialists and subject matter experts who are able to imbue analysis with their deep domain knowledge (...)
Source: https://proactuary.com/actuary-vs-data-scientist/
2 Approaches of Modeling
Data scientists and actuaries typically have different focuses during modeling.

Actuaries focus on Context: heavy reliance on the data’s meaning
● Obsession with adverse-selection risk
● Expert judgement before models’ “score”
● Model transparency

Data scientists (DS) focus on Data: data-driven approach
● Reliance on artificial intelligence and automation
● Obsession with scores
● Model understanding used for sanity checks
Classic ML approach
Global Parameters and Model Parameters
Model creation is automated:
- The user defines global parameters
- The algorithm fits on the data and produces the model.
The model itself is often less important than the global parameters.
For instance, when building a GBM, a user will look for the global parameters maximizing the back-test results (through a k-fold), not for the best model.

GLOBAL PARAMETERS (GBM)
● Number of trees
● Tree depth
● Learning rate

FIT

MODEL PARAMETERS (GBM)
● Ensemble of trees (split points, split variables, leaf estimates)
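The distinction between global parameters and model parameters can be illustrated with a minimal sketch. It assumes a toy gradient boosting of depth-1 trees on a single feature, synthetic data, and a small grid; the function names (`fit_stump`, `fit_gbm`, `kfold_mse`) and all numeric values are hypothetical and are not taken from the deck.

```python
import itertools
import random

def fit_stump(xs, residuals):
    """Fit one depth-1 tree by least squares; returns its model parameters."""
    best = None
    for split in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < split]
        right = [r for x, r in zip(xs, residuals) if x >= split]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, split, lm, rm)
    return best[1:]  # (split point, left leaf estimate, right leaf estimate)

def fit_gbm(xs, ys, n_trees, learning_rate):
    """Global parameters in, model parameters (the ensemble of trees) out."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        split, lm, rm = fit_stump(xs, residuals)
        trees.append((split, lm, rm))
        preds = [p + learning_rate * (lm if x < split else rm)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, trees

def predict_gbm(model, x):
    base, learning_rate, trees = model
    return base + learning_rate * sum(lm if x < s else rm for s, lm, rm in trees)

def kfold_mse(xs, ys, k, n_trees, learning_rate):
    """Back-test: mean squared error over k held-out folds."""
    errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        model = fit_gbm([xs[i] for i in train], [ys[i] for i in train],
                        n_trees, learning_rate)
        errors += [(ys[i] - predict_gbm(model, xs[i])) ** 2 for i in test]
    return sum(errors) / len(errors)

# Synthetic data (assumption): a claim-frequency-like signal by driver age.
random.seed(0)
xs = [random.uniform(18, 80) for _ in range(60)]
ys = [1.0 + 0.5 * (x < 25) + 0.2 * (x > 70) + random.gauss(0, 0.1) for x in xs]

# Grid search over the global parameters only.
results = {}
for n_trees, lr in itertools.product([5, 20, 40], [0.1, 0.5]):
    results[(n_trees, lr)] = kfold_mse(xs, ys, k=3, n_trees=n_trees, learning_rate=lr)
best_params = min(results, key=results.get)
```

The search returns a pair of global parameters, `best_params`, rather than a model: the trees themselves are refitted on each fold and never inspected, which is the point the slide makes.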
Indirect Models Explanations
Black-box models can be analysed
Most ML models are black boxes: they can’t be directly understood, but they can be analysed.
For instance, a Gradient Boosting Machine (GBM) generates predictions from an ensemble of decision trees. Each tree leverages all the dimensions of the data, generating interactions between the variables.
[Figure: prediction = tree + tree + tree + tree + tree + ... (+95 other trees)]
GBMs are great because they just work: producing good models automatically is straightforward.
As a GBM typically involves hundreds of trees of depth 2 to 6 (generating 2- to 6-way interactions), the model is not directly understandable by a human.
For this reason, powerful model-analysis tools have been developed.
Example of black-box analysis
PDP: understand the global impact
For example: a Partial Dependence Plot (PDP) and Individual Conditional Expectation (ICE) curves showing the impact of a driver’s age.
[Figure: PDP, x-axis: Driver Age]
ICE: visualize the conditional impacts
[Figure: ICE curves, x-axis: Driver Age]
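A short sketch of how ICE curves and a PDP relate: each ICE curve sweeps one feature for a single record while holding its other values fixed, and the PDP is the pointwise average of those curves. The model `toy_model`, the field names, and the records are hypothetical stand-ins for a fitted black box.

```python
def ice_curves(predict, rows, feature, grid):
    """One curve per row: predictions as `feature` sweeps over `grid`,
    with every other value of the row left unchanged."""
    return [[predict({**row, feature: value}) for value in grid] for row in rows]

def partial_dependence(curves):
    """PDP: pointwise average of the ICE curves."""
    return [sum(column) / len(curves) for column in zip(*curves)]

# Hypothetical black box with an interaction between driver age and power.
def toy_model(row):
    young = row["driver_age"] < 25
    power_slope = 0.5 if young else 0.1
    return 100.0 + (50.0 if young else 0.0) + power_slope * row["power"]

rows = [
    {"driver_age": 40, "power": 60},
    {"driver_age": 22, "power": 120},
    {"driver_age": 55, "power": 200},
]
grid = [20, 30, 60]

curves = ice_curves(toy_model, rows, "driver_age", grid)
pdp = partial_dependence(curves)
```

Because of the interaction, the ICE curves are not parallel: the drop between age 20 and age 30 is larger for the high-power record than for the low-power one. The PDP alone would hide that spread.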
Classic Actuarial approach
Direct Models Visualization
While model interpretability techniques can be applied to any model, direct model understanding is restricted to a specific class of models.
To be understood, models must be:
● Reducible: the model can be split and visualized piece by piece
● Parsimonious: the model must incorporate a limited number of effects to be analyzable
This restricts human-understandable models to simple categories:
➔ simple rules
➔ shallow trees
➔ Generalized Additive Models (including GLMs), with parsimonious interactions
Direct Models Visualization
For the past 30 years, actuaries have focused on GAM modeling, because it allows the modeler to decompose the model’s effects and:
● Validate them
● “Force” them if no exposure is available
GAM models are defined by the shape of their formula:
[Figure: prediction = effect(Driver Age) + effect(Driving Experience) + effect(Vehicle Speed) + effect(Contract Mileage) + effect(Vehicle Age) + ... (+5 other variables)]
Here the model itself is visualized and fully understood by a human.
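The additive structure can be sketched in a few lines. This is an illustration under stated assumptions, not the deck's implementation: effects are represented as step functions over bins, they are summed on an additive scale (pricing models often use a log link, in which case the effects multiply), and the class `Effect`, the variable names, and all values are hypothetical.

```python
import bisect

class Effect:
    """Step-function effect of one variable: `boundaries` are sorted bin
    boundaries, `values` holds one value per bin (len(boundaries) + 1)."""
    def __init__(self, boundaries, values):
        self.boundaries = boundaries
        self.values = values

    def __call__(self, x):
        # bisect_right counts the boundaries <= x, which is the bin index.
        return self.values[bisect.bisect_right(self.boundaries, x)]

def predict(intercept, effects, row):
    """Return the prediction and its decomposition, one term per variable."""
    contributions = {name: effect(row[name]) for name, effect in effects.items()}
    return intercept + sum(contributions.values()), contributions

effects = {
    "driver_age": Effect([25, 70], [120, 0, 40]),    # <25, 25 to 69, 70+
    "vehicle_age": Effect([5, 10], [30, 0, -20]),    # <5, 5 to 9, 10+
}
total, contributions = predict(300, effects, {"driver_age": 22, "vehicle_age": 12})

# "Forcing" an effect where no exposure is available amounts to overwriting
# one value by expert judgement; the rest of the model is untouched.
effects["vehicle_age"].values[2] = -10
forced_total, _ = predict(300, effects, {"driver_age": 22, "vehicle_age": 12})
```

Each entry of `contributions` can be plotted and validated on its own, which is what makes the model directly readable.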
Analysing a GAM
Only a limited number of variables play a role; each variable’s impact is fully known
Mixing ML & Actuarial approaches
Trees Ensembles and GAMs
Strengths and Limits
Strengths associated with tree ensemble models are related to their creation process.
Strengths associated with GAMs are related to their model structure.

Trees Ensembles
Model structure
● Sum of small effects of all the variables
● Tree depth
Model understanding
● Via reverse-engineering or local analysis
Model creation
● Machine learning

GAM
Model structure
● Sum of effects of single variables
Model understanding
● Direct visualization
Model creation
● Human creation
● Machine learning
Global Parameters and Model Parameters
Applying ML to GAMs
It is possible to design an algorithm fitting GAMs, based on 2 global parameters:
- Level of smoothness: how significant should the selected effects be?
- Level of parsimony: how many variables should be included in the model?
We developed this algorithm: models can be generated automatically for many values of the global parameters (machine-learning grid-search approach), tested on back-tests and fully analysed.

GLOBAL PARAMETERS (GAM)
● Smoothness level
● Parsimony level

FIT

MODEL PARAMETERS (GAM)
● Effect function values (one function per selected variable)
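A sketch of what such a grid search over the two global parameters might look like. The fitting algorithm itself is not described in the deck, so it is replaced here by a hypothetical callable, `toy_fit_and_backtest`, that returns a made-up back-test score; the selection rule (simplest model within a tolerance of the best score) is likewise one possible choice, not necessarily the authors'.

```python
import itertools

def grid_search(fit_and_backtest, smoothness_levels, variable_counts):
    """Fit one model per combination of global parameters and record its score."""
    results = []
    for smoothness, n_variables in itertools.product(smoothness_levels, variable_counts):
        score = fit_and_backtest(smoothness, n_variables)
        results.append({"smoothness": smoothness,
                        "n_variables": n_variables,
                        "score": score})
    return results

def simplest_within_tolerance(results, tolerance):
    """Among models scoring within `tolerance` of the best, keep the one
    with the fewest variables (ties broken by the higher score)."""
    best_score = max(r["score"] for r in results)
    candidates = [r for r in results if r["score"] >= best_score - tolerance]
    return min(candidates, key=lambda r: (r["n_variables"], -r["score"]))

# Hypothetical stand-in for "fit a GAM, then back-test it": the score
# saturates as variables are added and peaks at smoothness level 2.
def toy_fit_and_backtest(smoothness, n_variables):
    return 0.25 * (1 - 0.5 ** n_variables) - 0.01 * abs(smoothness - 2)

results = grid_search(toy_fit_and_backtest, [1, 2, 3], [1, 2, 4, 8])
chosen = simplest_within_tolerance(results, tolerance=0.02)
```

Every entry of `results` is a complete, inspectable model in the real setting, so the final choice can weigh the back-test score against what the effects look like.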
1. Parsimony has a cost (but it is worth it)
Understanding / Accuracy trade-off
[Figure: model families positioned along two axes, “Higher Accuracy” and “Better Understanding”: Black-box models (GBMs, RF, NN…), Best models, Complex GAM with interactions, Complex GAM, Simple GAM, Linear Models, No Model, Bad models]
The accuracy is measured on a back-test; actual results when moving to production will not be
Grid-search result
Grid-search results: each point represents one model.
The gain in model quality and the fading marginal improvement are clearly visible.
[Figure: Gini Score (y-axis) against Number of Variables (x-axis)]
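The deck does not say how its Gini score is computed. The sketch below assumes one common definition, the normalized Gini based on the cumulative share of actual losses when records are ranked by prediction; the function names and the sample values are illustrative only.

```python
def gini(actual, predicted):
    """Unnormalized Gini: rank records by decreasing prediction (ties broken
    by position), accumulate the share of actual losses captured, and
    compare with what a random ordering would give."""
    n = len(actual)
    order = sorted(range(n), key=lambda i: (-predicted[i], i))
    total = sum(actual)
    cumulative = 0.0
    area = 0.0
    for i in order:
        cumulative += actual[i]
        area += cumulative / total
    return (area - (n + 1) / 2) / n

def normalized_gini(actual, predicted):
    """Scale so that a perfect ranking scores 1."""
    return gini(actual, predicted) / gini(actual, actual)

losses = [0, 0, 1, 3]
perfect = normalized_gini(losses, [0.1, 0.2, 0.5, 0.9])   # same ranking as losses
reversed_ranking = normalized_gini(losses, [3, 1, 0, 0])  # worst possible ranking
```

Only the ranking of the predictions matters for this score, so two models with very different effect shapes can end up with nearly identical values.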
2. When you start looking, good models are hard to define
What is overfitting?
Which model should be selected?
[Figure: two Driver Age effect curves. Robust model: out-of-sample Gini 20.5%. Noisy model: out-of-sample Gini 21%.]
The noisy model has stronger results on the back-test but does not inspire much trust.
The robust model might lead to better results once deployed in production.
3. Interact with the models
Spotting the issues is nice...
[Figure: Number of Rooms]
... solving the issues is better!
[Figure: Number of Rooms]
4. Price Update & Fading Memory
Parsimonious price update is key
[Figure: Vehicle Age effect, with “Mismatch” and “Good Fit” regions marked]
A clear model structure allows an easy identification of mismatches between an old model and new data, and of the coefficients causing them.
[Figure: Vehicle Age]
It is now easy to fix these mismatches in a parsimonious way, leveraging the elements of the model that are still good. This parsimonious update eases the model validation process and provides a fading memory, mixing the information from the old model and the new data.
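The deck does not specify the update rule. As one possible reading, the sketch below assumes a credibility-style blend: coefficients that still fit the new data are kept, and mismatched ones move toward the new estimate in proportion to the exposure behind it. The function name, the threshold, the exposure figures, and the coefficient values are all hypothetical.

```python
def update_coefficients(old, new_estimates, exposure,
                        full_credibility_exposure, mismatch_threshold):
    """Parsimonious update: touch only the coefficients that mismatch."""
    updated = {}
    for level, old_value in old.items():
        new_value = new_estimates[level]
        if abs(new_value - old_value) <= mismatch_threshold:
            updated[level] = old_value  # good fit: keep the old coefficient
            continue
        # Mismatch: blend old and new, trusting the new data more as its
        # exposure grows (assumed linear credibility, capped at 1).
        weight = min(1.0, exposure[level] / full_credibility_exposure)
        updated[level] = (1 - weight) * old_value + weight * new_value
    return updated

# Hypothetical Vehicle Age coefficients by band.
old = {"0-2": 1.00, "3-5": 0.90, "6+": 0.80}
new_estimates = {"0-2": 1.02, "3-5": 1.10, "6+": 0.60}
exposure = {"0-2": 500, "3-5": 1000, "6+": 250}

updated = update_coefficients(old, new_estimates, exposure,
                              full_credibility_exposure=1000,
                              mismatch_threshold=0.05)
```

In this toy run the first band is left untouched, the second is fully replaced, and the third moves a quarter of the way toward the new estimate, so the old model fades rather than being discarded.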
What should we do with time-consistency?
[Figure: External Risk Assessment]

Conclusion
Mixing Data-Science automation and Actuarial Expertise

ML & Back-test performance
● Allows automated model creation
● Based on statistical criteria
● Easy to measure & reproduce
● Data-driven
● Pushes toward complexity over understanding

Actuarial expertise and transparency
● Minimizing the back-test error is not enough
● Performance can’t be measured before deployment (and sometimes not even after)
● Direct interaction with the model itself is key to include all the operational constraints.

Understanding and capability to interact with a model is key; a model’s simplicity has value.
Models must allow the inclusion of expertise and safety, and provide extrapolation capabilities.
Transparent modeling can and should be combined with machine-learning techniques.
Transparency is not “under-sophistication” or “primitiveness” but realism and efficiency.
Thank you!
