MAP578
Collaborative and Reliable Learning
9 lectures:
1 Lectures on some crucial topics (Lectures 1-5)
2 Reading articles in small groups (Lectures 6-7)
3 Presenting those articles (Lectures 8-9)
Spreadsheet for paper selection (the list is indicative and will evolve over the next 1-3 weeks):
https://docs.google.com/spreadsheets/d/1WkmYHmFUMnS0FjM8UhX1z2S2xKGCj0vUJnnSRCHIR2E
Aymeric DIEULEVEUT
Assistant professor at Polytechnique, CMAP
Interests:
Optimization and statistics, and the links between the two
Large-scale learning
Federated, distributed, and privacy-preserving learning
Contact: [email protected]
El Mahdi EL MHAMDI
Assistant professor at Polytechnique, CMAP
Interests:
Distributed systems, distributed algorithms
Robustness, fault tolerance
Computable ethics (mathematics, analytical philosophy, social sciences, ...)
Contact: [email protected]
Supervised learning
Objective: learn a prediction rule g for Y given X.
Loss function: here, the squared loss (y − g(x))².
Then the risk is minimized by
g*(X) = E[Y | X],
the regression function.
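A quick reminder of why this holds (a standard argument, recalled here for completeness under the squared loss):

\[
\mathbb{E}\big[(Y - g(X))^2\big]
= \mathbb{E}\big[(Y - \mathbb{E}[Y \mid X])^2\big]
+ \mathbb{E}\big[(\mathbb{E}[Y \mid X] - g(X))^2\big],
\]

the cross term vanishing by the tower property. The second term is non-negative and equals zero when g(X) = E[Y | X], so the risk is minimized by g*(X) = E[Y | X].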
As the size of the class C increases, the approximation error decreases while the estimation error increases.
[Figure: approximation and estimation errors as a function of the complexity of C.]
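For reference, the trade-off in the figure corresponds to the standard excess-risk decomposition (ĝ_n denotes the rule selected in C from the data, g* the Bayes rule):

\[
R(\hat g_n) - R(g^*)
= \underbrace{\Big(R(\hat g_n) - \inf_{g \in C} R(g)\Big)}_{\text{estimation error}}
+ \underbrace{\Big(\inf_{g \in C} R(g) - R(g^*)\Big)}_{\text{approximation error}}.
\]

Enlarging C shrinks the approximation term but makes the estimation term harder to control.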
Problems
How do we choose C?
How do we select a decision rule within C?
“Generative” approach
Solution: estimate the regression function η(X) = P(Y = 1 | X) and plug this estimator into the Bayes rule: (generalized) linear models, kernel methods, k-nearest neighbors, naive Bayes, ...
“Optimization” approach
Solution: minimize the empirical risk (or an upper bound on the empirical risk): support vector machines, neural networks, ...
D_n = {(X_1, Y_1), ..., (X_n, Y_n)}

Optimization
arg min_{g ∈ C} R̂_n(g)    (1)
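A minimal sketch of the optimization approach (1) in Python. The model class (linear predictors), the loss (squared loss), and the optimizer (plain gradient descent) are illustrative choices, not ones prescribed by the course:

import numpy as np

def empirical_risk(theta, X, y):
    # R̂_n(g_theta): average squared loss of g_theta(x) = <theta, x> on D_n
    return 0.5 * np.mean((X @ theta - y) ** 2)

def erm(X, y, lr=0.1, n_steps=200):
    # approximately solve (1) over the class C of linear predictors
    theta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        grad = X.T @ (X @ theta - y) / len(y)  # gradient of the empirical risk
        theta -= lr * grad
    return theta

# Toy data D_n = {(X_1, Y_1), ..., (X_n, Y_n)}
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)
theta_hat = erm(X, y)
print(empirical_risk(theta_hat, X, y))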
Distribution
1 Several workers/agents/nodes share the data or the model
2 Data distribution: each holds a share of the data
3 Model distribution: each holds a share of the model
arg min_{θ ∈ C ⊂ R^d} Σ_{i=1}^N F_i(θ)    (3)
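A hedged sketch of the data-distributed version of (3): each worker holds its own data, computes the gradient of its local objective F_i, and a server aggregates the local gradients (averaging them, which corresponds to minimizing Σ_i F_i up to a rescaling of the step size). The quadratic local losses and all constants are placeholders:

import numpy as np

rng = np.random.default_rng(1)
N, d = 5, 3
# Worker i holds (X_i, y_i); here F_i(theta) = 0.5 * ||X_i theta - y_i||^2 / n_i
local_data = [(rng.normal(size=(20, d)), rng.normal(size=20)) for _ in range(N)]

def local_grad(theta, Xi, yi):
    # gradient of the local objective F_i at theta, computed on worker i's data only
    return Xi.T @ (Xi @ theta - yi) / len(yi)

theta = np.zeros(d)
for _ in range(100):
    grads = [local_grad(theta, Xi, yi) for Xi, yi in local_data]  # one round of communication
    theta -= 0.1 * np.mean(grads, axis=0)                         # server-side aggregation
print(theta)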
[Sketch: workers 1, ..., N with local losses F_1, ..., F_N; the overall loss F combines the F_i, and the local data distributions D_i and D_j may differ.]
Heterogeneity
1 Averaging consensus:
arg min_{θ ∈ C ⊂ R^d} Σ_{i=1}^N F_i(θ)    (4)
2 Adaptation:
arg min_{(θ_i)_{i ∈ [N]} ∈ C^N ⊂ (R^d)^N} Σ_{i=1}^N F_i(θ_i)    (5)
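A toy numerical contrast between (4) and (5), assuming hypothetical quadratic local objectives F_i(θ) = ½‖θ − c_i‖² so that both problems have closed-form solutions:

import numpy as np

rng = np.random.default_rng(2)
N, d = 4, 2
c = rng.normal(size=(N, d))      # heterogeneity: worker i's own optimum c_i

# (4) Averaging consensus: one shared theta minimizing sum_i F_i -> the mean of the c_i
theta_consensus = c.mean(axis=0)

# (5) Adaptation: one theta_i per worker, each minimizing its own F_i -> theta_i = c_i
theta_adapted = c.copy()

print("consensus:", theta_consensus)
print("adapted:", theta_adapted)

The more heterogeneous the c_i, the further the consensus solution sits from each worker's own optimum, which is what motivates the adaptation formulation.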
Communication constraints
1 Communication can be the bottleneck in distributed systems: can we get a
speedup?
2 Uploading and downloading updates: saturation of networks, bandwidth? (a compression sketch follows below)
3 Unavailability of some workers?
[Sketch: workers 1, ..., N uploading updates to and downloading the model from a central server.]
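One common answer to the bandwidth question is to compress updates before uploading them, e.g. by keeping only the k largest-magnitude coordinates (top-k sparsification). A minimal sketch, not tied to any particular system or library:

import numpy as np

def top_k(update, k):
    # keep the k largest-magnitude coordinates of a local update, zero out the rest
    compressed = np.zeros_like(update)
    idx = np.argsort(np.abs(update))[-k:]
    compressed[idx] = update[idx]
    return compressed

update = np.random.default_rng(3).normal(size=10)
print(top_k(update, k=3))   # only k values (and their indices) need to be uploaded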
Federated Learning
3 Tackles both
the averaging consensus problem
the adaptation problem
4 Data distribution:
cross-silo
cross-device
5 Some concerns
Privacy
Non-i.i.d. agents
Optimization with bandwidth constraints, partial participation (a FedAvg-style sketch follows this list)
6 Important implementation issues
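A compact FedAvg-style round (in the spirit of Federated Averaging, McMahan et al., 2017) with partial participation: each selected client runs a few local gradient steps and the server averages the returned models. The quadratic local losses and all constants are illustrative placeholders:

import numpy as np

rng = np.random.default_rng(4)
N, d = 10, 3
c = rng.normal(size=(N, d))   # client i's local optimum: non-i.i.d. clients

def local_update(theta, ci, lr=0.1, local_steps=5):
    # a few local gradient steps on F_i(theta) = 0.5 * ||theta - c_i||^2
    for _ in range(local_steps):
        theta = theta - lr * (theta - ci)
    return theta

theta = np.zeros(d)
for _ in range(50):
    participants = rng.choice(N, size=3, replace=False)    # partial participation
    models = [local_update(theta.copy(), c[i]) for i in participants]
    theta = np.mean(models, axis=0)                        # server averages the models

print(theta)            # hovers around the average of the c_i
print(c.mean(axis=0))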
ARPANET cold war motivation: how to make sure information is still available
and possible to disseminate after a nuclear attack?
How to manage the messy problem of synchronizing nodes, making them agree on values despite crashes, errors in data, asynchrony, etc.?
The proxy will inevitably be different from the intended goal, but we can study
the mismatch and avoid pitfalls.
Industrial perspective:
1 How to get people to engage
2 Loss functions from business models
Insurance:
1 Liability for loss of privacy in a privacy-preserving framework
Legal aspects:
1 How do we define privacy in the law?
2 Role of GDPR?
Economics:
1 Data valuation?
2 Value Sharing?
Cryptography:
1 Role of homomorphic encryption?
2 Secure multi-party computation (MPC)? (a toy secret-sharing sketch follows at the end of this section)
*: but for which you will be (technically) better equipped by the end of the
class.
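On the MPC question above: a toy illustration of additive secret sharing, the idea behind secure aggregation, in which the aggregator can reconstruct the sum of the clients' values but never sees an individual value in the clear. Purely pedagogical; real protocols add key agreement, dropout handling, and careful encoding of real-valued updates:

import random

PRIME = 2**61 - 1   # toy modulus; shares are uniform in [0, PRIME)

def share(value, n_parties):
    # split an integer into n_parties additive shares summing to value mod PRIME
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

values = [12, 7, 30]                                   # three clients' private values
all_shares = [share(v, n_parties=3) for v in values]   # each client sends one share to each party
partial_sums = [sum(col) % PRIME for col in zip(*all_shares)]  # each party sums what it received
print(sum(partial_sums) % PRIME)                       # 49 = 12 + 7 + 30; no single value revealed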