
LECTURE 11: EXPONENTIAL FAMILY AND GENERALIZED LINEAR MODELS

HANI GOODARZI AND SINA JAFARPOUR

1. EXPONENTIAL FAMILY.
The exponential family comprises a set of flexible distributions spanning both continuous and discrete random variables. The members of this family share many important properties, which makes it worthwhile to discuss them in a general format. Many of the probability distributions that we have studied so far are specific members of this family:
• Gaussian: R^p
• Multinomial: categorical
• Bernoulli: binary {0, 1}
• Binomial: counts of successes/failures
• von Mises: sphere
• Gamma: R^+
• Poisson: non-negative integers
• Laplace: R
• Exponential: R^+
• Beta: (0, 1)
• Dirichlet: ∆ (simplex)
• Weibull: R^+
• Wishart: symmetric positive-definite matrices
All these distributions follow the general format:

(1)   p(x|\eta) = h(x) \exp\{\eta^\top t(x) - a(\eta)\},

where \eta is called the "natural parameter", t(x) is the "sufficient statistic" (a statistic is a function of the data), h(x) is the "underlying measure", and a(\eta) is called the "log normalizer", which ensures that the distribution integrates to one. Hence,

a(\eta) = \log \int h(x) \exp\{\eta^\top t(x)\}\, dx.

We start by showcasing a number of known distributions and illustrate that they are indeed members of the exponential family.
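As a concrete aid, the short sketch below (our illustration, not part of the lecture; the helper name is hypothetical) evaluates a generic exponential-family log-density directly from user-supplied h, t, and a, exactly as in Equation (1).

```python
# Minimal sketch: log p(x|eta) = log h(x) + eta . t(x) - a(eta).
# Only NumPy is assumed; all names are illustrative.
import numpy as np

def log_density(x, eta, h, t, a):
    """Generic exponential-family log-density with base measure h,
    sufficient statistic t, and log normalizer a."""
    eta = np.atleast_1d(np.asarray(eta, dtype=float))
    tx = np.atleast_1d(np.asarray(t(x), dtype=float))
    return np.log(h(x)) + eta @ tx - a(eta)
```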

1.1. Bernoulli. The Bernoulli distribution is defined on a binary (0 or 1) random variable using the parameter \pi, where \pi = Pr(x = 1). The Bernoulli distribution can be written as:

(2)   p(x|\pi) = \pi^x (1 - \pi)^{1-x}.

In order to convert Equation (2) to the general exponential format (Equation (1)), we rewrite it as

(3)   p(x|\pi) = \exp\{\log(\pi^x (1 - \pi)^{1-x})\}
              = \exp\{x \log \pi + (1 - x) \log(1 - \pi)\}
              = \exp\Big\{x \log \frac{\pi}{1 - \pi} + \log(1 - \pi)\Big\}.
In Equation (3),
• \eta = \log \frac{\pi}{1 - \pi},
• t(x) = x,
• a(\eta) = -\log(1 - \pi),
• and h(x) = 1.
To put a(\eta) in its correct form (as a function of \eta), we use the relationship between \eta and \pi:

(4)   \eta = \log \frac{\pi}{1 - \pi}
      \;\Rightarrow\; -\eta = \log \frac{1 - \pi}{\pi} = \log\Big(\frac{1}{\pi} - 1\Big)
      \;\Rightarrow\; e^{-\eta} = \frac{1}{\pi} - 1
      \;\Rightarrow\; \pi = \frac{1}{1 + e^{-\eta}} = \sigma(\eta).

Consequently,

(5)   a(\eta) = \log(1 + e^{\eta}),

and

(6)   p(x|\eta) = \sigma(-\eta)\, e^{\eta x}.
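A quick numerical sanity check (our sketch, not from the lecture; plain NumPy assumed) that the exponential-family form in Equation (6) reproduces the standard Bernoulli pmf of Equation (2):

```python
import numpy as np

def sigma(z):
    # logistic function sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

pi = 0.3
eta = np.log(pi / (1 - pi))                       # natural parameter, Eq. (4)

for x in (0, 1):
    standard = pi**x * (1 - pi)**(1 - x)          # Eq. (2)
    exp_family = sigma(-eta) * np.exp(eta * x)    # Eq. (6)
    assert np.isclose(standard, exp_family)
```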

1.2. Multinomial. Although not discussed in class, it is important to see this process for the multinomial distribution as well. While the Bernoulli is defined with the single parameter \pi, the multinomial has a vector of parameters \mu_k, where k goes from 1 to M:

p(x|\mu) = \prod_{k=1}^{M} \mu_k^{x_k} = \exp\Big\{\sum_{k=1}^{M} x_k \log \mu_k\Big\},

where x = (x_1, x_2, \cdots, x_M)^\top and \sum_{k=1}^{M} \mu_k = 1. Because of this sum-to-one constraint only M - 1 of the parameters are free; following the same process as for the Bernoulli, we have:

p(x|\eta) = \exp\Big\{\eta^\top x + \log\Big(1 + \sum_{k=1}^{M-1} e^{\eta_k}\Big)^{-1}\Big\},

where

(7)   \mu_k = \frac{e^{\eta_k}}{1 + \sum_j e^{\eta_j}} = \mathrm{softmax}(k, \eta).
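The softmax map in Equation (7) is easy to check numerically; the sketch below (illustrative, not from the lecture) converts the M − 1 free natural parameters into a full length-M mean vector:

```python
import numpy as np

def softmax_mean(eta):
    """mu_k = exp(eta_k) / (1 + sum_j exp(eta_j)) for the M-1 free parameters;
    the reference category gets the remaining mass 1 / (1 + sum_j exp(eta_j))."""
    eta = np.asarray(eta, dtype=float)
    denom = 1.0 + np.exp(eta).sum()
    mu = np.exp(eta) / denom
    return np.append(mu, 1.0 / denom)   # length M, sums to one

print(softmax_mean([0.5, -1.0, 2.0]))   # a four-category example
```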

1.3. Poisson. The Poisson is a discrete distribution that expresses the number of events occurring in a unit of time or space. This distribution, which plays a role similar to the Gaussian but for count data, is given by

(8)   p(x|\lambda) = \frac{\lambda^x e^{-\lambda}}{x!} = \frac{1}{x!} \exp\{x \log \lambda - \lambda\},

where
• \eta = \log \lambda,
• t(x) = x,
• a(\eta) = \lambda = e^{\eta},
• and h(x) = \frac{1}{x!}.
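As with the Bernoulli, the rewriting in Equation (8) can be verified numerically (our sketch; only the Python standard library is assumed):

```python
import math

lam = 2.5
eta = math.log(lam)                     # natural parameter
for x in range(6):
    standard = lam**x * math.exp(-lam) / math.factorial(x)
    exp_family = math.exp(x * eta - math.exp(eta)) / math.factorial(x)
    assert math.isclose(standard, exp_family)
```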

1.4. Univariate Gaussian. Similarly, the Gaussian distribution can also be rewritten in the general exponential format:

(9)   p(x|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big\{\frac{-(x - \mu)^2}{2\sigma^2}\Big\}
                        = \frac{1}{\sqrt{2\pi}} \exp\Big\{\frac{\mu}{\sigma^2} x - \frac{1}{2\sigma^2} x^2 - \frac{1}{2\sigma^2} \mu^2 - \log \sigma\Big\},

where
• \eta = \langle \frac{\mu}{\sigma^2}, \frac{-1}{2\sigma^2} \rangle,
• t(x) = \langle x, x^2 \rangle,
• h(x) = \frac{1}{\sqrt{2\pi}},
• and a(\eta) = \frac{\mu^2}{2\sigma^2} + \log \sigma = \frac{-\eta_1^2}{4\eta_2} - \frac{1}{2} \log(-2\eta_2).
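The two expressions for a(\eta) above agree, which is easy to confirm numerically (our sketch, assuming NumPy):

```python
import numpy as np

mu, sigma2 = 1.5, 0.7
eta1, eta2 = mu / sigma2, -1.0 / (2 * sigma2)     # natural parameters

a_from_eta = -eta1**2 / (4 * eta2) - 0.5 * np.log(-2 * eta2)
a_from_moments = mu**2 / (2 * sigma2) + 0.5 * np.log(sigma2)   # log(sigma) = 0.5 log(sigma^2)
assert np.isclose(a_from_eta, a_from_moments)
```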

2. MOMENTS OF THE EXPONENTIAL FAMILY.

In the exponential family, the function a(\eta) is in fact the cumulant generating function: its derivatives yield the moments of the sufficient statistic. We show this by differentiating this term:

(10)  \frac{d\, a(\eta)}{d\eta} = \frac{d}{d\eta} \log \int \exp\{\eta^\top t(x)\} h(x)\, dx
      = \frac{\int \frac{d}{d\eta} \exp\{\eta^\top t(x)\}\, h(x)\, dx}{\int \exp\{\eta^\top t(x)\} h(x)\, dx}
      = \frac{\int t(x) \exp\{\eta^\top t(x)\} h(x)\, dx}{\int \exp\{\eta^\top t(x)\} h(x)\, dx}
      = \frac{\int t(x) \exp\{\eta^\top t(x)\} h(x)\, dx}{\exp\{a(\eta)\}}
      = \int t(x) \exp\{\eta^\top t(x) - a(\eta)\} h(x)\, dx
      = \mathbb{E}[t(x)].
Likewise, it can be shown that:

(11)  \frac{d^2 a(\eta)}{d\eta^2} = \mathrm{Var}(t(x)) = \mathbb{E}[t(x)^2] - \mathbb{E}[t(x)]^2.
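Equations (10) and (11) can be checked with finite differences; the sketch below (illustrative only) does so for the Bernoulli log normalizer a(\eta) = \log(1 + e^{\eta}), whose mean and variance are \pi and \pi(1 - \pi):

```python
import numpy as np

def a(eta):
    return np.log(1 + np.exp(eta))      # Bernoulli log normalizer

eta, eps = 0.4, 1e-4
pi = 1.0 / (1.0 + np.exp(-eta))         # E[t(x)] = E[x] = pi

da = (a(eta + eps) - a(eta - eps)) / (2 * eps)               # ~ mean, Eq. (10)
d2a = (a(eta + eps) - 2 * a(eta) + a(eta - eps)) / eps**2    # ~ variance, Eq. (11)

assert np.isclose(da, pi, atol=1e-6)
assert np.isclose(d2a, pi * (1 - pi), atol=1e-6)
```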

For example, in the Bernoulli distribution we have a(\eta) = \log(1 + e^{\eta}) as the cumulant generating function. The first derivative of this function is given by

(12)  \frac{d\, a(\eta)}{d\eta} = \frac{\frac{d}{d\eta}(1 + e^{\eta})}{1 + e^{\eta}} = \frac{e^{\eta}}{1 + e^{\eta}} = \frac{1}{1 + e^{-\eta}} = \pi = \mathbb{E}[x].

In this context, \mu, defined as \mathbb{E}[t(x)], can be computed from \frac{d\, a(\eta)}{d\eta}, which is solely a function of \eta. This relationship connects \mu and \eta, and since a(\eta) is convex (i.e., its second derivative, being a variance, is greater than 0), the relationship is invertible. Thus we can define

(13)  \eta = \Psi(\mu),

where \Psi is a function that maps the mean parameter to the natural (canonical) parameter.
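For concreteness, here is what \Psi looks like for two members of the family (a small sketch of ours, not the lecture's notation):

```python
import numpy as np

def psi_bernoulli(mu):
    # inverse of mu = sigma(eta): the logit function
    return np.log(mu / (1 - mu))

def psi_poisson(mu):
    # inverse of mu = e^eta: the log function
    return np.log(mu)

print(psi_bernoulli(0.25))   # eta for a Bernoulli with mean 0.25
print(psi_poisson(3.0))      # eta for a Poisson with mean 3
```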

3. GENERALIZED LINEAR MODELS

The generalized linear model (GLM) is a powerful generalization of linear regression to responses drawn from a general exponential-family distribution. Figure 1 shows the graphical model representation of a generalized linear model. The model is based on the following assumptions:

FIGURE 1. Representation of a generalized linear model

• The observed input enters the model through a linear function, \beta^\top X.
• The conditional mean of the response is represented as a function of this linear combination:

(14)  \mathbb{E}[Y|X] = \mu = f(\beta^\top X).

• The observed response is drawn from an exponential-family distribution with conditional mean \mu, which is tied to the natural parameter through \Psi as in Equation (13).
Figure 2 summarizes the relationships between the variables in a GLM.

FIGURE 2. Relationship between the variables in a generalized linear model

It is usually convenient to work with overdispersed exponential families. We assume that the observed response comes from the following probability distribution:
(15)  p(y|\eta) = h(y, \sigma) \exp\Big\{\frac{\eta^\top y - a(\eta)}{\sigma}\Big\}.

For a fixed \sigma, Equation (15) is an exponential family, but as a function of \sigma it is not an exponential family, since h is a function of both y and \sigma.
As a simple example, in the case of linear regression (where \sigma plays the role of the dispersion, i.e. the Gaussian variance):
• h(y, \sigma) = \frac{1}{\sqrt{2\pi\sigma}} \exp\Big\{\frac{-y^2}{2\sigma}\Big\},
• a(\eta) = \frac{\eta^2}{2},
• f: identity,
• \Psi: identity.

Consequently,

(16)  p(y|\eta) = \frac{1}{\sqrt{2\pi\sigma}} \exp\Big\{\frac{-y^2}{2\sigma}\Big\} \exp\Big\{\frac{\eta y - \eta^2/2}{\sigma}\Big\}
               = \frac{1}{\sqrt{2\pi\sigma}} \exp\Big\{\frac{-(y - \eta)^2}{2\sigma}\Big\}.
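A quick numerical check (our sketch; \sigma here denotes the dispersion, which for the Gaussian equals the variance) that the two sides of Equation (16) agree:

```python
import numpy as np

y, eta, sigma = 0.8, 0.3, 1.7           # sigma is the dispersion (Gaussian variance)

h = np.exp(-y**2 / (2 * sigma)) / np.sqrt(2 * np.pi * sigma)
overdispersed = h * np.exp((eta * y - eta**2 / 2) / sigma)    # first form in Eq. (16)
gaussian = np.exp(-(y - eta)**2 / (2 * sigma)) / np.sqrt(2 * np.pi * sigma)
assert np.isclose(overdispersed, gaussian)
```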
Generally, we have two choice points in specifying a generalized linear model: the choice of the response function f, i.e. how to treat the linear combination of the observed input, and the choice of the exponential-family distribution of the observed output y. Note that \Psi is completely determined by the choice of exponential family. As a result, choosing an appropriate response function and exponential family is one of the major tasks in probabilistic modeling; once these choices are made, the general framework of the exponential family can be applied to the modeled data.
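To make the two choice points concrete, the sketch below (our illustration on synthetic data, not code from the lecture) picks the Bernoulli family with its canonical sigmoid response function and fits \beta by gradient ascent on the log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                            # observed inputs
beta_true = np.array([1.0, -2.0, 0.5])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))    # Bernoulli responses

def f(z):
    # response function: canonical sigmoid, so E[y|x] = f(beta^T x) as in Eq. (14)
    return 1 / (1 + np.exp(-z))

beta = np.zeros(3)
for _ in range(2000):
    mu = f(X @ beta)                                     # conditional means
    beta += 0.1 * X.T @ (y - mu) / len(y)                # gradient of the log-likelihood

print(beta)    # roughly recovers beta_true
```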
