Act RR91-05
September 1991
For additional copies write:
ACT Research Report Series
P.O. Box 168
Iowa City, Iowa 52243
Bradley A. Hanson
September, 1991
Four-parameter Beta Compound Binomial
Abstract
This paper presents a detailed derivation of method of moments estimates for the four-
parameter beta compound binomial strong true score model. A procedure is presented to
deal with the case in which the usual method of moments estimates do not exist or result
in invalid parameter estimates. The results presented regarding the method of moments
estimates are used to derive formulas for computing classification consistency indices under
the four-parameter beta compound binomial model.
Acknowledgement. The author thanks Robert L. Brennan for carefully reading an earlier
version of this paper and offering helpful suggestions and comments.
Four-parameter Beta Compound Binomial
The first part of this paper discusses estimation of the four-parameter beta compound
binomial model by the method of moments. Most of the material in the first part of
this paper is a restatement of material presented in Lord (1964, 1965) with some added
details. The second part of this paper uses the results presented in the first part to obtain
formulas for computing the classification consistency indexes described by Hanson and
Brennan (1990).
The purpose of this paper is to provide a detailed description of the procedures used in
computer programs written by the author which compute estimates for the four-parameter
beta compound binomial model and indexes of classification consistency based on the
four-parameter beta compound binomial model.
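It is assumed that the test score to be modeled is the sum of $K$ dichotomously scored
test items (this score is referred to as the raw or observed test score). The probability
that the raw score random variable $X$ in the population of interest equals $i$ ($i = 0, \ldots, K$) is
$$\Pr(X = i) = \int_l^u \Pr(X = i \mid \tau, k)\, g(\tau \mid \alpha, \beta, l, u)\, d\tau, \tag{1}$$
where $\tau$ is the proportion correct true score. (For simplicity of notation the dependence
of the marginal test score distribution [$\Pr(X = i)$] on the parameters $k$, $\alpha$, $\beta$, $l$, and $u$
is not denoted.) The true score distribution [$g(\tau \mid \alpha, \beta, l, u)$] is assumed to belong to the
four-parameter beta family of distributions. The four-parameter beta distribution is a
generalization of the usual beta distribution that in addition to the two shape parameters
($\alpha > 0$ and $\beta > 0$) has parameters for the lower ($l$) and upper ($u$) limits of the distribution
($0 \le l < u \le 1$). The four-parameter beta density function [defined on the interval $(l, u)$] is
$$g(\tau \mid \alpha, \beta, l, u) = \frac{(-l + \tau)^{\alpha - 1} (u - \tau)^{\beta - 1}}{B(\alpha, \beta)\, (u - l)^{\alpha + \beta - 1}}, \tag{2}$$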
where $B(\alpha, \beta)$ is the beta function, which is related to the gamma function [$\Gamma(x)$] by
$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\, \Gamma(\beta)}{\Gamma(\alpha + \beta)}.$$
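The conditional error distribution [$\Pr(X = i \mid \tau, k)$] is assumed to be Lord's (1965)
two term approximation to the compound binomial distribution. The probability density
function of the two term approximation is (Lord, 1965, Equation 5)
$$\Pr(X = i \mid \tau, k) = p(i \mid K, \tau) - k\tau(1 - \tau)\left[p(i \mid K - 2, \tau) - 2p(i - 1 \mid K - 2, \tau) + p(i - 2 \mid K - 2, \tau)\right], \tag{3}$$
where $p(i \mid n, \tau) = \binom{n}{i} \tau^i (1 - \tau)^{n - i}$ is a binomial probability, taken to be zero when $i < 0$ or $i > n$.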
Equation 3 involves the parameter $k$ (note that this is distinct from upper case $K$, which is
used in this paper to denote the number of items on the test), in addition to the binomial
parameter $\tau$. Lord (1965)
gives a method of estimating $k$ by setting the theoretical value of the average error variance
(or reliability) under the two term approximation of the compound binomial distribution
(which is a function of $k$) equal to an estimate of the average error variance (or reliability).
The number-correct true score variance, assuming the conditional error distribution
is given by Equation 3, is (Lord, 1965, Equation 39)
$$\frac{K^2 \sigma_x^2 - (K - 2k)\, \mu_x (K - \mu_x)}{K(K - 1) + 2k}, \tag{4}$$
where $\mu_x$ and $\sigma_x^2$ are the raw score mean and variance. Subtracting Equation 4 from $\sigma_x^2$
gives the average error variance (denoted $\sigma_e^2$) under the two term approximation of the
compound binomial distribution:
$$\sigma_e^2 = \frac{\sigma_x^2 \left[K(K - 1) + 2k - K^2\right] + (K - 2k)\, \mu_x (K - \mu_x)}{K(K - 1) + 2k}. \tag{5}$$
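Solving Equation 5 for $k$ gives
$$k = \frac{K\left[(K - 1)(\sigma_x^2 - \sigma_e^2) - K\sigma_x^2 + \mu_x (K - \mu_x)\right]}{2\left[\mu_x (K - \mu_x) - (\sigma_x^2 - \sigma_e^2)\right]}. \tag{6}$$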
Given an estimate of the average error variance, $\sigma_e^2$, and estimates of $\mu_x$ and $\sigma_x^2$ in Equation
6, an estimate of $k$ can be calculated. The value of $k$ as a function of the reliability ($\rho$), raw
score mean, and raw score variance is given by substituting $\sigma_x^2 (1 - \rho)$ for $\sigma_e^2$ in Equation
6.
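The calculation of $k$ from a reliability estimate can be sketched as follows. This is a minimal illustration of Equations 5 and 6 only, not the author's program; the function names `error_variance` and `lord_k` are hypothetical.

```python
def error_variance(K, mean, var, k):
    """Average error variance under the two term approximation (Equation 5)."""
    denom = K * (K - 1) + 2 * k
    return (var * (denom - K ** 2) + (K - 2 * k) * mean * (K - mean)) / denom


def lord_k(K, mean, var, reliability):
    """Value of k from Equation 6, with sigma_e^2 = var * (1 - reliability)."""
    err = var * (1.0 - reliability)
    m = mean * (K - mean)
    numerator = K * ((K - 1) * (var - err) - K * var + m)
    denominator = 2.0 * (m - (var - err))
    return numerator / denominator


# Round trip: the error variance implied by k = 2 should lead back to k = 2.
err = error_variance(40, 25.0, 50.0, 2.0)
k_hat = lord_k(40, 25.0, 50.0, 1.0 - err / 50.0)
```

The round trip at the end only checks that the two formulas are algebraically consistent with each other; in practice the reliability would be an external estimate.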
When $k \ne 0$, Equation 3 is not strictly a probability
distribution since for some values of $\tau$ and $i$ it is possible that $\Pr(X = i \mid \tau, k) < 0$. Lord
(1965) states that for usual values of $K$ and $k$ negative probabilities are typically negligible
for $.01 < \tau < .99$, so that for practical purposes it is appropriate to treat $\Pr(X = i \mid \tau, k)$
as a probability distribution.
After a value of k has been determined there are two steps involved in the estimation of
the observed score distribution under the four-parameter beta compound binomial model
using the method of moments. First, the parameters of the four-parameter beta true score
distribution are estimated, and second these parameter estimates are used to produce the
fitted observed score distribution.
Under the assumption that the conditional error distribution is given by the two term
approximation to the compound binomial distribution (Equation 3), a formula can be derived
which gives the non-central moments (moments about zero) of the proportion correct true
score distribution in terms of the factorial moments of the observed score distribution. The
first moment of the proportion correct true score distribution can be written as
$$\mu'_{1\tau} = \frac{\mu_{[1]}}{K}, \tag{7}$$
where $\mu'_{i\tau}$ is the $i$-th non-central moment of the proportion correct true score distribution and
$\mu_{[i]}$ is the $i$-th factorial moment of the number correct observed score distribution (the
$i$-th factorial moment of a random variable $X$ is defined as $E[X(X - 1) \cdots (X - i + 1)]$).
The $i$-th ($i \ge 2$) non-central moment of the proportion correct true score distribution ($\mu'_{i\tau}$) is
$$\mu'_{i\tau} = \frac{\mu_{[i]} / (K - 2)^{[i - 2]} + k\, i^{[2]}\, \mu'_{(i-1)\tau}}{K(K - 1) + k\, i^{[2]}}, \tag{8}$$
where for integers $j$ and $l$, $j^{[l]} = j(j - 1) \cdots (j - l + 1)$ and $j^{[0]} = 1$. Given estimates of the first four
factorial moments of the observed score distribution (these can be obtained from the central
or non-central observed score moments, see Kendall & Stuart, 1977, page 66), estimates of
the mean and second through fourth non-central moments of the proportion correct true
score distribution can be obtained using Equations 7 and 8.
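The recursion in Equations 7 and 8 can be sketched as follows. This is an illustrative implementation under the stated definitions, not the author's code; the names `falling_factorial` and `true_score_moments` are hypothetical.

```python
def falling_factorial(j, l):
    """j^[l] = j (j - 1) ... (j - l + 1), with j^[0] = 1."""
    result = 1
    for t in range(l):
        result *= (j - t)
    return result


def true_score_moments(factorial_moments, K, k):
    """Non-central moments of the proportion correct true score.

    factorial_moments[i - 1] holds the i-th factorial moment of the
    observed score; the returned list holds mu'_1, mu'_2, ... in order.
    """
    moments = [factorial_moments[0] / K]              # Equation 7
    for i in range(2, len(factorial_moments) + 1):    # Equation 8
        i2 = i * (i - 1)
        numer = (factorial_moments[i - 1] / falling_factorial(K - 2, i - 2)
                 + k * i2 * moments[-1])
        moments.append(numer / (K * (K - 1) + k * i2))
    return moments
```

When every examinee has the same true score $\tau$ and $k = 0$, the $i$-th factorial moment is $K^{[i]} \tau^i$ and the function returns $\tau^i$, which is a convenient sanity check.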
The estimates of the first four true score moments are used to produce method of
moments estimates of the parameters of the four-parameter beta true score distribution.
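The four parameters to be estimated are the two shape parameters ($\alpha$ and $\beta$), the lower
limit of the distribution ($l$) and the upper limit of the distribution ($u$). If the proportion
correct true score distribution is a member of the four-parameter beta family then its
mean, variance, skewness and kurtosis are given by (Johnson & Kotz, 1970, pages 40-44)
$$\begin{aligned}
\mu'_{1\tau} &= l + (u - l)\frac{\alpha}{\alpha + \beta}, \\
\mu_{2\tau} &= \frac{(u - l)^2 \alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}, \\
\gamma_3 &= \frac{\mu_{3\tau}}{\mu_{2\tau}^{3/2}} = \frac{2(\beta - \alpha)\sqrt{\alpha + \beta + 1}}{(\alpha + \beta + 2)\sqrt{\alpha\beta}}, \\
\gamma_4 &= \frac{\mu_{4\tau}}{\mu_{2\tau}^{2}} = \frac{3(\alpha + \beta + 1)\left[2(\alpha + \beta)^2 + \alpha\beta(\alpha + \beta - 6)\right]}{\alpha\beta(\alpha + \beta + 2)(\alpha + \beta + 3)},
\end{aligned} \tag{9}$$
where $\mu_{i\tau}$ is the $i$-th central moment (moment about the mean) of the proportion correct
true score distribution. The central moments of Equation 9 can be written in terms of the
non-central moments of Equation 8 (see Kendall & Stuart, 1977, page 58). Let
$$r = \frac{6(\gamma_4 - \gamma_3^2 - 1)}{6 + 3\gamma_3^2 - 2\gamma_4}. \tag{10}$$
Substituting the values for $\gamma_3$ and $\gamma_4$ given in Equation 9 into Equation 10 and simplifying
gives $r = \alpha + \beta$. Substituting $r$ for $\alpha + \beta$ in the expression for $\gamma_4$ given in Equation 9 and
solving for $\alpha\beta$ gives
$$\alpha\beta = \frac{6 r^2 (r + 1)}{(r + 2)(r + 3)\gamma_4 - 3(r - 6)(r + 1)}. \tag{11}$$
Solving for $\alpha$ using the expressions $\alpha + \beta = r$ and Equation 11 gives the two solutions
$$\alpha = \frac{r}{2}\left(1 \pm \sqrt{1 - \frac{24(r + 1)}{(r + 2)(r + 3)\gamma_4 - 3(r - 6)(r + 1)}}\right). \tag{12}$$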
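If the solutions given in Equation 12 are real then one of these solutions will be the value
of $\alpha$ and the other solution will be the value of $\beta$. If the skewness ($\gamma_3$) is positive then the
larger solution will be the value of $\beta$, otherwise the larger solution will be the value of $\alpha$.
Substituting estimates of $\gamma_3$ and $\gamma_4$ into Equations 10 and 12 gives method
of moments estimates of the parameters $\alpha$ and $\beta$. Solving the expressions for $\mu'_{1\tau}$ and $\mu_{2\tau}$
given in Equation 9 for $l$ and $u$ gives
$$\begin{aligned}
l &= \mu'_{1\tau} - \frac{\alpha\sqrt{\mu_{2\tau}(\alpha + \beta + 1)}}{\sqrt{\alpha\beta}}, \\
u &= \mu'_{1\tau} + \frac{\beta\sqrt{\mu_{2\tau}(\alpha + \beta + 1)}}{\sqrt{\alpha\beta}}.
\end{aligned} \tag{13}$$
Substituting estimates of $\mu'_{1\tau}$, $\mu_{2\tau}$ and the method of moments estimates of $\alpha$ and $\beta$ from
Equation 12 into Equation 13 gives method of moments estimates of $l$ and $u$.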
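The four-moment solution of Equations 10 through 13 can be sketched as below. This is a minimal illustration, assuming the true score mean, variance, skewness, and kurtosis have already been estimated; the function name `fit_four_parameter_beta` and the convention of returning `None` when no valid solution exists are assumptions made for the example.

```python
import math


def fit_four_parameter_beta(mean, variance, skewness, kurtosis):
    """Return (alpha, beta, l, u), or None if no valid solution exists."""
    g3sq = skewness ** 2
    denom_r = 6.0 + 3.0 * g3sq - 2.0 * kurtosis
    if denom_r == 0.0:
        return None
    r = 6.0 * (kurtosis - g3sq - 1.0) / denom_r                 # Equation 10
    denom = (r + 2) * (r + 3) * kurtosis - 3 * (r - 6) * (r + 1)
    if denom == 0.0:
        return None
    disc = 1.0 - 24.0 * (r + 1) / denom
    if disc < 0.0:
        return None                                             # roots not real
    root = math.sqrt(disc)
    small = 0.5 * r * (1.0 - root)                              # Equation 12
    large = 0.5 * r * (1.0 + root)
    # Positive skewness means beta is the larger shape parameter.
    alpha, beta = (small, large) if skewness > 0 else (large, small)
    if alpha <= 0.0 or beta <= 0.0:
        return None
    scale = math.sqrt(variance * (r + 1)) / math.sqrt(alpha * beta)
    lower = mean - alpha * scale                                # Equation 13
    upper = mean + beta * scale
    if lower < 0.0 or upper > 1.0:
        return None
    return alpha, beta, lower, upper
```

For example, the moments of a four-parameter beta distribution with $\alpha = 2$, $\beta = 3$, $l = .1$, and $u = .9$ (mean .42, variance .0256, skewness 2/7, kurtosis 792/336) lead back to those four parameter values.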
The method of moments estimates of the four parameters do not exist when the
solutions for $\alpha$ and $\beta$ given in Equation 12 are not real. Even if solutions exist the estimates
of some parameters may not be valid (i.e., $u > 1$, $l < 0$, $\alpha < 0$, $\beta < 0$). When the method
of moments solution using the first four moments does not exist or one or more of the
parameter estimates are not valid, a solution is used in which
the first three moments are fit to determine three of the four parameters ($\alpha$, $\beta$ and $u$) and
the remaining parameter ($l$) is chosen such that the kurtosis of the fitted distribution is as
near as possible to the kurtosis as calculated directly from the observed score moments (using
Equation 8). Expressions are first derived for $\alpha$, $\beta$ and $u$
(given a specified value of $l$) based on the first three moments of the proportion correct
true score distribution. Let
$$\xi_i = \frac{\alpha + i - 1}{\alpha + \beta + i - 1}, \qquad i = 1, 2, 3. \tag{14}$$
It follows from Equation 14 that
$$\xi_3 = \frac{\xi_1 - 2\xi_2 + \xi_1 \xi_2}{2\xi_1 - \xi_2 - 1}. \tag{15}$$
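The first two non-central moments of the proportion correct true score distribution can be
written as
$$\begin{aligned}
\mu'_{1\tau} &= l + (u - l)\xi_1, \\
\mu'_{2\tau} &= l^2 + 2l(u - l)\xi_1 + (u - l)^2 \xi_1 \xi_2.
\end{aligned} \tag{16}$$
Solving Equation 16 for $\xi_1$ and $\xi_2$ gives
$$\begin{aligned}
\xi_1 &= \frac{\mu'_{1\tau} - l}{u - l}, \\
\xi_2 &= \frac{\mu'_{2\tau} - 2l(\mu'_{1\tau} - l) - l^2}{(\mu'_{1\tau} - l)(u - l)}.
\end{aligned} \tag{17}$$
Substituting the expressions for $\xi_1$ and $\xi_2$ given in Equation 17 into the expression for $\xi_3$
given in Equation 15 and simplifying gives
$$(u - l)\xi_3 = \frac{-l^3 + 3\mu'_{1\tau} l^2 - \left(\mu'_{2\tau} + 2[\mu'_{1\tau}]^2\right) l + \mu'_{1\tau}\mu'_{2\tau} + (u - l)\left([\mu'_{1\tau}]^2 - 2\mu'_{2\tau} + 2l\mu'_{1\tau} - l^2\right)}{l^2 - 2\mu'_{1\tau} l + 2(\mu'_{1\tau})^2 - \mu'_{2\tau} + (u - l)(l - \mu'_{1\tau})}. \tag{18}$$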
The third non-central moment of the proportion correct true score distribution is
Substituting the expressions for £1 and £2 from Equation 17 into Equation 19 and simpli-
fying produces
The only expression involving $u$ in Equation 20 is $(u - l)\xi_3$. Substituting the right hand
side of Equation 18 for $(u - l)\xi_3$ in Equation 20, solving for $u$ and simplifying gives
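Let
$$\tau^* = \frac{\tau - l}{u - l}. \tag{22}$$
If $\tau$ has a four-parameter beta distribution then $\tau^*$ has a beta distribution on the interval $(0, 1)$
with parameters $\alpha$ and $\beta$. The parameters $\alpha$ and $\beta$ are given in terms of the non-central
moments of $\tau^*$ by
$$\begin{aligned}
\alpha &= \frac{\mu'_{1\tau^*}\left(\mu'_{1\tau^*} - \mu'_{2\tau^*}\right)}{\mu'_{2\tau^*} - (\mu'_{1\tau^*})^2}, \\
\beta &= \frac{\left(1 - \mu'_{1\tau^*}\right)\left(\mu'_{1\tau^*} - \mu'_{2\tau^*}\right)}{\mu'_{2\tau^*} - (\mu'_{1\tau^*})^2}.
\end{aligned} \tag{23}$$
Given values of $u$ and $l$, the non-central moments of the distribution of $\tau^*$ can be obtained
from the non-central moments of the distribution of $\tau$.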
If the method of moments estimates given by
Equations 12 and 13 do not exist or are not valid, then a solution is selected such that
the first three moments are fit (which determines $u$, $\alpha$, and $\beta$) and $l$ is chosen such that
the fitted true score kurtosis is as close as possible to the true score kurtosis as calculated
directly from the observed score moments.
The expressions for $\alpha$ and $\beta$ given in Equation 23 can be written in terms of $l$ and the
first three non-central moments of the proportion correct true score distribution ($\mu'_{1\tau}$, $\mu'_{2\tau}$,
and $\mu'_{3\tau}$). These values of $\alpha$ and $\beta$ can be used in the expression for the kurtosis given in
Equation 9 to compute the fitted true score kurtosis as a function of $l$. The function of $l$
given by the squared difference of the fitted true score kurtosis (a function of $l$) and the true
score kurtosis calculated directly from the observed score moments (using Equation 8) is
minimized over the values of $l$ that give valid parameter estimates.
A method of finding the value of $l$ that solves this constrained optimization problem is
to first compute the two solutions in which $l$ and $u$ are on the boundary of the parameter
space (the solution for which $l = 0$ and the solution for which $u = 1$). Either or both of
these solutions may not be a valid solution. The solution for which $l = 0$ can be computed
using Equations 21 and 23. The solution for $u = 1$ can be computed by solving Equation
21 for $l$ (Equation 24). The resulting value
of $l$ along with $u = 1$ is used in Equation 23 to give estimates of $\alpha$ and $\beta$. For each
of these two solutions (the solution for which $l = 0$ and the solution for which $u = 1$)
the fitted kurtosis is calculated (assuming the solutions are valid). The solution with the
smallest squared difference in the fitted kurtosis and the kurtosis calculated using Equation
8 (this squared difference will be referred to as the squared kurtosis difference) is used as
the initial solution. A grid search (Thisted, 1988, page 200) is then used for values of $l > 0$
to search for a solution with a smaller squared kurtosis difference than the initial solution.
For almost all the situations that have been encountered in practice either the solution
for which $l = 0$ or the solution for which $u = 1$ produces the smallest squared kurtosis
difference. There are examples, though, in which a solution with $l > 0$ and $u < 1$ can
give a smaller squared kurtosis difference than either the solution for which $l = 0$ or the
solution for which $u = 1$.
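The three-moment fit and the search over $l$ can be sketched as follows. This is a simplified illustration rather than the procedure used in the author's programs: it scans a grid of $l$ values starting at $l = 0$ and does not compute the $u = 1$ boundary solution separately. The closed form used for $u$ is an algebraic rearrangement of Equations 17 and 18 in terms of moments about $l$, worked out for this example, and may differ in appearance from Equation 21. All function names are hypothetical.

```python
def fit_three_moments(m1, m2, m3, l):
    """alpha, beta, u matching three non-central moments for a given l."""
    a = m1 - l                                    # first moment about l
    c2 = m2 - 2 * l * m1 + l * l                  # second moment about l
    c3 = m3 - 3 * l * m2 + 3 * l * l * m1 - l ** 3  # third moment about l
    # Width u - l, from (u - l) * xi_3 = c3 / c2 combined with Equation 18.
    d = (c3 * (2 * a * a - c2) - a * c2 * c2) / (c2 * (a * a - 2 * c2) + a * c3)
    s1, s2 = a / d, c2 / (d * d)                  # moments of tau* (Equation 22)
    denom = s2 - s1 * s1
    alpha = s1 * (s1 - s2) / denom                # Equation 23
    beta = (1 - s1) * (s1 - s2) / denom
    return alpha, beta, l + d


def beta_kurtosis(alpha, beta):
    """Kurtosis of a beta distribution (last line of Equation 9)."""
    r = alpha + beta
    return (3 * (r + 1) * (2 * r * r + alpha * beta * (r - 6))
            / (alpha * beta * (r + 2) * (r + 3)))


def search_lower_limit(m1, m2, m3, target_kurtosis, steps=200):
    """Grid search over l in [0, mean) for the smallest squared kurtosis difference."""
    best = None
    for j in range(steps + 1):
        l = m1 * j / (steps + 1)
        try:
            alpha, beta, u = fit_three_moments(m1, m2, m3, l)
        except ZeroDivisionError:
            continue
        if not (alpha > 0 and beta > 0 and l < u <= 1):
            continue                              # skip invalid estimates
        diff = (beta_kurtosis(alpha, beta) - target_kurtosis) ** 2
        if best is None or diff < best[0]:
            best = (diff, alpha, beta, l, u)
    return best
```

The function `search_lower_limit` returns `None` when no grid point yields valid estimates, in which case a caller would need some other fallback.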
This section discusses estimation of the observed score distribution under the four-
parameter beta compound binomial model assuming a value of k has been determined and
estimates of $\alpha$, $\beta$, $l$, and $u$ have been calculated. The derivations presented in this section
closely follow those presented in Lord (1964) with some added details. These derivations
and the resulting formula for computing the observed score probabilities are not presented
in Lord (1965).
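Case 1: $k = 0$. The case is first considered in which the conditional error distribution
[$\Pr(X = i \mid \tau, k)$] in Equation 3 is binomial ($k = 0$). In this case the observed score
distribution is
$$\Pr(X = i) = \int_l^u \binom{K}{i} \tau^i (1 - \tau)^{K - i}\, \frac{(-l + \tau)^{\alpha - 1} (u - \tau)^{\beta - 1}}{B(\alpha, \beta)\, (u - l)^{\alpha + \beta - 1}}\, d\tau. \tag{25}$$
Changing the variable of integration to $\tau^*$ (Equation 22), so that $\tau = (u - l)\tau^* + l$ and $d\tau = (u - l)\, d\tau^*$, gives
$$\Pr(X = i) = \int_0^1 \binom{K}{i} \left[(u - l)\tau^* + l\right]^i \left[1 - (u - l)\tau^* - l\right]^{K - i} \frac{\left[(u - l)\tau^*\right]^{\alpha - 1} \left[(u - l)(1 - \tau^*)\right]^{\beta - 1}}{B(\alpha, \beta)\, (u - l)^{\alpha + \beta - 1}}\, (u - l)\, d\tau^*. \tag{26}$$
Simplifying the expression of the true score density times $u - l$ in Equation 26 and substituting
the result into Equation 26 gives
$$\Pr(X = i) = \int_0^1 \binom{K}{i} \left[(u - l)\tau^* + l\right]^i \left[1 - (u - l)\tau^* - l\right]^{K - i} \frac{(\tau^*)^{\alpha - 1} (1 - \tau^*)^{\beta - 1}}{B(\alpha, \beta)}\, d\tau^*. \tag{27}$$
By the binomial theorem,
$$\left[(u - l)\tau^* + l\right]^i = \sum_{r=0}^{i} \binom{i}{r} \left[(u - l)\tau^*\right]^r l^{\,i - r}, \tag{28}$$
and
$$\left[1 - (u - l)\tau^* - l\right]^{K - i} = \left[(u - l)(1 - \tau^*) + (1 - u)\right]^{K - i} = \sum_{s=0}^{K - i} \binom{K - i}{s} \left[(u - l)(1 - \tau^*)\right]^s (1 - u)^{K - i - s}. \tag{29}$$
Substituting Equations 28 and 29 into Equation 27 gives
$$\Pr(X = i) = \frac{1}{B(\alpha, \beta)} \binom{K}{i} \sum_{r=0}^{i} \sum_{s=0}^{K - i} \binom{i}{r} \binom{K - i}{s} (u - l)^{r + s}\, l^{\,i - r} (1 - u)^{K - i - s} \int_0^1 (\tau^*)^{r + \alpha - 1} (1 - \tau^*)^{s + \beta - 1}\, d\tau^*. \tag{30}$$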
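Using the definition of the beta function, the integral in Equation 30 can be written as
$B(\alpha + r, \beta + s)$, so that
$$\Pr(X = i) = \binom{K}{i} \sum_{r=0}^{i} \sum_{s=0}^{K - i} \binom{i}{r} \binom{K - i}{s} (u - l)^{r + s}\, l^{\,i - r} (1 - u)^{K - i - s}\, \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \frac{\Gamma(\alpha + r)\Gamma(\beta + s)}{\Gamma(\alpha + \beta + r + s)}. \tag{31}$$
Let
$$(a)_n = \frac{\Gamma(a + n)}{\Gamma(a)} = a(a + 1) \cdots (a + n - 1), \qquad (a)_0 = 1. \tag{32}$$
Substituting Equation 32 into Equation 31 gives
$$\Pr(X = i) = \binom{K}{i} \sum_{r=0}^{i} \sum_{s=0}^{K - i} \binom{i}{r} \binom{K - i}{s} (u - l)^{r + s}\, l^{\,i - r} (1 - u)^{K - i - s}\, \frac{(\alpha)_r (\beta)_s}{(\alpha + \beta)_{r + s}}. \tag{33}$$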
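Let $r' = i - r$ and $s' = K - i - s$. Changing the index of the first summation in
Equation 33 from $r$ to $r'$ and the index of the second summation from $s$ to $s'$ produces
$$\Pr(X = i) = (u - l)^K \sum_{r'=0}^{i} \sum_{s'=0}^{K - i} \binom{K}{i} \binom{i}{r'} \binom{K - i}{s'} \frac{(\alpha)_{i - r'} (\beta)_{K - i - s'}}{(\alpha + \beta)_{K - s' - r'}} \left(\frac{l}{u - l}\right)^{r'} \left(\frac{1 - u}{u - l}\right)^{s'}. \tag{34}$$
The product of binomial coefficients in Equation 34 can be written as
$$\begin{aligned}
\binom{K}{i} \binom{i}{r'} \binom{K - i}{s'} &= \frac{K!}{i!\,(K - i)!}\, \frac{i!}{r'!\,(i - r')!}\, \frac{(K - i)!}{s'!\,(K - i - s')!} \\
&= \frac{K!}{r'!\, s'!\, (i - r')!\, (K - i - s')!} \\
&= \frac{(K - r')!\,(K - s')!}{K!}\, \frac{K!}{r'!\,(K - r')!}\, \frac{K!}{s'!\,(K - s')!}\, \frac{1}{(i - r')!\,(K - i - s')!} \\
&= \frac{(K - r')!\,(K - s')!}{K!} \binom{K}{r'} \binom{K}{s'} \frac{1}{(i - r')!\,(K - i - s')!}.
\end{aligned} \tag{35}$$
Substituting Equation 35 into Equation 34 gives
$$\Pr(X = i) = (u - l)^K \sum_{r'=0}^{i} \sum_{s'=0}^{K - i} \frac{(\alpha)_{i - r'}}{(i - r')!} \binom{K}{r'} \left(\frac{l}{u - l}\right)^{r'} \frac{(K - r')!\,(K - s')!}{K!\,(\alpha + \beta)_{K - r' - s'}} \binom{K}{s'} \left(\frac{1 - u}{u - l}\right)^{s'} \frac{(\beta)_{K - i - s'}}{(K - i - s')!}. \tag{36}$$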
Lord (1964) suggests using Equation 36 to calculate the observed score distribution when
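The matrices below are of order $(K + 1) \times (K + 1)$, with rows and columns indexed from $0$ to $K$.
Consistent with Equation 36, the matrix $M_1$ has element $(\alpha)_{i - r'} / (i - r')!$ in row $i$ and
column $r'$ when $r' \le i$, and zero otherwise.
The diagonal matrix $M_2$ is given by
$$M_2 = \operatorname{diag}\left[\binom{K}{0}\left(\frac{l}{u - l}\right)^{0},\ \binom{K}{1}\left(\frac{l}{u - l}\right)^{1},\ \ldots,\ \binom{K}{K}\left(\frac{l}{u - l}\right)^{K}\right].$$
The matrix $M_3$ is given by
$$M_3 = \begin{pmatrix}
\dfrac{K!\,K!}{K!\,(\alpha + \beta)_{K}} & \dfrac{K!\,(K - 1)!}{K!\,(\alpha + \beta)_{K - 1}} & \cdots & \dfrac{K!\,0!}{K!\,(\alpha + \beta)_{0}} \\
\dfrac{(K - 1)!\,K!}{K!\,(\alpha + \beta)_{K - 1}} & \dfrac{(K - 1)!\,(K - 1)!}{K!\,(\alpha + \beta)_{K - 2}} & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
\dfrac{0!\,K!}{K!\,(\alpha + \beta)_{0}} & 0 & \cdots & 0
\end{pmatrix},$$
that is, the element in row $r'$ and column $s'$ is $(K - r')!\,(K - s')! / [K!\,(\alpha + \beta)_{K - r' - s'}]$ when $r' + s' \le K$, and zero otherwise.
The diagonal matrix $M_4$ is given by
$$M_4 = \operatorname{diag}\left[\binom{K}{0}\left(\frac{1 - u}{u - l}\right)^{0},\ \binom{K}{1}\left(\frac{1 - u}{u - l}\right)^{1},\ \ldots,\ \binom{K}{K}\left(\frac{1 - u}{u - l}\right)^{K}\right].$$
The matrix $M_5$ is given by
$$M_5 = \begin{pmatrix}
\dfrac{(\beta)_{K}}{K!} & \dfrac{(\beta)_{K - 1}}{(K - 1)!} & \cdots & \dfrac{(\beta)_{0}}{0!} \\
\vdots & \vdots & & \vdots \\
\dfrac{(\beta)_{1}}{1!} & \dfrac{(\beta)_{0}}{0!} & \cdots & 0 \\
\dfrac{(\beta)_{0}}{0!} & 0 & \cdots & 0
\end{pmatrix},$$
that is, the element in row $s'$ and column $i$ is $(\beta)_{K - i - s'} / (K - i - s')!$ when $s' \le K - i$, and zero otherwise.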
The elements of each of the matrices $M_i$ can be computed by first calculating an easily
computed initial element and computing the other elements as a simple factor times an
adjacent element. In addition, not all elements of the matrices need to be separately
calculated. For example, for matrix $M_1$ only the first column needs to be computed
because the numbers in the other columns are a subset of the numbers in the first column.
The fitted probability for raw score $i$ is the $(i + 1)$-th diagonal element of the matrix
product $(u - l)^K M_1 M_2 M_3 M_4 M_5$.
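For checking purposes, the observed score distribution for $k = 0$ can also be computed directly from the double sum in Equation 31. The sketch below does this with log-gamma functions instead of the matrix form; it is an illustration only, and the function names are hypothetical. It requires Python 3.8 or later for `math.comb`.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def observed_score_probabilities(K, alpha, beta, lower, upper):
    """Pr(X = i), i = 0..K, for the four-parameter beta binomial model (k = 0)."""
    width = upper - lower
    base = log_beta(alpha, beta)
    probs = []
    for i in range(K + 1):
        total = 0.0
        for r in range(i + 1):
            for s in range(K - i + 1):
                # Python evaluates 0.0 ** 0 as 1.0, which is what the
                # binomial expansions need when lower = 0 or upper = 1.
                weight = (math.comb(i, r) * math.comb(K - i, s)
                          * width ** (r + s)
                          * lower ** (i - r)
                          * (1.0 - upper) ** (K - i - s))
                total += weight * math.exp(log_beta(alpha + r, beta + s) - base)
        probs.append(math.comb(K, i) * total)
    return probs
```

With `lower = 0` and `upper = 1` only the term $r = i$, $s = K - i$ survives, and the result reduces to the ordinary beta binomial distribution. The triple loop costs on the order of $K^3$ operations, so the matrix formulation is likely preferable for long tests.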
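Case 2: $k > 0$. When $k > 0$, substituting the conditional error distribution given in Equation
3 into Equation 1 gives
$$\Pr(X = i) = p_0(i) - k \int_l^u \tau(1 - \tau)\left[p(i \mid K - 2, \tau) - 2p(i - 1 \mid K - 2, \tau) + p(i - 2 \mid K - 2, \tau)\right] g(\tau \mid \alpha, \beta, l, u)\, d\tau, \tag{37}$$
where $p_0(i)$ is the probability given in Equation 25 of raw score $i$ when the conditional
error distribution is binomial, $p(i \mid n, \tau)$ is the binomial probability defined in Equation
3, and $g(\tau \mid \alpha, \beta, l, u)$ is the four-parameter beta true score distribution given by Equation
2. For $j = 0, \ldots, K - 2$,
$$\begin{aligned}
\tau(1 - \tau)\, p(j \mid K - 2, \tau) &= \frac{(K - 2)!}{j!\,(K - 2 - j)!}\, \tau^{j + 1} (1 - \tau)^{K - 1 - j} \\
&= \frac{(j + 1)(K - 1 - j)}{K(K - 1)}\, \frac{K!}{(j + 1)!\,(K - 1 - j)!}\, \tau^{j + 1} (1 - \tau)^{K - 1 - j} \\
&= \frac{(j + 1)(K - 1 - j)}{K(K - 1)}\, p(j + 1 \mid K, \tau).
\end{aligned} \tag{38}$$
Substituting Equation 38 into Equation 37 gives
$$\Pr(X = i) = p_0(i) - \frac{k}{K(K - 1)} \int_l^u \left[(i + 1)(K - 1 - i)\, p(i + 1 \mid K, \tau) - 2i(K - i)\, p(i \mid K, \tau) + (i - 1)(K + 1 - i)\, p(i - 1 \mid K, \tau)\right] g(\tau \mid \alpha, \beta, l, u)\, d\tau. \tag{39}$$
Define $P_0(j)$ as
$$P_0(j) = j(K - j)\, p_0(j), \tag{40}$$
with $P_0(j) = 0$ for $j < 0$ or $j > K$. Equation 39 can then be written as
$$\Pr(X = i) = p_0(i) - \frac{k}{K(K - 1)}\left[P_0(i + 1) - 2P_0(i) + P_0(i - 1)\right]. \tag{41}$$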
The procedure for calculating the observed score distribution for a value of $k > 0$,
say $k_0$, is to first calculate the observed score distribution assuming $k = 0$ using Equation
36, and then to apply the adjustment in Equation 41 with $k = k_0$.
When k > 0 some of the probabilities computed using Equation 41 may be negative.
When negative values are computed they are usually very small in magnitude.
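The adjustment for $k > 0$ can be sketched as a second-difference correction applied to the $k = 0$ probabilities, following Equations 40 and 41. This is an illustrative implementation; the names `adjust_for_k` and `binomial_pmf` are hypothetical, and `binomial_pmf` is included only to build a small example.

```python
import math


def adjust_for_k(p0, k):
    """Apply Equation 41 to the k = 0 probabilities p0[0..K]."""
    K = len(p0) - 1
    big_p = [j * (K - j) * p0[j] for j in range(K + 1)]     # Equation 40

    def P(j):
        # P_0(j) is taken to be zero outside 0..K.
        return big_p[j] if 0 <= j <= K else 0.0

    factor = k / (K * (K - 1))
    return [p0[i] - factor * (P(i + 1) - 2.0 * P(i) + P(i - 1))
            for i in range(K + 1)]


def binomial_pmf(K, tau):
    """Binomial probabilities, used here only to construct an example."""
    return [math.comb(K, i) * tau ** i * (1.0 - tau) ** (K - i)
            for i in range(K + 1)]


# Example: all examinees share the true score 0.3 on a 10-item test.
adjusted = adjust_for_k(binomial_pmf(10, 0.3), 1.0)
```

The adjustment leaves the total probability and the mean unchanged and reduces the conditional variance from $K\tau(1 - \tau)$ to $(K - 2k)\tau(1 - \tau)$, which gives a quick numerical check.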
In this section the results presented previously will be used to obtain formulas for
computing the classification consistency indexes described by Hanson and Brennan (1990)
assuming the four-parameter beta compound binomial model holds for the test score in
question. The two types of classification consistency indexes discussed by Hanson and
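Brennan (1990) are considered separately. First, the calculation of the probability of a
consistent classification ($p$) and coefficient $\kappa$ is considered. These indexes are based on the
bivariate distribution of scores on two independent administrations of a test. Assuming the test
scores follow the four-parameter beta compound binomial model, the probability of a randomly
chosen examinee obtaining raw scores $i$ and $j$ on the two test administrations is (using the
notation of Equation 1)
$$\Pr(X_1 = i, X_2 = j) = \int_l^u \Pr(X_1 = i \mid \tau, k)\, \Pr(X_2 = j \mid \tau, k)\, g(\tau \mid \alpha, \beta, l, u)\, d\tau. \tag{42}$$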
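For a raw score cutoff $x_0$, the probability of a consistent classification is
$$p = \sum_{i=0}^{x_0 - 1} \sum_{j=0}^{x_0 - 1} \Pr(X_1 = i, X_2 = j) + \sum_{i=x_0}^{K} \sum_{j=x_0}^{K} \Pr(X_1 = i, X_2 = j). \tag{43}$$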
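Coefficient $\kappa$ is defined as
$$\kappa = \frac{p - p_c}{1 - p_c}, \tag{44}$$
where $p_c$ is the probability of a consistent classification expected by chance.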
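Given a fitted bivariate distribution, Equations 43 and 44 can be evaluated as in the sketch below. The chance probability $p_c$ is computed here as the sum of the squared marginal probabilities of falling below and at or above the cutoff; that definition is an assumption made for this example, since the definition of $p_c$ is not reproduced in the text above. The function name `consistency_indexes` is hypothetical.

```python
def consistency_indexes(joint, cutoff):
    """Return (p, kappa) for a (K+1) x (K+1) joint distribution.

    joint[i][j] is Pr(X1 = i, X2 = j); cutoff is the raw score x0, with
    scores of x0 or more classified into the upper category.
    """
    K = len(joint) - 1
    low = range(cutoff)
    high = range(cutoff, K + 1)
    p = (sum(joint[i][j] for i in low for j in low)
         + sum(joint[i][j] for i in high for j in high))    # Equation 43
    # Assumed chance agreement: squared marginal probabilities.
    marginal_low = sum(joint[i][j] for i in low for j in range(K + 1))
    p_chance = marginal_low ** 2 + (1.0 - marginal_low) ** 2
    kappa = (p - p_chance) / (1.0 - p_chance)               # Equation 44
    return p, kappa
```

If the two administrations were statistically independent the function would return $\kappa = 0$, and if every examinee obtained the same score twice it would return $p = 1$ and $\kappa = 1$.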
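The bivariate distribution given
by Equation 42 can be computed based on the conditional error and true score distributions
estimated from a single test administration. Thus, the first step in computing the bivariate
distribution is to estimate the parameters of the
true score distribution using data from a single test administration as previously described.
After the parameters of the true score distribution have been estimated, the bivariate
distribution can be computed using Equation 42.
Case 1: $k = 0$. For computing the integral in Equation 42 the case is first considered in
which the conditional error distribution is binomial ($k = 0$). In this case
$$\begin{aligned}
\Pr(X_1 = i, X_2 = j) &= \int_l^u p(i \mid K, \tau)\, p(j \mid K, \tau)\, g(\tau \mid \alpha, \beta, l, u)\, d\tau \\
&= \binom{K}{i} \binom{K}{j} \int_l^u \tau^{i + j} (1 - \tau)^{2K - i - j}\, g(\tau \mid \alpha, \beta, l, u)\, d\tau \\
&= \frac{\binom{K}{i} \binom{K}{j}}{\binom{2K}{i + j}} \int_l^u p(i + j \mid 2K, \tau)\, g(\tau \mid \alpha, \beta, l, u)\, d\tau.
\end{aligned} \tag{45}$$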
The integral on the last line of Equation 45 is the probability of a test score of $i + j$ on a test
with $2K$ items where the test score distribution is assumed to follow the four-parameter
beta binomial model with the same true score distribution as the test scores $X_1$ and $X_2$.
This integral can be calculated using Equation 36 (using the estimated parameters of the
true score distribution, with $2K$ in place of $K$).
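The last line of Equation 45 can be sketched in code as below. The input `p_double` is assumed to hold the fitted probabilities of scores $0, \ldots, 2K$ on a hypothetical $2K$-item test with the same true score distribution (computed, for example, from Equation 36); the function name `bivariate_binomial_case` is hypothetical.

```python
import math


def bivariate_binomial_case(p_double, K):
    """Pr(X1 = i, X2 = j) for k = 0, from the 2K-item score distribution."""
    joint = []
    for i in range(K + 1):
        row = []
        for j in range(K + 1):
            # Ratio of binomial coefficients on the last line of Equation 45.
            ratio = math.comb(K, i) * math.comb(K, j) / math.comb(2 * K, i + j)
            row.append(ratio * p_double[i + j])
        joint.append(row)
    return joint
```

When all examinees share one true score, `p_double` is a binomial distribution with $2K$ trials and the result factors into the product of two $K$-trial binomial distributions, which provides a simple check of the formula.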
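Case 2: $k > 0$. When $k > 0$, substituting the conditional error distribution given in Equation
3 into Equation 42 gives the bivariate distribution of scores on two independent
administrations of a test as
$$\begin{aligned}
\Pr(X_1 = i, X_2 = j) = \int_l^u &\left\{p(i \mid K, \tau) - k\tau(1 - \tau)\left[p(i \mid K - 2, \tau) - 2p(i - 1 \mid K - 2, \tau) + p(i - 2 \mid K - 2, \tau)\right]\right\} \\
\times &\left\{p(j \mid K, \tau) - k\tau(1 - \tau)\left[p(j \mid K - 2, \tau) - 2p(j - 1 \mid K - 2, \tau) + p(j - 2 \mid K - 2, \tau)\right]\right\} g(\tau \mid \alpha, \beta, l, u)\, d\tau,
\end{aligned} \tag{46}$$
where $p(i \mid n, \tau)$ is a binomial probability as defined in Equation 3. Using the equality
$$\tau(1 - \tau)\, p(j \mid K - 2, \tau) = \frac{(j + 1)(K - j - 1)}{K(K - 1)}\, p(j + 1 \mid K, \tau),$$
Equation 46 can be written as
$$\begin{aligned}
\Pr(X_1 = i, X_2 = j) = \int_l^u &\left\{p(i \mid K, \tau) - \frac{k}{K(K - 1)}\left[(i + 1)(K - i - 1)\, p(i + 1 \mid K, \tau) - 2i(K - i)\, p(i \mid K, \tau) + (i - 1)(K - i + 1)\, p(i - 1 \mid K, \tau)\right]\right\} \\
\times &\left\{p(j \mid K, \tau) - \frac{k}{K(K - 1)}\left[(j + 1)(K - j - 1)\, p(j + 1 \mid K, \tau) - 2j(K - j)\, p(j \mid K, \tau) + (j - 1)(K - j + 1)\, p(j - 1 \mid K, \tau)\right]\right\} g(\tau \mid \alpha, \beta, l, u)\, d\tau.
\end{aligned} \tag{47}$$
Expanding the product in Equation 47 and integrating term by term gives
$$\begin{aligned}
\Pr(X_1 = i, X_2 = j) = p_0(i, j) + \frac{k}{K(K - 1)} \Bigg\{ &\frac{k}{K(K - 1)}\left[(i - 1)(j - 1)(K - i + 1)(K - j + 1)\right] p_0(i - 1, j - 1) \\
&+ (i - 1)(K - i + 1)\left[\frac{-2k}{(K - 1)K}\, j(K - j) - 1\right] p_0(i - 1, j) \\
&+ \frac{k}{K(K - 1)}\left[(i - 1)(j + 1)(K - i + 1)(K - j - 1)\right] p_0(i - 1, j + 1) \\
&+ (j - 1)(K - j + 1)\left[\frac{-2k}{(K - 1)K}\, i(K - i) - 1\right] p_0(i, j - 1) \\
&+ \left\{2\left[(K - i)\, i + (K - j)\, j\right] + \frac{4k}{K(K - 1)}\left[ij(K - i)(K - j)\right]\right\} p_0(i, j) \\
&+ (j + 1)(K - j - 1)\left[\frac{-2k}{(K - 1)K}\, i(K - i) - 1\right] p_0(i, j + 1) \\
&+ \frac{k}{K(K - 1)}\left[(i + 1)(j - 1)(K - i - 1)(K - j + 1)\right] p_0(i + 1, j - 1) \\
&+ (i + 1)(K - i - 1)\left[\frac{-2k}{(K - 1)K}\, j(K - j) - 1\right] p_0(i + 1, j) \\
&+ \frac{k}{K(K - 1)}\left[(i + 1)(j + 1)(K - i - 1)(K - j - 1)\right] p_0(i + 1, j + 1) \Bigg\},
\end{aligned} \tag{48}$$
where $p_0(i, j)$ is the probability of scores $i$ and $j$ on two independent administrations of
the test when the conditional error distribution is binomial (given by Equation 45), taken to be zero
when $i$ or $j$ is outside the range $0, \ldots, K$. Equation 45 is used to compute the values of $p_0(i, j)$,
which are used in Equation 48 to calculate the bivariate distribution of scores on two
independent administrations of the test from which $p$ and coefficient $\kappa$ can be
computed.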
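The false positive error rate is given by
$$\sum_{i=x_0}^{K} \int_l^{\tau_0} \Pr(X = i \mid \tau, k)\, g(\tau \mid \alpha, \beta, l, u)\, d\tau, \tag{49}$$
where $\tau_0$ is the true score cutoff. The false negative error rate is given by
$$\sum_{i=0}^{x_0 - 1} \int_{\tau_0}^{u} \Pr(X = i \mid \tau, k)\, g(\tau \mid \alpha, \beta, l, u)\, d\tau. \tag{50}$$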
Case 1: k = 0. The case is first considered in which the conditional error distribution is
binomial ( k = 0). Based on a series of steps analogous to those which produced Equation
30 from Equation 25 each term in the sum given in Equation 49 can be written as
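$$\int_l^{\tau_0} p(i \mid K, \tau)\, g(\tau \mid \alpha, \beta, l, u)\, d\tau = \frac{1}{B(\alpha, \beta)} \binom{K}{i} \sum_{r=0}^{i} \sum_{s=0}^{K - i} \binom{i}{r} \binom{K - i}{s} (u - l)^{r + s}\, l^{\,i - r} (1 - u)^{K - i - s} \int_0^{\tau_0^*} (\tau^*)^{r + \alpha - 1} (1 - \tau^*)^{s + \beta - 1}\, d\tau^*, \tag{51}$$
where $\tau^*$ is defined as in Equation 22 and $\tau_0^* = (\tau_0 - l)/(u - l)$. Using the definition of the
incomplete beta function ratio, $I_x(a, b) = \frac{1}{B(a, b)} \int_0^x t^{a - 1} (1 - t)^{b - 1}\, dt$, Equation 51 can be written as
$$\sum_{r=0}^{i} \sum_{s=0}^{K - i} I_{\tau_0^*}(\alpha + r, \beta + s)\, \frac{B(\alpha + r, \beta + s)}{B(\alpha, \beta)} \binom{K}{i} \binom{i}{r} \binom{K - i}{s}\, l^{\,i - r} (1 - u)^{K - i - s} (u - l)^{r + s}. \tag{52}$$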
In a manner analogous to that by which Equation 36 was obtained from Equation 31,
$$(u - l)^K \sum_{r'=0}^{i} \sum_{s'=0}^{K - i} I_{\tau_0^*}(\alpha + i - r', \beta + K - i - s')\, \frac{(\alpha)_{i - r'}}{(i - r')!} \binom{K}{r'} \left(\frac{l}{u - l}\right)^{r'} \frac{(K - r')!\,(K - s')!}{K!\,(\alpha + \beta)_{K - r' - s'}} \binom{K}{s'} \left(\frac{1 - u}{u - l}\right)^{s'} \frac{(\beta)_{K - i - s'}}{(K - i - s')!}. \tag{53}$$
Equation 53 can be used to calculate the terms in the sum of Equation 49 to give the false
positive error rate when $k = 0$.
Case 2: k > 0. When k > 0 an adjustment formula analogous to Equation 41 can be used.
If $Z_i$ is the value of the integral in Equation 53 when $k = 0$ then the value of the integral
when $k > 0$ is
$$Z_i - \frac{k}{K(K - 1)}\left[(i - 1)(K - i + 1)\, Z_{i - 1} - 2i(K - i)\, Z_i + (i + 1)(K - i - 1)\, Z_{i + 1}\right]. \tag{54}$$
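The false negative error rate can be calculated using the values given by Equation 53
(or Equation 54 when $k > 0$) together with the fitted observed score probabilities, since
$\int_l^u \Pr(X = i \mid \tau, k)\, g(\tau \mid \alpha, \beta, l, u)\, d\tau = \Pr(X = i)$,
so that
$$\int_{\tau_0}^{u} \Pr(X = i \mid \tau, k)\, g(\tau \mid \alpha, \beta, l, u)\, d\tau = \Pr(X = i) - \int_l^{\tau_0} \Pr(X = i \mid \tau, k)\, g(\tau \mid \alpha, \beta, l, u)\, d\tau. \tag{55}$$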
Values of the false positive and false negative error rates are computed based on the
values from Equation 53 (or Equation 54 when k > 0) and 55, respectively.
Computer Programs
in this paper for producing method of moments estimates of the true score distribution
and observed score distribution under the four-parameter beta compound binomial model
and computing classification consistency indexes are available from the author. These
Macintosh programs which use these routines are also available from the author. One
program calculates the method of moments estimates of the four-parameter beta compound
binomial model, displays the results in a text window and produces graphic displays of the
fitted distributions. This program also computes fitted raw score distributions based on a
log-linear model (Holland and Thayer, 1987). Another Macintosh program computes and
displays the classification consistency indexes described for three models: the beta binomial,
the four-parameter beta binomial, and the four-parameter beta compound binomial.
References