=== Hypergeometric Distribution ===
The hypergeometric distribution describes the number of successes in a sequence of n draws without replacement from a finite population of size N that contains exactly m successes.
Its probability mass function is:

{\displaystyle f(x)={{{m \choose x}{{N-m} \choose {n-x}}} \over {N \choose n}}{\text{ for all }}x\in \{0,1,\ldots ,n\}}
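Strictly, the support of f is the set of integers x with max(0, n+m−N) ≤ x ≤ min(m, n). When this range is smaller than [0, n], f(x)=0 at the remaining integers, because a binomial coefficient vanishes whenever its lower index exceeds its upper index; the simplest case, for k>0, is
{\displaystyle {0 \choose k}=0}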
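As a small illustration, the sketch below evaluates the pmf with Python's standard-library `math.comb` and lists where it is non-zero. The function name `hypergeom_pmf` and the example values N=10, m=7, n=5 are assumptions chosen for illustration; they do not come from the text.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x, n, m, N):
    """P(X = x): x successes in n draws without replacement
    from a population of N that contains m successes."""
    # math.comb(a, b) returns 0 when b > a, which produces the zeros
    # outside the support automatically.
    return Fraction(comb(m, x) * comb(N - m, n - x), comb(N, n))


# Illustrative values (an assumption, not taken from the text).
N, m, n = 10, 7, 5
lower, upper = max(0, n + m - N), min(m, n)
nonzero = [x for x in range(n + 1) if hypergeom_pmf(x, n, m, N) > 0]

print(lower, upper)  # bounds of the support
print(nonzero)       # integers in [0, n] where the pmf is non-zero
```

With these values the population holds only 3 failures, so 5 draws must include at least 2 successes, and the pmf is zero at x=0 and x=1.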

==== Probability Mass Function ====
We first check that f(x) is a valid pmf. This requires that it is non-negative everywhere and that it sums to 1. The first condition holds because binomial coefficients are non-negative. For the second condition we start with Vandermonde's identity

{\displaystyle \sum _{x=0}^{n}{a \choose x}{b \choose n-x}={a+b \choose n}}
Dividing both sides by the right-hand side gives
{\displaystyle \sum _{x=0}^{n}{{a \choose x}{b \choose n-x} \over {a+b \choose n}}=1}
Setting a=m and b=N−m, so that a+b=N, turns the summand into f(x), so the condition is satisfied.
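The argument can be checked numerically. The following is a minimal sketch that tests Vandermonde's identity and the normalisation of the pmf over a grid of small parameters, using exact fractions to avoid rounding. The helper names and the upper bound of 8 on N are arbitrary assumptions.

```python
from fractions import Fraction
from math import comb


def vandermonde_lhs(a, b, n):
    """Left-hand side of Vandermonde's identity."""
    return sum(comb(a, x) * comb(b, n - x) for x in range(n + 1))


def pmf_total(n, m, N):
    """Sum of the hypergeometric pmf over x = 0..n, as an exact fraction."""
    return sum(
        Fraction(comb(m, x) * comb(N - m, n - x), comb(N, n))
        for x in range(n + 1)
    )


# Check every small parameter combination (the bound 8 is arbitrary).
for N in range(1, 9):
    for m in range(N + 1):
        for n in range(N + 1):
            assert vandermonde_lhs(m, N - m, n) == comb(N, n)
            assert pmf_total(n, m, N) == 1
print("pmf sums to 1 for all tested parameters")
```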

==== Mean ====
We derive the mean as follows:

{\displaystyle \operatorname {E} [X]=\sum _{x=0}^{n}x\cdot f(x;n,m,N)=\sum _{x=0}^{n}x\cdot {{{m \choose x}{{N-m} \choose {n-x}}} \over {N \choose n}}}

{\displaystyle \operatorname {E} [X]=0\cdot {{{m \choose 0}{{N-m} \choose {n-0}}} \over {N \choose n}}+\sum _{x=1}^{n}x\cdot {{{m \choose x}{{N-m} \choose {n-x}}} \over {N \choose n}}}
We use the identity

{\displaystyle {\binom {a}{b}}={\frac {a}{b}}{\binom {a-1}{b-1}}}
in the denominator.

{\displaystyle \operatorname {E} [X]=0+\sum _{x=1}^{n}x\cdot {{{m \choose x}{{N-m} \choose {n-x}}} \over {{N \over n}{{N-1} \choose {n-1}}}}}

{\displaystyle \operatorname {E} [X]={n \over N}\sum _{x=1}^{n}x\cdot {{{m \choose x}{{N-m} \choose {n-x}}} \over {{N-1} \choose {n-1}}}}
Next we use the identity

{\displaystyle b{\binom {a}{b}}=a{\binom {a-1}{b-1}}}
in the first binomial of the numerator.

{\displaystyle \operatorname {E} [X]={n \over N}\sum _{x=1}^{n}{m{{m-1 \choose x-1}{{N-m} \choose {n-x}}} \over {{N-1} \choose {n-1}}}}
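Both rewriting steps rest on the same binomial identity, written once with a division and once without. A quick sketch of a brute-force check over small integers follows; the helper name `identity_holds` and the range bound of 30 are illustrative assumptions.

```python
from math import comb


def identity_holds(a, b):
    """Check b*C(a, b) == a*C(a-1, b-1) for integers a, b >= 1.

    Dividing both sides by b gives the form C(a, b) == (a/b)*C(a-1, b-1).
    """
    return b * comb(a, b) == a * comb(a - 1, b - 1)


all_ok = all(identity_holds(a, b) for a in range(1, 30) for b in range(1, 30))
print(all_ok)  # True
```

The check also covers b > a, where both sides are zero.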
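Next, for each variable inside the sum we define a corresponding primed variable that is one less: N′=N−1, m′=m−1, x′=x−1, n′=n−1. Note that N′−m′=N−m and n′−x′=n−x, and that x running from 1 to n corresponds to x′ running from 0 to n′.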

{\displaystyle \operatorname {E} [X]={mn \over N}\sum _{x'=0}^{n'}{{{m' \choose x'}{{N'-m'} \choose {n'-x'}}} \over {{N'} \choose {n'}}}}

{\displaystyle \operatorname {E} [X]={mn \over N}\sum _{x'=0}^{n'}f(x';n',m',N')}
The sum now runs over the entire pmf of a hypergeometric distribution with parameters (n′, m′, N′), so it equals 1. Therefore

{\displaystyle \operatorname {E} [X]={nm \over N}}
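As a sanity check on the result, the sketch below computes the mean directly from its definition and compares it with nm/N. The function name and the example values N=10, m=7, n=5 are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def hypergeom_mean_by_sum(n, m, N):
    """E[X] computed directly from the definition, the sum of x * f(x)."""
    return sum(
        x * Fraction(comb(m, x) * comb(N - m, n - x), comb(N, n))
        for x in range(n + 1)
    )


N, m, n = 10, 7, 5
print(hypergeom_mean_by_sum(n, m, N))  # 7/2
print(Fraction(n * m, N))              # 7/2
```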

==== Variance ====
We first determine E[X²].

{\displaystyle \operatorname {E} [X^{2}]=\sum _{x=0}^{n}f(x;n,m,N)\cdot x^{2}=\sum _{x=0}^{n}{{{m \choose x}{{N-m} \choose {n-x}}} \over {N \choose n}}\cdot x^{2}}

{\displaystyle \operatorname {E} [X^{2}]={{{m \choose 0}{{N-m} \choose {n-0}}} \over {N \choose n}}\cdot 0^{2}+\sum _{x=1}^{n}{{{m \choose x}{{N-m} \choose {n-x}}} \over {N \choose n}}\cdot x^{2}}

{\displaystyle \operatorname {E} [X^{2}]=0+\sum _{x=1}^{n}{{m{m-1 \choose x-1}{{N-m} \choose {n-x}}} \over {{N \over n}{{N-1} \choose {n-1}}}}\cdot x}

{\displaystyle \operatorname {E} [X^{2}]={mn \over N}\sum _{x=1}^{n}{{{m-1 \choose x-1}{{N-m} \choose {n-x}}} \over {{N-1} \choose {n-1}}}\cdot x}
We use the same variable substitution as when deriving the mean.

{\displaystyle \operatorname {E} [X^{2}]={mn \over N}\sum _{x'=0}^{n'}{{{m' \choose x'}{{N'-m'} \choose {n'-x'}}} \over {{N'} \choose {n'}}}(x'+1)}

{\displaystyle \operatorname {E} [X^{2}]={mn \over N}\left[\sum _{x'=0}^{n'}{{{m' \choose x'}{{N'-m'} \choose {n'-x'}}} \over {{N'} \choose {n'}}}x'+\sum _{x'=0}^{n'}{{{m' \choose x'}{{N'-m'} \choose {n'-x'}}} \over {{N'} \choose {n'}}}\right]}
The first sum is the expected value of a hypergeometric random variable with parameters (n′, m′, N′). The second sum is the total sum of that random variable's pmf, which equals 1.

{\displaystyle \operatorname {E} [X^{2}]={mn \over N}\left[{n'm' \over N'}+1\right]}

{\displaystyle \operatorname {E} [X^{2}]={mn \over N}\left[{(n-1)(m-1) \over (N-1)}+1\right]={mn \over N}\left[{{(n-1)(m-1)+(N-1)} \over (N-1)}\right]}
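The key step in this derivation is that the second moment reduces to the mean of a hypergeometric variable with each parameter lowered by one. The following sketch checks that reduction by direct summation; the helper `moment` and the example values are assumptions for illustration, and N ≥ 2, m ≥ 1, n ≥ 1 are required so that the reduced parameters are valid.

```python
from fractions import Fraction
from math import comb


def moment(k, n, m, N):
    """E[X**k] for a hypergeometric X, by direct summation."""
    return sum(
        x**k * Fraction(comb(m, x) * comb(N - m, n - x), comb(N, n))
        for x in range(n + 1)
    )


N, m, n = 10, 7, 5
lhs = moment(2, n, m, N)
# (mn/N) * (E[X'] + 1), where X' has parameters (n-1, m-1, N-1)
rhs = Fraction(m * n, N) * (moment(1, n - 1, m - 1, N - 1) + 1)
print(lhs, rhs)  # 77/6 77/6
```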
We then solve for the variance

{\displaystyle \operatorname {Var} (X)=\operatorname {E} [X^{2}]-(\operatorname {E} [X])^{2}}

{\displaystyle \operatorname {Var} (X)={mn \over N}\left[{{(n-1)(m-1)+(N-1)} \over (N-1)}\right]-\left({mn \over N}\right)^{2}}

{\displaystyle \operatorname {Var} (X)={Nmn \over N^{2}}\left[{{(n-1)(m-1)+(N-1)} \over (N-1)}\right]-{(N-1)(mn)^{2} \over (N-1)N^{2}}}
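The remaining simplification is routine algebra. One way to write it out, as a sketch that combines both terms over the common denominator N²(N−1) and expands the numerator:

```latex
\begin{aligned}
\operatorname{Var}(X)
&= \frac{mn\left[N\bigl((n-1)(m-1)+(N-1)\bigr)-(N-1)\,mn\right]}{N^{2}(N-1)} \\
&= \frac{mn\left[N^{2}-Nn-Nm+mn\right]}{N^{2}(N-1)}
\end{aligned}
```

The bracket in the last line factors as (N−n)(N−m).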

{\displaystyle \operatorname {Var} (X)={nm(N-n)(N-m) \over N^{2}(N-1)}}
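The closed form can be compared against the variance computed straight from its definition. This is a minimal sketch using exact fractions; the function names and the example values N=10, m=7, n=5 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def hypergeom_variance_by_sum(n, m, N):
    """Var(X) from the definition E[(X - E[X])**2], using exact fractions."""
    pmf = [Fraction(comb(m, x) * comb(N - m, n - x), comb(N, n))
           for x in range(n + 1)]
    mean = sum(x * p for x, p in enumerate(pmf))
    return sum((x - mean) ** 2 * p for x, p in enumerate(pmf))


def hypergeom_variance_closed_form(n, m, N):
    """Closed form nm(N-n)(N-m) / (N^2 (N-1)); needs N >= 2."""
    return Fraction(n * m * (N - n) * (N - m), N**2 * (N - 1))


N, m, n = 10, 7, 5
print(hypergeom_variance_by_sum(n, m, N))       # 7/12
print(hypergeom_variance_closed_form(n, m, N))  # 7/12
```

Note that the closed form is zero when n=N, as expected: drawing the whole population always yields exactly m successes.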