# Binomial distribution

A summary of the distribution's main properties (recovered from the infobox):

• Notation: B(n, p)
• Parameters: n ∈ ℕ₀ (number of trials), p ∈ [0, 1] (success probability in each trial)
• Support: k ∈ {0, …, n} (number of successes)
• Probability mass function: $\textstyle {n \choose k}\, p^k (1-p)^{n-k}$
• Cumulative distribution function: $\textstyle I_{1-p}(n - k, 1 + k)$
• Mean: np
• Median: ⌊np⌋ or ⌈np⌉
• Mode: ⌊(n + 1)p⌋ or ⌈(n + 1)p⌉ − 1
• Variance: np(1 − p)
• Skewness: $\frac{1-2p}{\sqrt{np(1-p)}}$
• Excess kurtosis: $\frac{1-6p(1-p)}{np(1-p)}$
• Entropy: $\frac12 \log_2 \big( 2\pi e\, np(1-p) \big) + O \left( \frac{1}{n} \right)$
• Moment-generating function: $(1-p + pe^t)^n$
• Characteristic function: $(1-p + pe^{it})^n$
• Probability-generating function: $G(z) = \left[(1-p) + pz\right]^n$
• Fisher information: $g(p,n) = \frac{n}{p(1-p)}$ (continuous parameter p only)

Binomial distribution for $p=0.5$, with n and k as in Pascal's triangle.

The probability that a ball in a Galton box with 8 layers (n = 8) ends up in the central bin (k = 4) is $70/256$.

In probability theory and statistics, the binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.

The binomial distribution is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution, not a binomial one. However, for N much larger than n, the binomial distribution is a good approximation, and widely used.

## Specification

### Probability mass function

In general, if the random variable X follows the binomial distribution with parameters n and p, we write X ~ B(n, p). The probability of getting exactly k successes in n trials is given by the probability mass function:

$f(k;n,p) = \Pr(X = k) = {n\choose k}p^k(1-p)^{n-k}$

for k = 0, 1, 2, ..., n, where

${n\choose k}=\frac{n!}{k!(n-k)!}$

is the binomial coefficient, hence the name of the distribution. The formula can be understood as follows: we want k successes ($p^k$) and n − k failures ($(1-p)^{n-k}$). However, the k successes can occur anywhere among the n trials, and there are ${n\choose k}$ different ways of distributing k successes in a sequence of n trials.

Reference tables for binomial probabilities are usually filled in only up to n/2, because for k > n/2 the probability can be calculated from its complement as

$f(k,n,p)=f(n-k,n,1-p).$

Looking at the expression f(k, n, p) as a function of k, there is a value of k that maximizes it. This value can be found by calculating

$\frac{f(k+1,n,p)}{f(k,n,p)}=\frac{(n-k)p}{(k+1)(1-p)}$

and comparing it to 1. There is always an integer M that satisfies

$(n+1)p-1 \leq M < (n+1)p.$

f(k, n, p) is monotone increasing for k < M and monotone decreasing for k > M, with the exception of the case where (n + 1)p is an integer. In that case, there are two values for which f is maximal: (n + 1)p and (n + 1)p − 1. M is the most probable (most likely) outcome of the Bernoulli trials and is called the mode. Note that the probability of it occurring can be fairly small.

**Recurrence relation**

$p(n-k)\Pr(X=k) - (k+1)(1-p)\Pr(X=k+1) = 0, \qquad \Pr(X=0) = (1-p)^n.$
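This recurrence yields all pmf values in O(n) time without computing any factorials; a minimal Python sketch (assuming 0 < p < 1):

```python
def binomial_pmf_table(n, p):
    """All values Pr(X = k), k = 0..n, via the recurrence
    Pr(X = k+1) = Pr(X = k) * p*(n - k) / ((k + 1)*(1 - p)),
    starting from Pr(X = 0) = (1 - p)^n.  Assumes 0 < p < 1."""
    probs = [(1 - p) ** n]
    for k in range(n):
        probs.append(probs[-1] * p * (n - k) / ((k + 1) * (1 - p)))
    return probs

table = binomial_pmf_table(6, 0.3)
print(round(table[2], 4))  # equals C(6,2) * 0.3^2 * 0.7^4
```

The probabilities sum to 1 up to floating-point error, which makes this a convenient self-check.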

### Cumulative distribution function

The cumulative distribution function can be expressed as:

$F(k;n,p) = \Pr(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} {n\choose i}p^i(1-p)^{n-i}$

where $\scriptstyle \lfloor k\rfloor\,$ is the "floor" under k, i.e. the greatest integer less than or equal to k.

It can also be represented in terms of the regularized incomplete beta function, as follows:[1]

\begin{align} F(k;n,p) & = \Pr(X \le k) \\ &= I_{1-p}(n-k, k+1) \\ & = (n-k) {n \choose k} \int_0^{1-p} t^{n-k-1} (1-t)^k \, dt. \end{align}

Some closed-form bounds for the cumulative distribution function are given below.
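The summation form of the CDF translates directly into code; a sketch using `math.comb` (Python 3.8+):

```python
from math import comb, floor

def binomial_cdf(k, n, p):
    """Pr(X <= k): sum the pmf from 0 to floor(k)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(floor(k) + 1))

# Pr(X <= 2) for the biased-coin example (n = 6, p = 0.3) in the next section:
print(round(binomial_cdf(2, 6, 0.3), 4))  # → 0.7443
```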

## Example

Suppose a biased coin comes up heads with probability 0.3 when tossed. What is the probability of achieving 0, 1,..., 6 heads after six tosses?

$\Pr(0\text{ heads}) = f(0) = \Pr(X = 0) = {6\choose 0}0.3^0 (1-0.3)^{6-0} \approx 0.1176$
$\Pr(1\text{ head }) = f(1) = \Pr(X = 1) = {6\choose 1}0.3^1 (1-0.3)^{6-1} \approx 0.3025$
$\Pr(2\text{ heads}) = f(2) = \Pr(X = 2) = {6\choose 2}0.3^2 (1-0.3)^{6-2} \approx 0.3241$
$\Pr(3\text{ heads}) = f(3) = \Pr(X = 3) = {6\choose 3}0.3^3 (1-0.3)^{6-3} \approx 0.1852$
$\Pr(4\text{ heads}) = f(4) = \Pr(X = 4) = {6\choose 4}0.3^4 (1-0.3)^{6-4} \approx 0.0595$
$\Pr(5\text{ heads}) = f(5) = \Pr(X = 5) = {6\choose 5}0.3^5 (1-0.3)^{6-5} \approx 0.0102$
$\Pr(6\text{ heads}) = f(6) = \Pr(X = 6) = {6\choose 6}0.3^6 (1-0.3)^{6-6} \approx 0.0007$[2]
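These values can be reproduced with a few lines of Python:

```python
from math import comb

def pmf(k, n, p):
    """Binomial probability mass function."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Print the probability of each possible number of heads in six tosses.
for k in range(7):
    print(k, round(pmf(k, 6, 0.3), 4))
```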

## Mean and variance

If X ~ B(n, p), that is, X is a binomially distributed random variable, n being the total number of experiments and p the probability of each experiment yielding a successful result, then the expected value of X is

$\operatorname{E}[X] = np ,$

(For example, if n = 100 and p = 1/4, then 25 successes are expected, even though the number actually observed in any one run is not certain.)

and the variance

$\operatorname{Var}[X] = np(1 - p).$
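Both identities can be checked by summing directly over the pmf; a quick sketch with illustrative parameters:

```python
from math import comb

n, p = 6, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# E[X] = sum k * Pr(X = k); Var[X] = sum (k - E[X])^2 * Pr(X = k)
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print(round(mean, 6), round(var, 6))  # theory: n*p = 1.8, n*p*(1-p) = 1.26
```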

## Mode and median

Usually the mode of a binomial B(n, p) distribution is equal to $\lfloor (n+1)p\rfloor$, where $\lfloor\cdot\rfloor$ is the floor function. However, when (n + 1)p is an integer and p is neither 0 nor 1, the distribution has two modes: (n + 1)p and (n + 1)p − 1. When p is equal to 0 or 1, the mode is 0 or n, respectively. These cases can be summarized as follows:

$\text{mode} = \begin{cases} \lfloor (n+1)\,p\rfloor & \text{if }(n+1)p\text{ is 0 or a noninteger}, \\ (n+1)\,p\ \text{ and }\ (n+1)\,p - 1 &\text{if }(n+1)p\in\{1,\dots,n\}, \\ n & \text{if }(n+1)p = n + 1. \end{cases}$

In general, there is no single formula for the median of a binomial distribution, and it may even be non-unique. However, several special results have been established:

• If np is an integer, then the mean, median, and mode coincide and equal np.[3][4]
• Any median m must lie within the interval ⌊np⌋ ≤ m ≤ ⌈np⌉.[5]
• A median m cannot lie too far away from the mean: |m − np| ≤ min{ ln 2, max{p, 1 − p} }.[6]
• The median is unique and equal to m = round(np) in cases when either p ≤ 1 − ln 2 or p ≥ ln 2 or |m − np| ≤ min{p, 1 − p} (except for the case when p = ½ and n is odd).[5][6]
• When p = 1/2 and n is odd, any number m in the interval ½(n − 1) ≤ m ≤ ½(n + 1) is a median of the binomial distribution. If p = 1/2 and n is even, then m = n/2 is the unique median.
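A brute-force check of the mode formula for illustrative parameters (n = 10, p = 0.3 are assumptions for this sketch):

```python
from math import comb, floor

def pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3  # (n + 1)p = 3.3, not an integer, so the mode is unique
mode = max(range(n + 1), key=lambda k: pmf(k, n, p))
print(mode, floor((n + 1) * p))  # both should agree
```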

## Covariance between two binomials

If two binomially distributed random variables X and Y are observed together, estimating their covariance can be useful. Using the definition of covariance, in the case n = 1 (thus being Bernoulli trials) we have

$\operatorname{Cov}(X, Y) = \operatorname{E}(XY) - \mu_X \mu_Y.$

The first term is non-zero only when both X and Y are one, and μX and μY equal the two success probabilities, pX and pY. Defining pB as the probability of both happening at the same time, this gives

$\operatorname{Cov}(X, Y) = p_B - p_X p_Y,$

and for n independent pairwise trials

$\operatorname{Cov}(X, Y)_n = n ( p_B - p_X p_Y ).$

If X and Y are the same variable, this reduces to the variance formula given above.
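For n = 1, the identity can be verified by direct enumeration of the joint law; the marginals and joint probability below are illustrative numbers:

```python
# Exact check of Cov(X, Y) = pB - pX*pY for a single pair of Bernoulli trials.
# The joint law below has marginals pX, pY and Pr(X = 1, Y = 1) = pB.
pX, pY, pB = 0.5, 0.4, 0.3
joint = {(1, 1): pB, (1, 0): pX - pB, (0, 1): pY - pB, (0, 0): 1 - pX - pY + pB}

exy = sum(x * y * q for (x, y), q in joint.items())  # E[XY] = pB
cov = exy - pX * pY
print(cov)  # theory: pB - pX*pY = 0.3 - 0.2 = 0.1
```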

## Related distributions

### Sums of binomials

If X ~ B(n, p) and Y ~ B(m, p) are independent binomial variables with the same probability p, then X + Y is again a binomial variable; its distribution is[citation needed]

$X+Y \sim B(n+m, p).\,$

### Conditional binomials

If X ~ B(n, p) and, conditional on X, Y ~ B(X, q), then Y is a simple binomial variable with distribution[citation needed]

$Y \sim B(n, pq).$

### Bernoulli distribution

The Bernoulli distribution is a special case of the binomial distribution, where n = 1. Symbolically, X ~ B(1, p) has the same meaning as X ~ Bern(p). Conversely, any binomial distribution, B(n, p), is the distribution of the sum of n Bernoulli trials, Bern(p), each with the same probability p.[citation needed]

### Poisson binomial distribution

The binomial distribution is a special case of the Poisson binomial distribution, which is a sum of n independent non-identical Bernoulli trials Bern(pi).[citation needed] If X has the Poisson binomial distribution with p1 = … = pn = p, then X ~ B(n, p).

### Normal approximation

Binomial probability mass function and normal probability density function approximation for n = 6 and p = 0.5

If n is large enough, then the skew of the distribution is not too great. In this case a reasonable approximation to B(n, p) is given by the normal distribution

$\mathcal{N}(np,\, np(1-p)),$

and this basic approximation can be improved in a simple way by using a suitable continuity correction. The basic approximation generally improves as n increases (at least 20) and is better when p is not near to 0 or 1.[7] Various rules of thumb may be used to decide whether n is large enough, and p is far enough from the extremes of zero or one:

• One rule is that both np and n(1 − p) must be greater than 5. However, the specific number varies from source to source, and depends on how good an approximation one wants; some sources give 10, which gives virtually the same results as the following rule for large n until n is very large (e.g., x = 11, n = 7752).
• A second rule[7] is that for n > 5 the normal approximation is adequate if
$\left | \left (\frac{1}{\sqrt{n}} \right ) \left (\sqrt{\frac{1-p}{p}}-\sqrt{\frac{p}{1-p}} \right ) \right |<0.3$
• Another commonly used rule holds that the normal approximation is appropriate only if everything within 3 standard deviations of its mean is within the range of possible values,[citation needed] that is if
$\mu \pm 3 \sigma = np \pm 3 \sqrt{np(1-p)} \in [0,n].$

The following is an example of applying a continuity correction. Suppose one wishes to calculate Pr(X ≤ 8) for a binomial random variable X. If Y has a distribution given by the normal approximation, then Pr(X ≤ 8) is approximated by Pr(Y ≤ 8.5). The addition of 0.5 is the continuity correction; the uncorrected normal approximation gives considerably less accurate results.
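The effect of the correction can be seen numerically; a sketch for illustrative parameters n = 20, p = 0.5, using only the standard library:

```python
from math import comb, erf, sqrt

n, p, k = 20, 0.5, 8
# Exact Pr(X <= 8) by summing the binomial pmf.
exact = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

mu, sigma = n * p, sqrt(n * p * (1 - p))
def norm_cdf(x):
    """CDF of N(mu, sigma^2) via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

corrected, uncorrected = norm_cdf(k + 0.5), norm_cdf(k)
print(round(exact, 4), round(corrected, 4), round(uncorrected, 4))
```

Here the continuity-corrected value lands much closer to the exact tail probability than the uncorrected one.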

This approximation, known as the de Moivre–Laplace theorem, is a huge time-saver when undertaking calculations by hand (exact calculations with large n are very onerous); historically, it was the first use of the normal distribution, introduced in Abraham de Moivre's book The Doctrine of Chances in 1738. Nowadays, it can be seen as a consequence of the central limit theorem since B(n, p) is a sum of n independent, identically distributed Bernoulli variables with parameter p. This fact is the basis of a hypothesis test, a "proportion z-test", for the value of p using x/n, the sample proportion and estimator of p, in a common test statistic.[8]

For example, suppose one randomly samples n people out of a large population and asks them whether they agree with a certain statement. The proportion of people who agree will of course depend on the sample. If groups of n people were sampled repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion p of agreement in the population and with standard deviation σ = (p(1 − p)/n)1/2. Large sample sizes n are desirable because the standard deviation, as a proportion of the expected value, shrinks, allowing a more precise estimate of the unknown parameter p.

### Poisson approximation

The binomial distribution converges towards the Poisson distribution as the number of trials goes to infinity while the product np remains fixed. Therefore, the Poisson distribution with parameter λ = np can be used as an approximation to the binomial distribution B(n, p) if n is sufficiently large and p is sufficiently small. According to two rules of thumb, this approximation is good if n ≥ 20 and p ≤ 0.05, or if n ≥ 100 and np ≤ 10.[9]
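A quick numerical comparison under the second rule of thumb; the parameters n = 100, p = 0.02 (so λ = 2) are chosen for illustration:

```python
from math import comb, exp, factorial

n, p = 100, 0.02  # satisfies n >= 100 and np <= 10
lam = n * p

# Compare binomial and Poisson probabilities for small k.
for k in range(6):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    pois = exp(-lam) * lam**k / factorial(k)
    print(k, round(binom, 4), round(pois, 4))
```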

### Limiting distributions

As n approaches ∞ while p remains fixed, the distribution of

$\frac{X-np}{\sqrt{np(1-p)}}$

approaches the normal distribution with expected value 0 and variance 1.[citation needed] This result is sometimes loosely stated by saying that the distribution of X is asymptotically normal with expected value np and variance np(1 − p). This result is a specific case of the central limit theorem.

### Beta distribution

Beta distributions provide a family of conjugate prior probability distributions for binomial distributions in Bayesian inference. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value p:[10]

$P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\mathrm{B}(\alpha,\beta)}$.
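As a concrete illustration of conjugacy: observing k successes in n trials updates a Beta(α, β) prior on p to a Beta(α + k, β + n − k) posterior (a standard result; the hyperparameters and data below are arbitrary illustrative values):

```python
# Conjugate Bayesian update for a binomial likelihood with a Beta prior.
alpha, beta_ = 2.0, 2.0      # illustrative prior Beta(2, 2)
n, k = 10, 7                 # observed: 7 successes in 10 trials

alpha_post, beta_post = alpha + k, beta_ + n - k   # posterior Beta(9, 5)
posterior_mean = alpha_post / (alpha_post + beta_post)
print(round(posterior_mean, 4))  # → 0.6429
```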

## Confidence intervals

Even for quite large values of n, the actual distribution of the mean is significantly nonnormal.[11] Because of this problem several methods to estimate confidence intervals have been proposed.

Let n1 be the number of successes out of n, the total number of trials, and let

$\hat{p} = \frac{n_1}{n}$

be the proportion of successes. Let zα/2 be the 100(1 − α/2)th percentile of the standard normal distribution.

• Wald method
$\hat{p} \pm z_{\frac{\alpha}{2}} \sqrt{ \frac{ \hat{p} ( 1 -\hat{p} )}{ n } } .$
A continuity correction of 0.5/n may be added.[clarification needed]
• Agresti-Coull method[12]
$\tilde{p} \pm z_{\frac{\alpha}{2}} \sqrt{ \frac{ \tilde{p} ( 1 - \tilde{p} )}{ n + z_{\frac{\alpha}{2}}^2 } } .$
Here the estimate of p is modified to
$\tilde{p}= \frac{ n_1 + \frac{1}{2} z_{\frac{\alpha}{2}}^2}{ n + z_{\frac{\alpha}{2}}^2 }$
• Arcsine method[13]
$\sin^2 \left (\arcsin \left ( \sqrt{ \hat{p} } \right ) \pm \frac{ z }{ 2 \sqrt{ n } } \right )$
• Wilson (score) method[14]
$\frac{\hat{p} + \frac{1}{2n} z_{1-\frac{\alpha}{2}}^2 \pm \frac{1}{2n} z_{1-\frac{\alpha}{2}} \sqrt{4n\hat{p}(1 - \hat{p})+ z_{1-\frac{\alpha}{2}}^2}} {1+ \frac{1}{n} z_{1-\frac{\alpha}{2}}^2}.$
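The Wilson formula translates directly into code; a sketch, where z = 1.96 approximates the 97.5th standard-normal percentile for a 95% interval:

```python
from math import sqrt

def wilson_interval(n1, n, z=1.96):
    """Wilson score interval for a binomial proportion n1/n."""
    phat = n1 / n
    denom = 1 + z * z / n
    center = (phat + z * z / (2 * n)) / denom
    half = (z / denom) * sqrt(phat * (1 - phat) / n + z * z / (4 * n * n))
    return center - half, center + half

lo, hi = wilson_interval(8, 20)  # 8 successes out of 20 trials
print(round(lo, 3), round(hi, 3))
```

Unlike the Wald interval, the endpoints always stay inside [0, 1], even for proportions near 0 or 1.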

The exact (Clopper–Pearson) method is the most conservative.[11] The Wald method, although commonly recommended in textbooks, is the most biased.[clarification needed]

## Generating binomial random variates

Methods for random number generation where the marginal distribution is a binomial distribution are well-established.[15][16]

One way to generate random samples from a binomial distribution is to use an inversion algorithm. To do so, one first calculates the probability Pr(X = k) for all values k from 0 through n. (These probabilities should sum to a value close to one, in order to encompass the entire sample space.) Then, using a pseudo-random number generator (such as a linear congruential generator) to produce samples uniform on [0, 1], one transforms the uniform samples into discrete numbers using the probabilities calculated in the first step.
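A minimal sketch of the inversion algorithm, using Python's standard `random` module in place of an explicit linear congruential generator:

```python
import random

def binomial_inversion(n, p, rng=random):
    """One Binomial(n, p) draw by CDF inversion.  Assumes 0 < p < 1.
    Walks the pmf with the multiplicative recurrence, so no factorials."""
    u = rng.random()
    prob = (1 - p) ** n      # Pr(X = 0)
    cdf = prob
    k = 0
    while u > cdf and k < n:
        prob *= p * (n - k) / ((k + 1) * (1 - p))
        k += 1
        cdf += prob
    return k

random.seed(0)
samples = [binomial_inversion(10, 0.4) for _ in range(10000)]
print(sum(samples) / len(samples))  # sample mean should be near n*p = 4
```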

## Bounds for the cumulative distribution function

For k ≤ np, upper bounds for the lower tail of the distribution function can be derived. In particular, Hoeffding's inequality yields the bound

$F(k;n,p) \leq \exp\left(-2 \frac{(np-k)^2}{n}\right), \!$

and Chernoff's inequality can be used to derive the bound

$F(k;n,p) \leq \exp\left(-\frac{1}{2\,p} \frac{(np-k)^2}{n}\right). \!$

Moreover, these bounds are reasonably tight when p = 1/2, since the following expression holds for all k ≥ 3n/8:[17]

$F(k;n,\tfrac{1}{2}) \geq \frac{1}{15} \exp\left(- \frac{16 (\frac{n}{2} - k)^2}{n}\right). \!$
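A quick numerical sanity check of the Hoeffding bound, with illustrative parameters:

```python
from math import comb, exp

def binom_cdf(k, n, p):
    """Lower-tail probability Pr(X <= k) by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p, k = 100, 0.5, 40          # k <= n*p, so the lower-tail bound applies
tail = binom_cdf(k, n, p)
hoeffding = exp(-2 * (n * p - k) ** 2 / n)
print(round(tail, 4), round(hoeffding, 4), tail <= hoeffding)
```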

However, the bounds do not work well for extreme values of p. In particular, as p → 1, the value F(k; n, p) goes to zero (for fixed k, n with k < n) while the upper bound above goes to a positive constant. In this case a better bound is given by[18]

$F(k;n,p) \leq \exp\left(-nH\left(\frac{k}{n},p\right)\right) \quad\quad\mbox{if }0<\frac{k}{n}<p, \!$

where H(a, p) is the relative entropy between a p-coin and an a-coin:

$H(a,p)=a\log\frac{a}{p}+(1-a)\log\frac{1-a}{1-p}. \!$

Asymptotically, this bound is reasonably tight; see [18] for details. An equivalent formulation of the bound is

$\Pr(X \ge k) =F(n-k;n,1-p)\leq \exp\left(-nH\left(\frac{k}{n},p\right)\right) \quad\quad\mbox{if }p<\frac{k}{n}<1.\!$


## References

1. ^ Wadsworth, G. P. (1960). Introduction to probability and random variables. USA: McGraw-Hill New York. p. 52.
2. ^ Hamilton Institute. "The Binomial Distribution" October 20, 2010.
3. ^ Neumann, P. (1966). "Über den Median der Binomial- und Poissonverteilung". Wissenschaftliche Zeitschrift der Technischen Universität Dresden (in German) 19: 29–33.
4. ^ Lord, Nick. (July 2010). "Binomial averages when the mean is an integer", The Mathematical Gazette 94, 331-332.
5. ^ a b Kaas, R.; Buhrman, J.M. (1980). "Mean, Median and Mode in Binomial Distributions". Statistica Neerlandica 34 (1): 13–18. doi:10.1111/j.1467-9574.1980.tb00681.x.
6. ^ a b Hamza, K. (1995). "The smallest uniform upper bound on the distance between the mean and the median of the binomial and Poisson distributions". Statistics & Probability Letters 23: 21–25. doi:10.1016/0167-7152(94)00090-U. edit
7. ^ a b Box, Hunter and Hunter (1978). Statistics for experimenters. Wiley. p. 130.
8. ^ NIST/SEMATECH, "7.2.4. Does the proportion of defectives meet requirements?" e-Handbook of Statistical Methods.
9. ^ a b NIST/SEMATECH, "6.3.3.1. Counts Control Charts", e-Handbook of Statistical Methods.
10. ^ MacKay, David (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press; First Edition. ISBN 978-0521642989.
11. ^ a b Brown LD, Cai T. and DasGupta A (2001). Interval estimation for a binomial proportion (with discussion). Statist Sci 16: 101–133
12. ^ Agresti A, Coull BA (1998) "Approximate is better than 'exact' for interval estimation of binomial proportions". The American Statistician 52:119–126
13. ^ Pires MA () Confidence intervals for a binomial proportion: comparison of methods and software evaluation.
14. ^ Wilson EB (1927) "Probable inference, the law of succession, and statistical inference". Journal of the American Statistical Association 22: 209–212
15. ^ Devroye, Luc (1986) Non-Uniform Random Variate Generation, New York: Springer-Verlag. (See especially Chapter X, Discrete Univariate Distributions)
16. ^ Kachitvichyanukul, V.; Schmeiser, B. W. (1988). "Binomial random variate generation". Communications of the ACM 31 (2): 216–222. doi:10.1145/42372.42381. edit
17. ^ Matoušek, J, Vondrak, J: The Probabilistic Method (lecture notes) [1].
18. ^ a b R. Arratia and L. Gordon: Tutorial on large deviations for the binomial distribution, Bulletin of Mathematical Biology 51(1) (1989), 125–131 [2].