Binomial Random Variables
So far, in our discussion about discrete random variables, we have been introduced to:
- The probability distribution, which tells us which values a variable takes, and how often it takes them.
- The mean of the random variable, which tells us the long-run average value that the random variable takes.
- The standard deviation of the random variable, which tells us a typical (or long-run average) distance between the mean of the random variable and the values it takes.
We will now introduce a special class of discrete random variables that comes up in many situations: binomial random variables.
Here’s how we’ll present this material. First, we’ll explain what kind of random experiments give rise to a binomial random variable, and how the binomial random variable is defined in those types of experiments.
We’ll then present the probability distribution of the binomial random variable as a formula (recall that a formula is one of the three ways in which the probability distribution of a discrete random variable can be presented) and explain why the formula makes sense. We’ll conclude our discussion by presenting the mean and standard deviation of the binomial random variable.
As we just mentioned, we’ll start by describing what kind of random experiments give rise to a binomial random variable. We’ll call this type of random experiment a “binomial experiment.”
Binomial experiments are random experiments that consist of a fixed number of repeated trials, like tossing a coin 10 times, randomly choosing 10 people, rolling a die 5 times, etc. These trials, however, need to be independent in the sense that the outcome in one trial has no effect on the outcome in other trials. In each of these repeated trials there is one outcome that is of interest to us (we call this outcome “success”), and each of the trials is identical in the sense that the probability that the trial will end in a “success” is the same in each of the trials. So for example, if our experiment is tossing a coin 10 times, and we are interested in the outcome “heads” (our “success”), then this will be a binomial experiment, since the 10 trials are independent, and the probability of success is 1/2 in each of the 10 trials. Let’s summarize and give more examples.
To summarize, the requirements for a random experiment to be a binomial experiment are:
- a fixed number (n) of trials
- each trial must be independent of the others
- each trial has just two possible outcomes, called “success” (the outcome of interest) and “failure”
- there is a constant probability (p) of success for each trial, the complement of which is the probability (1 – p) of failure
In binomial random experiments, the number of successes in n trials is random. It can be as low as 0, if all the trials end up in failure, or as high as n, if all n trials end in success.
The random variable X that represents the number of successes in those n trials is called binomial, and is determined by the values of n and p. We say, “X is binomial with n = … and p = …”
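To make the definition concrete, here is a minimal simulation sketch in Python. The function name `simulate_binomial`, the seed, and the choice of 10,000 repetitions are illustrative assumptions, not part of the definition; the sketch simply runs n independent trials with success probability p and counts the successes.

```python
import random

def simulate_binomial(n, p, rng):
    """Run one binomial experiment: n independent trials, each a success with probability p.

    Returns the number of successes, a whole number between 0 and n.
    """
    return sum(1 for _ in range(n) if rng.random() < p)

# Hypothetical illustration: 10 coin tosses with "heads" as success, repeated 10,000 times.
rng = random.Random(0)  # fixed seed so the run is reproducible
counts = [simulate_binomial(10, 0.5, rng) for _ in range(10_000)]

print(min(counts), max(counts))   # every count lies between 0 and n = 10
print(sum(counts) / len(counts))  # long-run average number of heads, expected to be near 5
```

Each entry of `counts` is one observed value of X, where X is binomial with n = 10 and p = 0.5.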
Example: Random Experiments (Binomial or Not?)
Let’s consider a few random experiments.
In each of them, we’ll decide whether the random variable is binomial. If it is, we’ll determine the values for n and p. If it isn’t, we’ll explain why not.
- A fair coin is flipped 20 times; X represents the number of heads. X is binomial with n = 20 and p = 0.5.
- You roll a fair die 50 times; X is the number of times you get a six. X is binomial with n = 50 and p = 1/6.
- Roll a fair die repeatedly; X is the number of rolls it takes to get a six. X is not binomial, because the number of trials is not fixed.
- Draw 3 cards at random, one after the other, without replacement, from a set of 4 cards consisting of one club, one diamond, one heart, and one spade; X is the number of diamonds selected. X is not binomial, because the selections are not independent. (The probability (p) of success is not constant, because it is affected by previous selections.)
- Draw 3 cards at random, one after the other, with replacement, from a set of 4 cards consisting of one club, one diamond, one heart, and one spade; X is the number of diamonds selected. Sampling with replacement ensures independence. X is binomial with n = 3 and p = 1/4.
- Approximately 1 in every 20 children has a certain disease. Let X be the number of children with the disease out of a random sample of 100 children. Although the children are sampled without replacement, it is assumed that we are sampling from such a vast population that the selections are virtually independent. X is binomial with n = 100 and p = 1/20 = 0.05.
- The probability of having blood type B is 0.1. Choose 4 people at random; X is the number with blood type B. X is binomial with n = 4 and p = 0.1.
- A student answers 10 quiz questions completely at random; the first five are true/false, the second five are multiple choice, with four options each. X represents the number of correct answers. X is not binomial, because p changes from 1/2 to 1/4.
Example 4 above was not binomial because sampling without replacement resulted in dependent selections. In particular, the probability that the second card is a diamond depends heavily on whether the first card was a diamond: it is 0 if the first card was a diamond, and 1/3 if it was not.
In contrast, Example 5 was binomial because sampling with replacement resulted in independent selections: the probability of any of the 3 cards being a diamond is 1/4 no matter what the previous selections have been.
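The contrast between Examples 4 and 5 can be checked by listing every equally likely pair of first and second draws. The following is a small sketch under that assumption; the helper name `prob_second_diamond_given_first` is ours, chosen only for illustration.

```python
from fractions import Fraction
from itertools import permutations, product

cards = ["club", "diamond", "heart", "spade"]

def prob_second_diamond_given_first(draws, first_is_diamond):
    """P(second card is a diamond), given whether the first card was a diamond.

    `draws` is a list of equally likely (first, second) pairs.
    """
    matching = [d for d in draws if (d[0] == "diamond") == first_is_diamond]
    hits = [d for d in matching if d[1] == "diamond"]
    return Fraction(len(hits), len(matching))

without_replacement = list(permutations(cards, 2))  # ordered pairs, no card repeated
with_replacement = list(product(cards, repeat=2))   # ordered pairs, repeats allowed

for label, draws in [("without", without_replacement), ("with", with_replacement)]:
    print(label,
          prob_second_diamond_given_first(draws, True),
          prob_second_diamond_given_first(draws, False))
# without 0 1/3
# with 1/4 1/4
```

Without replacement the two conditional probabilities differ (0 versus 1/3), so the selections are dependent; with replacement both equal 1/4.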
On the other hand, when you take a relatively small random sample of subjects from a large population, we can assume independence even though the sampling is without replacement, because removing one individual from a very large population has a negligible effect on the next selection. For example, in Example 6, we sampled 100 children out of the population of all children. Even though we sampled the children without replacement, whether one sampled child has the disease has essentially no effect on whether another sampled child has it. The same is true for Example 7.
The convention is to “fudge” the requirement of independence as long as the population is at least 10 times the sample size.
Rule of Thumb
The number (X) of successes in a sample of size n taken without replacement from a population with proportion (p) of successes is approximately binomial with n and p as long as the sample size (n) is at most 10% of the population size (N).
In symbols, this would be: n ≤ 0.10N.
This is the same as saying the population size is greater than or equal to 10 times the sample size. In symbols, this is: N ≥ 10n.
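As a rough illustration of why the rule of thumb is reasonable, the sketch below checks the condition N ≥ 10n and computes how far the success probability on the last draw can drift when sampling without replacement. Both function names and the population size of 1,000,000 children are assumptions made for the example only.

```python
def meets_rule_of_thumb(n, N):
    """True when the sample size n is at most 10% of the population size N (N >= 10n)."""
    return 10 * n <= N

def success_probability_range(n, N, p):
    """Lowest and highest possible success probability on the last of n draws
    made without replacement from a population of size N with proportion p of successes.

    Lowest: all n - 1 earlier draws were successes. Highest: all were failures.
    """
    successes = round(p * N)
    drawn = n - 1
    low = max(successes - drawn, 0) / (N - drawn)
    high = min(successes, N - drawn) / (N - drawn)
    return low, high

# Example 4: 3 cards drawn from only 4 cards, one of which is a diamond.
print(meets_rule_of_thumb(3, 4), success_probability_range(3, 4, 0.25))

# Example 6 with a hypothetical population of 1,000,000 children.
print(meets_rule_of_thumb(100, 1_000_000), success_probability_range(100, 1_000_000, 0.05))
```

In the card example the probability can swing anywhere from 0 to 0.5, while in the large-population example it stays within a tiny distance of 0.05, which is why treating p as constant is a good approximation there.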
Now that we understand what a binomial random variable is, and when it arises, it’s time to discuss its probability distribution. We’ll start with a simple example and then generalize to a formula.
Example: Deck of Cards
Consider a regular deck of 52 cards, in which there are 13 cards of each suit: hearts, diamonds, clubs and spades. We select 3 cards at random with replacement. Let X be the number of diamond cards we got (out of the 3).
We have 3 trials here, and they are independent (since the selection is with replacement). The outcome of each trial can be either success (diamond) or failure (not diamond), and the probability of success is 1/4 in each of the trials.
X, then, is binomial with n = 3 and p = 1/4.
Let’s build the probability distribution of X as we did in the chapter on probability distributions. Recall that we begin with a table in which we:
- record all possible outcomes in 3 selections, where each selection may result in success (a diamond, D) or failure (a non-diamond, N).
- find the value of X that corresponds to each outcome.
- use simple probability principles to find the probability of each outcome.
With the help of the addition principle, we condense the information in this table to construct the actual probability distribution table:
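Both tables can be rebuilt by brute-force enumeration. The following Python sketch follows the three steps listed above and then applies the addition principle; the variable names are our own, and exact fractions are used so that the results can be compared with hand calculations.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 4)           # probability of a diamond (D) on each draw
prob = {"D": p, "N": 1 - p}  # N stands for a non-diamond

distribution = {x: Fraction(0) for x in range(4)}
for outcome in product("DN", repeat=3):  # all 8 outcomes: DDD, DDN, ..., NNN
    # Independent draws: multiply the probabilities of the three selections.
    outcome_prob = prob[outcome[0]] * prob[outcome[1]] * prob[outcome[2]]
    x = outcome.count("D")               # value of X for this outcome
    print("".join(outcome), x, outcome_prob)
    distribution[x] += outcome_prob      # addition principle: combine outcomes with the same x

for x, px in distribution.items():
    print(f"P(X = {x}) = {px}")
# P(X = 0) = 27/64
# P(X = 1) = 27/64
# P(X = 2) = 9/64
# P(X = 3) = 1/64
```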
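In order to establish a general formula for the probability that a binomial random variable X takes any given value x, we will look for patterns in the above distribution. From the way we constructed this probability distribution, we know that, in general:
\[ P(X = x) = (\text{number of possible outcomes with } x \text{ successes out of 3}) \times (\text{probability of } x \text{ successes out of 3}) \]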
Let’s start with the second part, the probability that there will be x successes out of 3, where the probability of success is 1/4. Notice that the fractions multiplied in each case are for the probability of x successes (where each success has a probability of p = 1/4) and the remaining (3 – x) failures (where each failure has probability of 1 – p = 3/4).
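So in general:
\[ \text{probability of } x \text{ successes out of 3} = \left(\frac{1}{4}\right)^{x} \left(\frac{3}{4}\right)^{3-x} \]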
Let’s move on to talk about the number of possible outcomes with x successes out of three. Here it is harder to see the pattern, so we’ll give the following mathematical result.
Consider a random experiment that consists of n trials, each one ending up in either success or failure. The number of possible outcomes in the sample space that have exactly k successes out of n is:
\[ \frac{n!}{k!\,(n-k)!} \]
Note that n! is read “n factorial” and is defined to be the product 1 * 2 * 3 * … * n. 0! is defined to be 1.
Example: Ear Piercings
You choose 12 male college students at random and record whether they have any ear piercings (success) or not. There are many possible outcomes to this experiment (actually, 4,096 of them!).
In how many of the possible outcomes of this experiment are there exactly 8 successes (students who have at least one ear pierced)?
There is no way that we would start listing all these possible outcomes. The result above comes to our rescue.
The result says that in an experiment like this, where you repeat a trial n times (in our case, we repeat it n = 12 times, once for each student we choose), the number of possible outcomes with exactly 8 successes (out of 12) is:
\[ \frac{12!}{8!\,(12-8)!} = \frac{1 \cdot 2 \cdot 3 \cdots 12}{(1 \cdot 2 \cdot 3 \cdots 8)\,(1 \cdot 2 \cdot 3 \cdot 4)} = 495 \]
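The count can be checked with a few lines of Python. The sketch below also combines the two ingredients discussed so far, the number of outcomes with x successes and the probability of each such outcome, into the standard binomial formula P(X = x) = [n! / (x!(n – x)!)] p^x (1 – p)^(n – x). The function name `binomial_pmf` is our own choice for illustration.

```python
from math import comb, factorial

n, k = 12, 8
print(2**n)                                                  # 4096 possible outcomes in total
print(factorial(n) // (factorial(k) * factorial(n - k)))     # 495, straight from the factorial result
print(comb(n, k))                                            # 495, same count via the standard library

def binomial_pmf(x, n, p):
    """P(X = x) for X binomial with n trials and success probability p."""
    # (number of outcomes with x successes) * (probability of each such outcome)
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Card example: n = 3 draws with replacement, p = 1/4 for a diamond.
print([binomial_pmf(x, 3, 0.25) for x in range(4)])
# [0.421875, 0.421875, 0.140625, 0.015625], i.e. 27/64, 27/64, 9/64, 1/64
```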
Mean and Standard Deviation of the Binomial Random Variable
Now that we understand how to find probabilities associated with a random variable X which is binomial, using either its probability distribution formula or software, we are ready to talk about the mean and standard deviation of a binomial random variable. Let’s start with an example:
Example: Blood Type B—Mean
Overall, the proportion of people with blood type B is 0.1. In other words, roughly 10% of the population has blood type B.
Suppose we sample 120 people at random. On average, how many would you expect to have blood type B?
The answer, 12, seems obvious; automatically, you’d multiply the number of people, 120, by the probability of blood type B, 0.1. This suggests the general formula for finding the mean of a binomial random variable:
If X is binomial with parameters n and p, then
\[ \mu_X = np \]
Although the formula for mean is quite intuitive, it is not at all obvious what the variance and standard deviation should be. It turns out that:
If X is binomial with parameters n and p, then
\[ \sigma_X^2 = np(1-p), \qquad \sigma_X = \sqrt{np(1-p)} \]
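Applied to the blood type example, and assuming the standard results that a binomial random variable has mean np and standard deviation equal to the square root of np(1 – p), the calculation can be sketched as follows. The function names are illustrative.

```python
from math import sqrt

def binomial_mean(n, p):
    """Mean of a binomial random variable: n * p."""
    return n * p

def binomial_sd(n, p):
    """Standard deviation of a binomial random variable: sqrt(n * p * (1 - p))."""
    return sqrt(n * p * (1 - p))

n, p = 120, 0.1  # 120 people sampled, 10% with blood type B
print(round(binomial_mean(n, p), 2))  # 12.0
print(round(binomial_sd(n, p), 2))    # 3.29
```

So in samples of 120 people we expect about 12 with blood type B, give or take roughly 3.3.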
For those who are interested, read below to see how these formulas were derived.
Many Students Wonder …
Question: How are formulas for the mean and standard deviation of a binomial random variable derived?
Answer: For those who are interested, these formulas may be derived with the aid of the rule for variance of a sum of random variables. We think of our binomial X with n and p as the sum of n identical “Bernoulli” random variables X1 through Xn, each representing one of the n trials. Each of the Xi has just two possibilities: success (Xi = 1, with probability p), or failure (Xi = 0, with probability 1 – p).
The mean of each Xi is 0(1 – p) + 1(p) = p, and the variance is
\[ (0-p)^2(1-p) + (1-p)^2\,p = p(1-p)\,[\,p + (1-p)\,] = p(1-p) \]
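Because the mean of a sum is the sum of the means, the mean of the binomial random variable X = X1 + … + Xn is the sum of n means p, or np. Because the trials are independent, variance is additive as well, so the variance of X is the sum of n variances p(1 – p), or np(1 – p). Taking the square root gives the standard deviation.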
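As a sanity check on the derivation, the mean and variance can also be computed directly from their definitions using the binomial probabilities. This sketch assumes the standard binomial formula and uses exact fractions; the helper names are ours.

```python
from fractions import Fraction
from math import comb

def binomial_distribution(n, p):
    """Map each value x = 0, ..., n to P(X = x) for X binomial with n and p."""
    return {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

def mean_and_variance(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mean = sum(x * px for x, px in dist.items())
    variance = sum((x - mean)**2 * px for x, px in dist.items())
    return mean, variance

n, p = 3, Fraction(1, 4)  # the card example
mean, variance = mean_and_variance(binomial_distribution(n, p))
print(mean, n * p)                # 3/4 3/4
print(variance, n * p * (1 - p))  # 9/16 9/16
```

The values obtained from the definitions agree with np and np(1 – p) for this example; other choices of n and p can be tried the same way.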