In the previous sections we’ve learned principles and tools that help us find probabilities of events in general. Now that we’ve become proficient at doing that, we’ll talk about random variables. Just like any other variable, random variables can take on multiple values. What differentiates random variables from other variables is that the values for these variables are determined by a random trial, random sample, or simulation. The probabilities for the values can be determined by theoretical or observational means. Such probabilities play a vital role in the theory behind statistical inference, our ultimate goal in this course.

### Introduction

We first discussed variables in the Exploratory Data Analysis portion of the course. A variable is a characteristic of an individual. We also made an important distinction between **categorical variables**, whose values are groups or categories (and an individual can be placed into one of them), and **quantitative variables**, which have numerical values for which arithmetic operations make sense. In the previous two modules, we focused mostly on events which arise when there is a categorical variable in the background: blood type, pierced ears (yes/no), gender, on time delivery (yes/no), side effect (yes/no), etc. Now we will begin to consider quantitative variables that arise when a random experiment is performed. We will need to define this new type of variable.

A **random variable** assigns a unique numerical value to the outcome of a random experiment.

A random variable can be thought of as a function that associates exactly one of the possible numerical outcomes to each trial of a random experiment. However, that number can be the same for many of the trials.

Before we go any further, here are some simple examples:

## Example: Theoretical

Consider the random experiment of flipping a coin twice. The sample space of possible outcomes is S = { HH, HT, TH, TT }.

Now, let’s **define the variable X to be the number of tails** that the random experiment will produce.

- If the outcome is HH, we have no tails, so the value for X is 0.
- If the outcome is HT, we got one tail, so the value for X is 1.
- If the outcome is TH, we again got one tail, so the value for X is 1.
- Lastly, if the outcome is TT, we got two tails, so the value for X is 2.

As the definition suggests, X is a quantitative variable that takes the possible values of 0, 1, or 2.

It is random because we do not know which of the three values the variable will eventually take. We can ask questions like:

- What is the probability that X will be 2? In other words, what is the probability of getting 2 tails?
- What is the probability that X will be at least 1? In other words, what is the probability of getting at least 1 tail?

As you can see, random variables are not really a new thing, but just a different way to look at the same problem.

Note that if we had tossed a coin three times, the possible values for the number of tails would be 0, 1, 2, or 3. In general, if we toss a coin “n” times, the possible number of tails would be 0, 1, 2, 3, … , or n.
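
The idea of a random variable as a function on the sample space can be sketched in a few lines of Python. This is only an illustration: the helper names `sample_space` and `number_of_tails` are our own and are not part of the course material.

```python
from itertools import product

def sample_space(n):
    """All sequences of n coin faces, e.g. 'HT' when n = 2."""
    return ["".join(faces) for faces in product("HT", repeat=n)]

def number_of_tails(outcome):
    """The random variable X: maps an outcome to its number of tails."""
    return outcome.count("T")

# Two tosses: each outcome gets exactly one value of X
for outcome in sample_space(2):
    print(outcome, "->", number_of_tails(outcome))

# Three tosses: the distinct values X can take
possible_values = sorted({number_of_tails(o) for o in sample_space(3)})
print(possible_values)
```

Note that several outcomes (HT and TH) are mapped to the same value of X, which is exactly what the definition allows.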

## Example: Observational

Consider getting data from a random sample on the number of ears in which a person wears one or more earrings.

We **define the variable X to be the number of ears** in which a randomly selected person wears an earring.

If the selected person does not wear any earrings, then X = 0.

If the selected person wears earrings in either the left or the right ear, then X = 1.

If the selected person wears earrings in both ears, then X = 2.

As the definition suggests, X is a quantitative variable which takes the possible values of 0, 1, or 2. We can ask questions like:

- What is the probability that a randomly selected person will have earrings in both ears?
- What is the probability that a randomly selected person will not be wearing any earrings in either ear?

**NOTE…**

We identified the first example as **theoretical** and the second as **observational**. Let’s discuss the distinction.

To answer probability questions about a theoretical situation, we only need the principles of probability. However, in an observational situation, the principles alone are not enough: we answer probability questions by using the relative frequencies we obtain from a random sample.
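
As a rough sketch of the observational approach, the snippet below estimates P(X = x) for the earring example from relative frequencies. The sample values are invented purely for illustration and do not come from any real data set.

```python
from collections import Counter

# Hypothetical sample: number of ears with earrings for 20 randomly
# selected people. These values are made up for illustration only.
sample = [0, 2, 2, 0, 1, 2, 0, 0, 2, 2, 0, 1, 2, 0, 2, 0, 2, 2, 0, 2]

counts = Counter(sample)
n = len(sample)

# The relative frequency of each observed value estimates P(X = x)
estimates = {x: counts[x] / n for x in sorted(counts)}
for x, p in estimates.items():
    print(f"estimated P(X = {x}) = {p:.2f}")
```

With a different random sample the estimates would change, which is why these are estimates rather than exact probabilities.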

Here is a different type of example:

## Example: Lightweight Boxer

Assume we choose a lightweight male boxer at random and record his exact weight. According to the boxing rules, a lightweight male boxer must weigh between 130 and 135 pounds, so the sample space here is S = { All the numbers in the interval 130-135 }. (Note that we can’t list all the possible outcomes here.)

We’ll **define X to be the weight of the boxer** (again, as the definition suggests, X is a quantitative variable whose value is the result of our random experiment). Here X can take any value between 130 and 135. We can ask questions like:

- What is the probability that X will be more than 132? In other words, what is the probability that the boxer will weigh more than 132 pounds?
- What is the probability that X will be between 131 and 133? In other words, what is the probability that the boxer weighs between 131 and 133 pounds?

How do the random variables in these examples compare? Let’s start with what they have in common:

- They all arise from a random experiment (tossing a coin twice, choosing a person at random, choosing a lightweight boxer at random).
- They are all quantitative (number of tails, number of ears, weight).

Where they differ is in the type of possible values they can take: In the first two examples, X has three distinct possible values: 0, 1, and 2. You can list them. In contrast, in the third example, X takes any value in the interval 130-135, and thus the possible values of X cover an infinite range of possibilities, and cannot be listed.

A random variable like the one in the first two examples, whose possible values are a list of distinct values, is called a **discrete random variable**. A random variable like the one in the third example, that can take any value in an interval, is called a **continuous random variable**.

Just as the distinction between categorical and quantitative variables was important in Exploratory Data Analysis, the distinction between discrete and continuous random variables is important here, as each one gets a different treatment when it comes to calculating probabilities and other quantities of interest.

# Count vs. Measure

Before we go any further, a few observations about the nature of discrete and continuous random variables should be mentioned.

### Comments

- Sometimes, continuous random variables are “rounded” and are therefore “in a discrete disguise.” For example:
    - time spent watching TV in a week, rounded to the nearest hour (or minute)
    - outside temperature, to the nearest degree
    - a person’s weight, to the nearest pound
- On the other hand, there are some variables which are discrete in nature, but take so many distinct possible values that it will be much easier to treat them as continuous rather than discrete. For example:
    - the IQ of a randomly chosen person
    - the SAT score of a randomly chosen student
    - the annual salary of a randomly chosen CEO, whether rounded to the nearest dollar or the nearest cent
- Sometimes we have a discrete random variable but do not know the extent of its possible values. For example: How many accidents will occur in a particular intersection this month? We may know from previously collected data that this number has ranged from 0 to 5, but 6, 7, or more accidents are still possible.
- A good rule of thumb is that **discrete** random variables are things we **count**, while **continuous** random variables are things we **measure**.
    - We counted the number of tails and the number of ears with earrings. These were discrete random variables.
    - We measured the weight of the lightweight boxer. This was a continuous random variable.

Often we can have a subject matter for which we can collect data that could involve a discrete or a continuous random variable, depending on the information we wish to know.

## Example: Soft Drinks

Suppose we want to know **how many days per week you drink a soft drink**. The sample space would be S = { 0, 1, 2, 3, 4, 5, 6, 7 }. There are a finite number of values for this variable. This would be a discrete random variable.

Instead, suppose we want to know **how many ounces of soft drinks you consume per week**. Even if we round to the nearest ounce, the answer is a measurement. Thus, this would be a continuous random variable.

### Discrete Random Variables

As the introduction suggests, the first part of this module will be devoted to discrete random variables: variables whose possible values are a list of distinct values. In order to decide on some notation, let’s look at the coin toss example again:

A fair coin is tossed twice. Let the random variable X be the number of tails we get in this random experiment. In this case, the possible values that X can assume are 0 (if we get HH), 1 (if we get HT or TH), and 2 (if we get TT).

### Notation

If we want to find the probability of the event “getting 1 tail,” we’ll write: **P(X = 1)**

If we want to find the probability of the event “getting 0 tails,” we’ll write: **P(X = 0)**

In general, we’ll write: **P(X = x)** to denote the probability that the discrete random variable X gets the value x.

Note that for the random variables we’ll use a capital letter, and for the value we’ll use a lowercase letter.

### Section Plan

The way this section on discrete random variables is organized is very similar to the way we organized our discussion about one quantitative variable in the Exploratory Data Analysis sections. It will be separated into four parts.

- We’ll first discuss the probability **distribution** of a discrete random variable, ways to display it, and how to use it in order to find probabilities of interest.
- We’ll then move on to talk about the **mean and standard deviation** of a discrete random variable, which are measures of the center and spread of its distribution.
- Next we’ll have an optional chapter on the **rules for means and standard deviations**.
- We’ll conclude this part by discussing a special and very common class of discrete random variable: the **binomial** random variable.

### Probability Distribution

When we learned how to find probabilities by applying the basic principles, we generally focused on just one particular outcome or event, like the probability of getting exactly one tail when a coin is tossed twice, or the probability of getting a 5 when a die is rolled. Now that we have mastered the solution of individual probability problems, we’ll proceed to look at the big picture by considering all the possible values of a discrete random variable, along with their associated probabilities. This list of possible values and probabilities is called the **probability distribution** of the random variable.

### Comment

In the Exploratory Data Analysis sections of this course, we often looked at the distribution of sample values in a quantitative data set. We would display the values with a histogram, and summarize them by reporting their mean. In this section, when we look at the probability distribution of a random variable, we consider all its possible values and their overall probabilities of occurrence. Thus, we have in mind an entire population of values for a variable. When we display them with a histogram or summarize them with a mean, these are representing a population of values, not a sample. The distinction between sample and population is an essential concept in statistics, because an ultimate goal is to draw conclusions about unknown values for a population, based on what is observed in the sample.

Recall our first example, when we introduced the idea of a random variable. In this example we tossed a coin twice.

## Example: Flipping a Coin Twice

What is the probability distribution of X, where the random variable X is the number of tails appearing in two tosses of a fair coin?

We first note that since the coin is fair, each of the four outcomes HH, HT, TH, TT in the sample space S is equally likely, and so each has a probability of 1/4. (Alternatively, the multiplication principle can be applied to find the probability of each outcome to be 1/2 * 1/2 = 1/4.)

X takes the value 0 only for the outcome HH, so the probability that X = 0 is 1/4.

X takes the value 1 for outcomes HT or TH. By the addition principle, the probability that X = 1 is 1/4 + 1/4 = 1/2.

Finally, X takes the value 2 only for the outcome TT, so the probability that X = 2 is 1/4.

The **probability distribution of the random variable X** is easily summarized in a table:

| x | 0 | 1 | 2 |
|---|---|---|---|
| P(X = x) | 1/4 | 1/2 | 1/4 |

As mentioned before, we write “P(X = x)” to denote “the probability that the random variable X takes the value x.”

The way to interpret this table is:

X takes the values 0, 1, 2 and P(X = 0) = 1/4, P(X = 1) = 1/2, P(X = 2) = 1/4.
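
Once the distribution is known, questions such as “what is the probability of getting at least 1 tail?” reduce to adding entries. Here is a minimal sketch, assuming we store the distribution as a Python dictionary (a representation chosen here for convenience) and use exact fractions:

```python
from fractions import Fraction

# Probability distribution of X = number of tails in two fair tosses
distribution = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# P(X = 2): read directly from the distribution
p_two_tails = distribution[2]

# P(X >= 1): add the probabilities of the values 1 and 2
p_at_least_one = sum(p for x, p in distribution.items() if x >= 1)

print(p_two_tails)      # 1/4
print(p_at_least_one)   # 3/4
```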

Note that events of the type (X = x) are subject to the principles of probability established earlier, and will provide us with a way of systematically exploring the behavior of random variables. In particular, the first two principles in the context of probability distributions of random variables will now be stated.

Any probability distribution of a discrete random variable must satisfy:

- 0 ≤ P(X = x) ≤ 1 for every possible value x
- Σ P(X = x) = 1, where the sum is taken over all possible values x
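
These two requirements are easy to check mechanically. The function below is a small sketch; the name `is_valid_distribution` and the dictionary representation are our own choices for illustration.

```python
from fractions import Fraction

def is_valid_distribution(distribution):
    """Check both requirements for a discrete probability distribution."""
    probabilities = list(distribution.values())
    each_in_range = all(0 <= p <= 1 for p in probabilities)
    sums_to_one = sum(probabilities) == 1
    return each_in_range and sums_to_one

two_tosses = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
not_a_distribution = {0: Fraction(1, 2), 1: Fraction(1, 2), 2: Fraction(1, 4)}

print(is_valid_distribution(two_tosses))          # True
print(is_valid_distribution(not_a_distribution))  # False
```

Exact fractions are used so that the sum can be compared to 1 directly. With floating-point probabilities, a small tolerance would be needed instead of `==`.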

The probability distribution for two flips of a coin was simple enough to construct at once. For more complicated random experiments, it is common to first construct a table of all the outcomes in S and their probabilities, then use the addition principle to condense that information into the actual probability distribution table.

## Example: Flipping a Coin Three Times

A fair coin is tossed three times. Let the random variable X be the number of tails. Find the probability distribution of X. We’ll follow the same reasoning we used in the previous example:

First, we specify the 8 possible outcomes in S, along with the probability of each outcome. (Because they are all equally likely, each has probability 1/8. Alternatively, by the multiplication principle, each particular sequence of three coin faces has probability 1/2 * 1/2 * 1/2 = 1/8.)

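Next, we figure out what the value of X is (number of tails) for each possible outcome.

| Outcome | Probability | X (number of tails) |
|---|---|---|
| HHH | 1/8 | 0 |
| HHT | 1/8 | 1 |
| HTH | 1/8 | 1 |
| THH | 1/8 | 1 |
| HTT | 1/8 | 2 |
| THT | 1/8 | 2 |
| TTH | 1/8 | 2 |
| TTT | 1/8 | 3 |

Next, we use the addition principle to assert that

P(X = 1) = P(HHT or HTH or THH) = P(HHT) + P(HTH) + P(THH) = 1/8 + 1/8 + 1/8 = 3/8.

Similarly, P(X = 2) = P(HTT or THT or TTH) = 3/8. Each of the remaining values arises from a single outcome, so P(X = 0) = P(HHH) = 1/8 and P(X = 3) = P(TTT) = 1/8.

The resulting probability distribution is:

| x | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| P(X = x) | 1/8 | 3/8 | 3/8 | 1/8 |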
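
The same procedure (list the outcomes with their probabilities, find X for each, then condense with the addition principle) can be sketched in Python. The variable names are illustrative only.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Step 1: list every outcome in S with its probability (1/2 * 1/2 * 1/2)
outcomes = {"".join(faces): Fraction(1, 2) ** 3
            for faces in product("HT", repeat=3)}

# Steps 2 and 3: find X for each outcome, then add up the probabilities
# of all outcomes that share the same value of X (addition principle)
distribution = defaultdict(Fraction)
for outcome, probability in outcomes.items():
    x = outcome.count("T")
    distribution[x] += probability

for x in sorted(distribution):
    print(f"P(X = {x}) = {distribution[x]}")
```

Changing `repeat=3` and the exponent to another number of tosses gives the distribution for that experiment in the same way.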

The purpose of the next activity is to give you guided practice in finding the probability distribution of a discrete random variable.

## Scenario: Number of Children

A young couple decides to try to have children until they have a boy. For financial reasons, the couple decides that they are going to stop trying when they have three children, whether they have a boy or not. (We are assuming that having a boy or a girl is equally likely, and that the child’s gender in each birth is independent of the gender in the other births.)

Let the random variable X be the number of children the couple has.

Our goal is to find the probability distribution of X. In other words, we would like to create a table that lists all the possible values of X and the corresponding probabilities. We’ll follow the same steps we followed in the two examples we solved.
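
If you would like to check your work on this scenario, here is one possible sketch. It assumes, as the scenario states, that each birth is a boy (B) or a girl (G) with probability 1/2, independently of the other births. Listing the outcomes as strings is our own choice of representation.

```python
from fractions import Fraction

# Possible birth sequences under the stopping rule: stop at the first
# boy (B), or after three children regardless of gender.
outcomes = ["B", "GB", "GGB", "GGG"]

distribution = {}
for outcome in outcomes:
    probability = Fraction(1, 2) ** len(outcome)  # multiplication principle
    x = len(outcome)                              # X = number of children
    # Addition principle: outcomes with the same X are combined
    distribution[x] = distribution.get(x, Fraction(0)) + probability

for x, p in sorted(distribution.items()):
    print(f"P(X = {x}) = {p}")
```

Notice that two different outcomes (GGB and GGG) both give X = 3, so their probabilities are added.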

# Probability Histograms

In the previous two examples and activity, we needed to specify the probability distributions ourselves, based on the physical circumstances of the situation. In some situations, as in the following example, the probability distribution may be specified with an algebraic formula. Such a formula must be consistent with the constraints imposed by the laws of probability, so that the probability of each outcome must be between 0 and 1, and the probabilities of all possible outcomes together must sum to 1.

## Example: Formulas to Define Random Variables

A random variable X has a probability distribution of

**P(X = x) = (x + 2) / 25 for x = 1, 2, 3, 4, 5.**

Show the probability distribution in a table, and verify that the above requirements are satisfied.

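Substituting x = 1, 2, 3, 4, and 5, respectively, into the formula for P(X = x), we have

| x | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| P(X = x) | 3/25 | 4/25 | 5/25 | 6/25 | 7/25 |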

Clearly, each probability is between 0 and 1. Also, the probabilities sum to (3 + 4 + 5 + 6 + 7) / 25 = 25/25 = 1.
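
The substitution and the two checks can also be carried out in code. This is a minimal sketch; the function name `p` is simply shorthand for the formula given above.

```python
from fractions import Fraction

def p(x):
    """P(X = x) = (x + 2) / 25, defined for x = 1, 2, 3, 4, 5."""
    return Fraction(x + 2, 25)

distribution = {x: p(x) for x in range(1, 6)}

for x, prob in distribution.items():
    print(f"P(X = {x}) = {prob}")

# Requirement 1: every probability is between 0 and 1
assert all(0 <= prob <= 1 for prob in distribution.values())
# Requirement 2: the probabilities sum to 1
assert sum(distribution.values()) == 1
```

`Fraction` reduces to lowest terms automatically, so 5/25 is printed as 1/5. The value is the same.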

### Probability Histograms

We learned to display the distribution of sample values for a quantitative variable with a histogram in which the horizontal axis represented the range of values in the sample. The vertical axis represented the frequency or relative frequency (sometimes given as a percentage) of sample values occurring in that interval. So the width of each rectangle in the histogram was an interval, or part of the possible values for the quantitative variable, and the height of each rectangle was the frequency (or relative frequency) for that interval.

Similarly, we can display the probability distribution of a random variable with a probability histogram. The horizontal axis represents the range of all possible values of the random variable, and the vertical axis represents the probabilities of those values.

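Here is the probability histogram for the previous example: five rectangles centered at x = 1, 2, 3, 4, and 5, with heights 3/25, 4/25, 5/25, 6/25, and 7/25, respectively.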

### Area of a Probability Histogram

Notice that each rectangle in the histogram has a width of 1 unit. The height of each rectangle is the probability of the corresponding value of X. Thus, the area of each rectangle is base times height, which for these rectangles is 1 times the probability of that value. This means that the sum of the areas of all of the rectangles is the same as the sum of all of the probabilities. Therefore, the total area is 1.
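
To make the area argument concrete, the sketch below draws a rough text version of the histogram for P(X = x) = (x + 2) / 25 and adds up the rectangle areas. The text rendering is only an illustration, not a substitute for a proper plot.

```python
from fractions import Fraction

# Distribution from the formula example: P(X = x) = (x + 2) / 25
distribution = {x: Fraction(x + 2, 25) for x in range(1, 6)}

BAR_WIDTH = 1  # each rectangle spans one unit on the horizontal axis

total_area = 0
for x, height in distribution.items():
    area = BAR_WIDTH * height   # base times height
    total_area += area
    bar = "#" * (x + 2)         # one '#' per 1/25 of probability
    print(f"x = {x} | {bar:<7} | area = {area}")

print("total area =", total_area)
```

Because every base is 1, each area equals the corresponding probability, and the total comes out to exactly 1.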