Conditional Probability and Independence Introduction


In the last section, we established the five basic rules of probability, which include two restricted versions of the Addition Rule and the Multiplication Rule: the Addition Rule for Disjoint Events and the Multiplication Rule for Independent Events. We have also established a General Addition Rule, for which the events need not be disjoint. To complete our set of rules, we still require a General Multiplication Rule, for which the events need not be independent. To establish such a rule, however, we first need to understand the important concept of conditional probability.

This section will be organized as follows: We’ll first introduce the idea of conditional probability, and use it to formalize our definition of independent events, which in the first module was presented only in an intuitive way. We will then develop the General Multiplication Rule, a rule that will tell us how to find P(A and B) in cases when the events A and B are not necessarily independent. We’ll conclude with a discussion of probability trees, a method of displaying conditional probability visually that is very helpful in solving problems.

Another way to visualize conditional probability is using a Venn diagram:

A Venn Diagram, in which a large rectangle represents all of the sample space. There are two circles in the rectangle, labeled M (for Male) and E (for Ear Pierced). Circle M and circle E overlap (but not totally). P(M) = 180/500 = .36, so this is somewhat like the area of circle M. The overlap is the event M and E. P(M and E) = 36/500 = .072, which is also like the area of the overlap area.

In both the two-way table and the Venn diagram, the reduced sample space (consisting only of males) is shaded light green, and within this sample space, the event of interest (having ears pierced) is shaded darker green. The two-way table illustrates the idea via counts, while the Venn diagram converts the counts to probabilities, which are presented as regions rather than cells.

We may work with counts, as presented in the two-way table, to write

P(E | M) = 36/180.

Or we can work with probabilities, as presented in the Venn diagram, by writing

P(E | M) = (36/500) / (180/500).

However, we want our formal expression for conditional probability to be written in terms of other, ordinary probabilities, and therefore the definition of conditional probability will grow out of the Venn diagram.

Notice that

P(E | M) = (36/500) / (180/500) = P(M and E) / P(M).

Generalized, we have a formal definition of conditional probability:

The conditional probability of event B, given event A, is P(B | A) = P(A and B) / P(A).


1. Note that when we evaluate the conditional probability, we always divide by the probability of the given event. The probability of both goes in the numerator.

2. The above formula holds as long as P(A) > 0, since we cannot divide by 0. In other words, we should not seek the probability of an event given that an impossible event has occurred.
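The definition and the two notes above can be sketched in Python. The function name `conditional_probability` and its argument names are illustrative choices, not part of the course material; the numbers are the ear-piercing figures used above.

```python
from fractions import Fraction


def conditional_probability(p_a_and_b, p_a):
    """Return P(B | A) = P(A and B) / P(A).

    Raises ValueError when P(A) is 0, because the conditional
    probability is undefined for an impossible given event.
    """
    if p_a <= 0:
        raise ValueError("P(A) must be greater than 0")
    return p_a_and_b / p_a


# Ear-piercing example: P(M and E) = 36/500 and P(M) = 180/500.
p_m_and_e = Fraction(36, 500)
p_m = Fraction(180, 500)

p_e_given_m = conditional_probability(p_m_and_e, p_m)
print(p_e_given_m)         # 1/5
print(float(p_e_given_m))  # 0.2
```

Exact fractions are used here only so that the result matches 36/180 without rounding; ordinary floats would work as well.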

Let’s see how we can use this formula in practice:


On the “Information for the Patient” label of a certain antidepressant, it is claimed that based on some clinical trials, there is a 14% chance of experiencing sleeping problems known as insomnia (denote this event by I), there is a 26% chance of experiencing headache (denote this event by H), and there is a 5% chance of experiencing both side effects (I and H).

(a) Suppose that the patient experiences insomnia; what is the probability that the patient will also experience headache?

Since we know (or it is given) that the patient experienced insomnia, we are looking for P(H | I). According to the definition of conditional probability:

P(H | I) = P(H and I) / P(I) = 0.05/0.14 ≈ 0.357.

(b) Suppose the drug induces headache in a patient; what is the probability that it also induces insomnia?

Here, we are given that the patient experienced headache, so we are looking for P(I | H).

Using the definition: P(I | H) = P(I and H) / P(H) = 0.05/0.26 ≈ 0.1923.


Note that the answers to (a) and (b) above are different. In general, P(A | B) does not equal P(B | A). We’ll come back and illustrate this point later in this module.
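The two calculations in parts (a) and (b) can be reproduced with a short Python sketch. The variable names are illustrative; the three probabilities are the ones quoted from the patient label.

```python
# Probabilities from the patient label in the example above.
p_insomnia = 0.14   # P(I)
p_headache = 0.26   # P(H)
p_both = 0.05       # P(I and H)

# Divide the joint probability by the probability of the GIVEN event.
p_h_given_i = p_both / p_insomnia   # P(H | I)
p_i_given_h = p_both / p_headache   # P(I | H)

print(round(p_h_given_i, 4))  # 0.3571
print(round(p_i_given_h, 4))  # 0.1923
```

The numerator is the same in both lines; only the denominator changes, which is why the two conditional probabilities differ.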

The purpose of the following activity is to give you guided practice in using the definition of conditional probability, and teach you how the Complement Rule works with conditional probability.

Compare P(B | A) and P(B)

As we saw in the Exploratory Data Analysis section, whenever a situation involves more than one variable, it is generally of interest to determine whether or not the variables are related. In probability, we talk about independent events, and in the first module we said that two events A and B are independent if event A occurring does not affect the probability that event B will occur. Now that we’ve introduced conditional probability, we can formalize the definition of independence of events and develop four simple ways to check whether two events are independent or not. We will introduce these “independence checks” using examples, and then summarize.

Consider again the two-way table for all 500 students in a particular high school, classified according to gender and whether or not they have one or both ears pierced.

Gender    Pierced    Not Pierced    Total
Male           36            144      180
Female        288             32      320
Total         324            176      500

Would you expect those two variables to be related? That is, would you expect having pierced ears to depend on whether the student is male or female? Or, to put it yet another way, would knowing a student’s gender affect the probability that the student’s ears are pierced? To answer this, we may compare the overall probability of having pierced ears to the conditional probability of having pierced ears, given that a student is male. Our intuition would tell us that the latter should be lower: male students tend not to have their ears pierced, whereas female students do. Indeed, for students in general, the probability of having pierced ears (event E) is P(E) = 324/500 = 0.648. But the probability of having pierced ears given that a student is male is only P(E | M) = 36/180 = 0.20.

As we anticipated, P(E | M) is lower than P(E). The probability of a student having pierced ears changes (in this case, gets lower) when we know that the student is male, and therefore the events E and M are dependent. (If E and M were independent, knowing or not knowing that the student is male would not have made a difference … but it did.)

This example illustrates that one method for determining whether two events are independent is to compare P(B | A) and P(B).

If the two are equal (i.e., knowing or not knowing whether A has occurred has no effect on the probability of B occurring) then the two events are independent. Otherwise, if the probability changes depending on whether we know that A has occurred or not, then the two events are not independent. Similarly, using the same reasoning, we can compare P(A | B) and P(A).
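As a rough illustration, this check can be written in Python using the counts quoted in the text (500 students, 180 males, 324 students with pierced ears, 36 males with pierced ears). The helper name `looks_independent` and its tolerance are assumptions made for this sketch.

```python
import math

# Counts stated in the text.
total = 500
males = 180
pierced = 324
males_pierced = 36

p_e = pierced / total                  # P(E)
p_e_given_m = males_pierced / males    # P(E | M), using the reduced sample space
p_m = males / total                    # P(M)
p_m_given_e = males_pierced / pierced  # P(M | E)


def looks_independent(p_b_given_a, p_b, tol=1e-9):
    """Return True when P(B | A) equals P(B) up to a small tolerance."""
    return math.isclose(p_b_given_a, p_b, abs_tol=tol)


print(p_e)                                  # 0.648
print(p_e_given_m)                          # 0.2
print(looks_independent(p_e_given_m, p_e))  # False
print(looks_independent(p_m_given_e, p_m))  # False
```

Both comparisons give the same verdict here: E and M are dependent. The tolerance only guards against floating-point rounding; with real survey data, small differences can also arise from sampling variation, which this sketch does not address.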

The General Multiplication Rule: Defined

Now that we have an understanding of conditional probabilities and can express them with concise notation, and have a more formal understanding of what it means for two events to be independent, we can finally establish the General Multiplication Rule, a formal rule for finding P(A and B) that applies to any two events, whether they are independent or dependent.
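Multiplying both sides of the definition P(B | A) = P(A and B) / P(A) by P(A) gives P(A and B) = P(A) * P(B | A), the usual statement of the General Multiplication Rule. The sketch below, with an illustrative function name, checks this against the ear-piercing figures used earlier.

```python
from fractions import Fraction


def p_a_and_b(p_a, p_b_given_a):
    """General Multiplication Rule: P(A and B) = P(A) * P(B | A)."""
    return p_a * p_b_given_a


p_m = Fraction(180, 500)         # P(M)
p_e_given_m = Fraction(36, 180)  # P(E | M)

p_m_and_e = p_a_and_b(p_m, p_e_given_m)
print(p_m_and_e == Fraction(36, 500))  # True
print(float(p_m_and_e))                # 0.072
```

When A and B are independent, P(B | A) equals P(B), so the same formula reduces to the Multiplication Rule for Independent Events, P(A and B) = P(A) * P(B).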

Probability Trees: Defined

So far, when two categorical variables are involved, we have displayed counts or probabilities for various events with two-way tables and with Venn diagrams. Another display tool, called a probability tree, is particularly useful for showing probabilities when the events occur in stages and conditional probabilities are involved.


A sales representative tells his friend that the probability of landing a major contract by the end of the week, resulting in a large commission, is .4. If the commission comes through, the probability that he will indulge in a weekend vacation in Bermuda is .9. Even if the commission doesn’t come through, he may still go to Bermuda, but only with probability .3.

First, let’s identify the given probabilities for events involving C (the commission comes through) and V (the sales rep takes a Bermuda vacation):

P(C) = 0.4 [and so P(not C) = 0.6],

P(V | C) = 0.9 [and so P(not V | C) = 0.1], and

P(V | not C) = 0.3 [and so P(not V | not C) = 0.7].

There are two stages in the problem. First, the sales rep will either get the commission or not.

Second, based on what happened in the first stage, the sales rep will either take the Bermuda vacation or not.

We follow exactly the same reasoning when we build the probability tree.

There are two important things to note here:

1. The probabilities in the first branch-off are non-conditional probabilities: P(C) = 0.4, P(not C) = 0.6. However, the probabilities that appear in the second branch-off are conditional probabilities. The top two branches assume that C occurred: P(V | C) = 0.9, P(not V | C) = 0.1. The bottom two branches assume that not C occurred: P(V | not C) = 0.3, P(not V | not C) = 0.7.

A probability tree. The bottom of the tree (the trunk) represents the first event (in this case, getting the commission). From there, two branches (lines) are drawn. This is the branch-off. One branch represents getting the commission and the other represents not getting the commission. Each of these branches leads to a point. At both of these points is the second event (going on vacation). Each of these points has two branches, which represent either going on vacation or not going on vacation. Notice that this tree has two levels of points, one for each event. We will be naming paths through the tree with the notation "{first branch-off: probability of branch-off; second branch-off: probability of branch-off; ...}". Each entry names the branch taken at that branch-off and gives the probability of taking it. This probability tree demonstrates how branches map to probabilities. {C: .4} represents P(C); {not C: .6} represents P(not C); {C: .4; V: .9} represents P(V | C); {C: .4; not V: .1} represents P(not V | C); {not C: .6; V: .3} represents P(V | not C); {not C: .6; not V: .7} represents P(not V | not C).

2. The second thing to note is that probabilities of branches that branch out from the same point always add up to one.

A probability tree demonstrating the rule. For the first point, we have the branches {C: .4} and {not C: .6}. Note that .4 + .6 = 1. For the point with branches {C: .4; V: .9} and {C: .4; not V: .1}, note that the probabilities for V and not V add up to 1 (.9 + .1 = 1). For the last point, with branches {not C: .6; V: .3} and {not C: .6; not V: .7}, note that the probabilities for V and not V add up to 1 (.3 + .7 = 1).
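The tree for the sales-rep example can also be written down as a small data structure. The following is a minimal Python sketch, assuming a nested-dictionary layout chosen only for illustration. It checks that branches leaving the same point add up to one, and then multiplies along each path, which is an application of the General Multiplication Rule, P(A and B) = P(A) * P(B | A).

```python
import math

# Probability tree for the sales-rep example.
# First level: unconditional probabilities of C and not C.
# Second level: conditional probabilities of V and not V, given the first level.
tree = {
    "C": (0.4, {"V": 0.9, "not V": 0.1}),
    "not C": (0.6, {"V": 0.3, "not V": 0.7}),
}

# Branches leaving the same point add up to one.
assert math.isclose(sum(p for p, _ in tree.values()), 1.0)
for first, (p_first, second_level) in tree.items():
    assert math.isclose(sum(second_level.values()), 1.0)

# Multiply along each path to get the probability of both outcomes on that path.
path_probabilities = {
    (first, second): p_first * p_second
    for first, (p_first, second_level) in tree.items()
    for second, p_second in second_level.items()
}

for path, prob in path_probabilities.items():
    print(path, round(prob, 2))
# ('C', 'V') 0.36
# ('C', 'not V') 0.04
# ('not C', 'V') 0.18
# ('not C', 'not V') 0.42
```

The four path probabilities cover every possible outcome of the two stages, so they also add up to one.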
