A/B testing

The A/B test (known in other sciences as a randomised controlled trial, or RCT) is a powerful tool for product development. Some motivation: with the rise of digital marketing, led by tools including Google Analytics, Google Adwords, and Facebook Ads, businesses can gain a key competitive advantage by using A/B testing to measure the effects of their digital marketing efforts. Why? In short, small changes can have big effects. A/B testing enables us to determine whether changes in landing pages, popup forms, article titles, and other digital marketing decisions improve conversion rates … Continue reading A/B testing
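
One common way to check whether a change improved a conversion rate is a pooled two-proportion z-test. The sketch below is a minimal illustration under that assumption, not necessarily the method the full post uses; the function name and the visitor and conversion counts are hypothetical.

```python
from math import sqrt, erfc


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants share one conversion rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally often.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail area of the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical data: 200 of 5000 visitors converted on page A, 250 of 5000 on page B.
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A small p-value suggests the difference in conversion rates is unlikely to be due to chance alone, provided visitors were randomly assigned to the two variants.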

Binomial Random Variables: Introduction

So far, in our discussion of discrete random variables, we have been introduced to: the probability distribution, which tells us which values a variable takes, and how often it takes them; the mean of the random variable, which tells us the long-run average value that the random variable takes; and the standard deviation of the random variable, which tells us a typical (or long-run average) distance between the mean of the random variable and the values it takes. We will now introduce a special class of discrete random variables that are very common, because as you’ll see, they … Continue reading Binomial Random Variables: Introduction
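
The three summaries recalled above can be computed directly from a probability distribution. The sketch below assumes the distribution is given as a mapping from values to probabilities; the function name and the example distribution (number of heads in two fair coin tosses) are illustrative choices, not taken from the post.

```python
from math import sqrt


def mean_and_sd(distribution):
    """Return (mean, standard deviation) of a discrete random variable.

    `distribution` maps each possible value to its probability.
    """
    total = sum(distribution.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Mean: probability-weighted average of the values.
    mean = sum(x * p for x, p in distribution.items())
    # Variance: probability-weighted average squared distance from the mean.
    variance = sum((x - mean) ** 2 * p for x, p in distribution.items())
    return mean, sqrt(variance)


# Illustrative distribution: number of heads in two fair coin tosses.
heads_in_two_tosses = {0: 0.25, 1: 0.5, 2: 0.25}
mu, sigma = mean_and_sd(heads_in_two_tosses)
print(mu, sigma)
```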

Introduction to Normal Random Variables: Overview

In the Exploratory Data Analysis sections of this course, we encountered data sets, such as lengths of human pregnancies, whose distributions naturally followed a symmetric unimodal bell shape, bulging in the middle and tapering off at the ends. Many variables, such as pregnancy lengths, shoe sizes, foot lengths, and other human physical characteristics exhibit these properties: symmetry indicates that the variable is just as likely to take a value a certain distance below its mean as it is to take a value that same distance above its mean; the bell-shape indicates that values closer to the mean are more likely, and it … Continue reading Introduction to Normal Random Variables: Overview

The Big Picture: Inference

Recall again the Big Picture, the four-step process that encompasses statistics: data production, exploratory data analysis, probability, and inference. We are about to start the fourth part of the process and the final section of this course, where we draw on principles learned in the other units (exploratory data analysis, producing data, and probability) in order to accomplish what has been our ultimate goal all along: use a sample to infer (or draw conclusions) about the population from which it was drawn. The specific form of inference called for depends on the type of variables involved—either a single categorical or quantitative … Continue reading The Big Picture: Inference

Conditional Probability and Independence Introduction

In the last section, we established the five basic rules of probability, which include the two restricted versions of the Addition Rule and Multiplication Rule: The Addition Rule for Disjoint Events and the Multiplication Rule for Independent Events. We have also established a General Addition Rule for which the events need not be disjoint. In order to complete our set of rules, we still require a General Multiplication Rule for which the events need not be independent. In order to establish such a rule, however, we first need to understand the important concept of conditional probability. This section will be organized as follows: We’ll first … Continue reading Conditional Probability and Independence Introduction
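
The rules named above can be checked on a small example. The sketch below assumes a single roll of a fair six-sided die with equally likely outcomes, and uses the standard definition P(A | B) = P(A and B) / P(B) for conditional probability; the events and helper names are illustrative and do not come from the post.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die, all outcomes equally likely.
OUTCOMES = frozenset(range(1, 7))


def prob(event):
    """Probability of an event (a set of outcomes) under equally likely outcomes."""
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))


def prob_given(a, b):
    """Conditional probability P(A | B) = P(A and B) / P(B)."""
    return prob(a & b) / prob(b)


even = {2, 4, 6}
at_most_four = {1, 2, 3, 4}

# General Addition Rule: P(A or B) = P(A) + P(B) - P(A and B).
union_direct = prob(even | at_most_four)
union_by_rule = prob(even) + prob(at_most_four) - prob(even & at_most_four)

# Multiplication Rule for Independent Events: holds exactly when
# P(A and B) = P(A) * P(B), which is the same as P(A | B) = P(A).
independent = prob(even & at_most_four) == prob(even) * prob(at_most_four)

print(union_direct, union_by_rule, independent, prob_given(even, at_most_four))
```

Using `Fraction` keeps the probabilities exact, so the two sides of each rule can be compared with `==` rather than a floating-point tolerance.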

How To Distribute Sample

Sampling Distributions: Introduction. We have already pointed out on several occasions the important distinction between a population and a sample. In Exploratory Data Analysis, we learned to summarize and display values of a variable for a sample, such as displaying the blood types of 100 randomly chosen U.S. adults using a pie chart, or displaying the heights of 150 males using a histogram and supplementing it with the sample mean (X̄) and sample standard deviation (S). In our study of Probability and Random Variables, we discussed the long-run behavior of a variable, considering the population of all possible values taken by that variable. For example, we … Continue reading How To Distribute Sample

Probability A short story

Sample Spaces. As we saw in the previous section, probability questions arise when we are faced with a situation that involves uncertainty. Such a situation is called a random experiment, an experiment that produces an outcome that cannot be predicted in advance (hence the uncertainty). Here are a few examples of random experiments: Toss a coin once and record whether you get heads (H) or tails (T). The possible outcomes that this random experiment can produce are: {H, T}. Toss a coin twice. The possible outcomes that this random experiment can produce are: {HH, HT, TH, TT}. Toss a coin 3 … Continue reading Probability A short story
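
The coin-toss sample spaces listed above can be enumerated mechanically. The sketch below is one way to do it, assuming each toss yields either H or T; the function name is a hypothetical helper, not something from the post.

```python
from itertools import product


def coin_toss_sample_space(n_tosses):
    """List every possible outcome of tossing a coin `n_tosses` times."""
    # Each outcome is one choice of H or T per toss, joined into a string.
    return ["".join(outcome) for outcome in product("HT", repeat=n_tosses)]


print(coin_toss_sample_space(1))  # ['H', 'T']
print(coin_toss_sample_space(2))  # ['HH', 'HT', 'TH', 'TT']
```

Each additional toss doubles the number of outcomes, so n tosses give 2**n outcomes in total.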

Producing Data From Population

Recall “The Big Picture,” the four-step process that encompasses statistics: data production, exploratory data analysis, probability, and inference. In the previous posts, we considered exploratory data analysis, the discovery of patterns in the raw data. We now turn to data production, which has two stages. First we need to choose the individuals from the population that will be included in the sample. Then, once we have chosen the individuals, we need to collect data from them. The first stage is called sampling, and the second stage is called study design. As we have seen, exploratory data analysis seeks to illuminate patterns in the data by summarizing the distributions of quantitative or categorical variables, … Continue reading Producing Data From Population

Causation and Lurking Variables With Simpson’s paradox

The one principle to remember above all others in statistics is: association does not imply causation! The scatterplot below illustrates how the number of firefighters sent to fires (X) is related to the amount of damage caused by fires (Y) in a certain city. The scatterplot clearly displays a fairly strong (slightly curved) positive relationship between the two variables. Would it, then, be reasonable to conclude that sending more firefighters to a fire causes more damage, or that the city should send fewer firefighters to a fire in order to decrease the amount of damage done by the fire? Of course not! So what is going … Continue reading Causation and Lurking Variables With Simpson’s paradox