
# A/B testing

The A/B test (known in other sciences as a randomised controlled trial, or RCT) is a powerful tool for product development. Some motivation: with the rise of digital marketing driven by tools such as Google Analytics, Google AdWords, and Facebook Ads, a key competitive advantage for businesses is using A/B testing to measure the effects of their digital marketing efforts. Why? In short, small changes can have big effects, and A/B testing lets us determine whether changes to landing pages, pop-up forms, article titles, and other digital marketing decisions actually improve conversion rates …
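As a rough illustration of "determining whether a change improves conversion rates", one common approach (not necessarily the one the full post uses) is a two-proportion z-test. This sketch assumes two independent groups of visitors, a simple converted/did-not-convert outcome, and samples large enough for the normal approximation; the helper name and the visitor counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (hypothetical helper).

    conv_a, conv_b: number of conversions for variants A and B
    n_a, n_b: number of visitors shown each variant
    Returns (z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, expressed via math.erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Hypothetical counts: 200/5000 conversions on the old page, 250/5000 on the new one.
z, p = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in conversion rates is unlikely to be due to chance alone; it does not by itself say whether the difference is large enough to matter for the business.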

# The Big Picture: Inference

Recall the Big Picture, the four-step process that encompasses statistics: data production, exploratory data analysis, probability, and inference. We are about to start the fourth part of the process and the final section of this course, where we draw on principles learned in the other units (exploratory data analysis, producing data, and probability) to accomplish what has been our ultimate goal all along: using a sample to draw conclusions (make inferences) about the population from which it was drawn. The specific form of inference called for depends on the type of variables involved: either a single categorical or quantitative …

# Binary Imbalanced Learning: A Practical Approach in R

Introduction and motivation: The binary classification problem is arguably one of the simplest and most straightforward problems in Machine Learning. Usually we want to learn a model that predicts whether an instance belongs to a class or not. It has many practical applications, ranging from email spam detection to medical testing (determining whether a patient has a certain disease). Slightly more formally, the goal of binary classification is to learn a function f(x) that maps x (a vector of features for an instance/example) to a predicted binary outcome ŷ (0 or 1). Most classification algorithms, such as logistic regression, Naive Bayes and decision trees, …
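To make f(x) concrete, here is a minimal sketch of one such function: a logistic-regression-style scorer that turns a feature vector x into ŷ ∈ {0, 1} by thresholding an estimated probability. The weights, bias, and the 0.5 threshold are hypothetical placeholders (in practice the parameters are learned from data), and the sketch is in Python even though the post itself works in R.

```python
import math
from typing import Sequence


def predict_proba(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Estimated P(y = 1 | x) under a logistic model: sigmoid(w . x + b)."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def f(x: Sequence[float], weights: Sequence[float], bias: float, threshold: float = 0.5) -> int:
    """Binary classifier: map feature vector x to a predicted label y_hat (0 or 1)."""
    return 1 if predict_proba(x, weights, bias) >= threshold else 0


# Hypothetical parameters for a two-feature problem.
w = [2.0, -1.0]
b = -0.5
print(f([1.0, 0.5], w, b))  # score = 2.0 - 0.5 - 0.5 = 1.0, probability about 0.73, so 1
print(f([0.0, 1.0], w, b))  # score = -1.5, probability about 0.18, so 0
```

On imbalanced data the threshold is one lever that is sometimes adjusted, since a fixed 0.5 cut-off may rarely predict the minority class.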

# Designing Studies: Introduction

Now that we have learned about the first stage of data production, sampling, we can move on to the next stage: designing studies. If you haven’t read about sampling yet, start with the earlier post on it. Introduction: Obviously, sampling is not done for its own sake. After this first stage in the data production process is completed, we come to the second stage, that of gaining information about the variables of interest from the sampled individuals. In this module we’ll discuss three study designs; each design enables you to determine the values of the variables in a different way. You can carry out an observational …

# Producing Data from a Population

Recall “The Big Picture,” the four-step process that encompasses statistics: data production, exploratory data analysis, probability, and inference. In the previous posts, we considered exploratory data analysis: the discovery of patterns in the raw data. Producing the data in the first place involves two stages. First we need to choose the individuals from the population that will be included in the sample. Then, once we have chosen the individuals, we need to collect data from them. The first stage is called sampling, and the second stage is called study design. As we have seen, exploratory data analysis seeks to illuminate patterns in the data by summarizing the distributions of quantitative or categorical variables, …

# Causation and Lurking Variables with Simpson’s Paradox

The one and only principal rule in statistics is this: association does not imply causation! The scatterplot below illustrates how the number of firefighters sent to fires (X) is related to the amount of damage caused by fires (Y) in a certain city. The scatterplot clearly displays a fairly strong (slightly curved) positive relationship between the two variables. Would it, then, be reasonable to conclude that sending more firefighters to a fire causes more damage, or that the city should send fewer firefighters to a fire in order to decrease the amount of damage done by the fire? Of course not! So what is going …

# Relations: A Statistical Approach

In most studies involving two variables, each of the variables has a role. We distinguish between the explanatory variable (also commonly referred to as the independent variable), which is the variable claimed to explain, predict, or affect the response; and the response variable (also commonly referred to …

# Statistical Measures: An Introduction

Boxplot: The Five-Number Summary. Introduction: Before we move on to the third measure of spread (standard deviation), we’ll summarize what we’ve learned so far about measuring spread and use it to introduce another graphical display of the distribution of …
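The five-number summary behind a boxplot (minimum, Q1, median, Q3, maximum) can be computed in a few lines. This is a minimal sketch assuming the "median of each half" rule for quartiles that many introductory courses use; quartile conventions vary, so statistical software using a different interpolation rule may report slightly different Q1 and Q3. The function name is hypothetical.

```python
from statistics import median
from typing import Sequence


def five_number_summary(data: Sequence[float]) -> tuple:
    """Return (min, Q1, median, Q3, max) using the median-of-each-half rule.

    When n is odd, the overall median is excluded from both halves.
    """
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]            # values below the median position
    upper = xs[half + n % 2:]    # values above the median position (skips the median if n is odd)
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


lo, q1, med, q3, hi = five_number_summary([1, 3, 5, 7, 9, 11, 13])
print(lo, q1, med, q3, hi)
print("IQR =", q3 - q1)  # interquartile range, the spread of the middle half of the data
```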

# Scales of Measurement

The four scales of measurement, from least to most precise, are nominal, ordinal, interval, and ratio. Nominal: The nominal scale of measurement is a qualitative measure that uses discrete categories to describe a characteristic of the research participants. For each participant, the researcher determines the presence, absence, and type of the attribute. Nominal scales of measurement may have two categories, such as citizen status (citizen/non-citizen), or more than two categories, like religious affiliation (e.g., Agnostic, Buddhist, Jewish, Muslim) or marital status (e.g., divorced, married, single). Often, as described here, the categories have names; however, researchers code them with numbers …
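One way to keep the four scales straight is to record which comparisons each one supports. The sketch below is a hypothetical helper (not from the post) that encodes the usual textbook summary: nominal supports only same/different checks, ordinal adds ordering, interval adds meaningful differences, and ratio adds meaningful ratios because it has a true zero. It also shows nominal categories coded as numbers, where the codes are only labels.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Scales of measurement, ordered from least to most precise."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Operation that first becomes meaningful at each scale; higher scales keep the ones below.
_NEW_OPERATIONS = {
    Scale.NOMINAL: {"equality"},      # same or different category
    Scale.ORDINAL: {"order"},         # greater than / less than
    Scale.INTERVAL: {"difference"},   # distances between values are meaningful
    Scale.RATIO: {"ratio"},           # true zero, so "twice as much" is meaningful
}


def meaningful_operations(scale: Scale) -> set:
    """Collect every operation that is meaningful at the given scale."""
    ops = set()
    for level in Scale:
        if level <= scale:
            ops |= _NEW_OPERATIONS[level]
    return ops


# Hypothetical numeric coding of a nominal variable: the numbers are labels only,
# so averaging these codes would say nothing about marital status.
marital_status_codes = {"divorced": 1, "married": 2, "single": 3}

print(meaningful_operations(Scale.NOMINAL))  # {'equality'}
print(sorted(meaningful_operations(Scale.RATIO)))
```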