Introduction to Normal Random Variables: Overview

In the Exploratory Data Analysis sections of this course, we encountered data sets, such as lengths of human pregnancies, whose distributions naturally followed a symmetric unimodal bell shape, bulging in the middle and tapering off at the ends. Many variables, such as pregnancy lengths, shoe sizes, foot lengths, and other human physical characteristics exhibit these properties: symmetry indicates that the variable is just as likely to take a value a certain distance below its mean as it is to take a value that same distance above its mean; the bell-shape indicates that values closer to the mean are more likely, and it … Continue reading Introduction to Normal Random Variables: Overview
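The two properties above — symmetry about the mean and a peak at the mean — can be checked numerically. A minimal Python sketch using the normal density formula; the pregnancy-length parameters (mean 266 days, standard deviation 16 days) are illustrative assumptions, not figures from the post:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal (bell-shaped) distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Symmetry: a value 10 days below the mean is exactly as likely
# as a value 10 days above it.
assert abs(normal_pdf(256, mu=266, sigma=16)
           - normal_pdf(276, mu=266, sigma=16)) < 1e-15

# Bell shape: the density peaks at the mean and tapers off in the tails.
assert normal_pdf(266, mu=266, sigma=16) \
       > normal_pdf(250, mu=266, sigma=16) \
       > normal_pdf(230, mu=266, sigma=16)
```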

Predict prices using regression with ML.NET

Create a console application Create a .NET Core Console Application called “TaxiFarePrediction”. Create a directory named Data in your project to store the data set and model files. Install the Microsoft.ML NuGet package: in Solution Explorer, right-click the project and select Manage NuGet Packages. Choose “” as the Package source, select the Browse tab, search for Microsoft.ML, select the package in the list, and select the Install button. Select the OK button on the Preview Changes dialog, and then select the I Accept button on the License Acceptance dialog if you agree with the license terms for the packages listed. Do the same for the Microsoft.ML.FastTree NuGet package. Prepare and understand the data Download the taxi-fare-train.csv and taxi-fare-test.csv data sets and save them to the Data folder you’ve created … Continue reading Predict prices using regression with ML.NET
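The post itself trains the model with ML.NET in C#. As a language-neutral sketch of the underlying idea — fitting a linear model that predicts a fare from trip distance — here is a plain-Python ordinary-least-squares example; the (distance, fare) pairs are made up, standing in for taxi-fare-train.csv:

```python
# Hypothetical (distance_miles, fare_dollars) pairs, not real taxi data.
train = [(1.0, 6.0), (2.0, 9.0), (3.0, 12.0), (5.0, 18.0), (8.0, 27.0)]

def fit_least_squares(pairs):
    """Fit fare ~ slope * distance + intercept by ordinary least squares."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

slope, intercept = fit_least_squares(train)

def predict_fare(distance_miles):
    """Predict a fare for an unseen trip distance."""
    return slope * distance_miles + intercept
```

ML.NET's FastTree regression trainer is far more flexible than a straight line, but the workflow — load training pairs, fit, predict on new inputs — is the same.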

Hypothesis Testing: Introduction

We are now moving to the other kind of inference, hypothesis testing. We say that hypothesis testing is “the other kind” because, unlike the inferential methods we presented so far, where the goal was estimating the unknown parameter, the idea, logic and goal of hypothesis testing are quite different. In the first part of this section we will discuss the idea behind hypothesis testing, explain how it works, and introduce new terminology that emerges in this form of inference. The next two parts will be more specific and will discuss hypothesis testing for the population proportion (p), and for the population mean (μ). General … Continue reading Hypothesis Testing: Introduction
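As a preview of the mechanics (the later parts cover the details), here is a minimal Python sketch of a two-sided z-test for a population proportion p; the data in the usage line (60 successes in 100 trials) are made up:

```python
import math

def z_test_proportion(p0, p_hat, n):
    """Test H0: p = p0 against Ha: p != p0 with the one-sample z statistic."""
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error of p-hat under H0
    z = (p_hat - p0) / se
    # Two-sided p-value from the standard normal CDF, computed via math.erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical data: testing H0: p = 0.5 after observing p-hat = 0.6, n = 100.
z, p_value = z_test_proportion(0.5, 0.6, 100)
```

A small p-value (here about 0.046) is evidence against H0 — the logic the section goes on to develop.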

The Big Picture: Inference

Recall again the Big Picture, the four-step process that encompasses statistics: data production, exploratory data analysis, probability, and inference. We are about to start the fourth part of the process and the final section of this course, where we draw on principles learned in the other units (exploratory data analysis, producing data, and probability) in order to accomplish what has been our ultimate goal all along: use a sample to infer (or draw conclusions) about the population from which it was drawn. The specific form of inference called for depends on the type of variables involved—either a single categorical or quantitative … Continue reading The Big Picture: Inference

Binary Imbalanced Learning: A Practical Approach in R

Introduction and motivation The binary classification problem is arguably one of the simplest and most straightforward problems in Machine Learning. Usually we want to learn a model that predicts whether an instance belongs to a class or not. It has many practical applications, ranging from email spam detection to medical testing (determining whether a patient has a certain disease). Slightly more formally, the goal of binary classification is to learn a function f(x) that maps x (a vector of features for an instance/example) to a predicted binary outcome ŷ (0 or 1). Most classification algorithms, such as logistic regression, Naive Bayes, and decision trees, … Continue reading Binary Imbalanced Learning: A Practical Approach in R
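The post works in R; as a language-agnostic sketch of the function f(x) described above — mapping a feature vector x to a predicted binary outcome ŷ — here is a logistic-style classifier in plain Python, with made-up weights standing in for a trained model:

```python
import math

def sigmoid(t):
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def f(x, weights, bias=0.0):
    """Map a feature vector x to a predicted binary outcome y-hat (0 or 1)."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0

# Hypothetical usage: two features, weights learned elsewhere.
y_hat = f([2.0, -1.0], weights=[1.0, 1.0])  # positive score -> class 1
```

The 0.5 threshold is exactly where class imbalance bites: with a rare positive class, a model can score well while predicting 0 almost always — the problem the post goes on to address.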

Conditional Probability and Independence Introduction

Introduction In the last section, we established the five basic rules of probability, which include the two restricted versions of the Addition Rule and Multiplication Rule: The Addition Rule for Disjoint Events and the Multiplication Rule for Independent Events. We have also established a General Addition Rule for which the events need not be disjoint. In order to complete our set of rules, we still require a General Multiplication Rule for which the events need not be independent. In order to establish such a rule, however, we first need to understand the important concept of conditional probability. This section will be organized as follows: We’ll first … Continue reading Conditional Probability and Independence Introduction
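To preview where this is heading: conditional probability P(B | A) = P(A and B) / P(A) rearranges into the General Multiplication Rule, P(A and B) = P(A) · P(B | A). A small Python sketch — the two-aces example is a standard illustration, not taken from the post:

```python
def conditional_probability(p_a_and_b, p_a):
    """Conditional probability: P(B | A) = P(A and B) / P(A)."""
    return p_a_and_b / p_a

# General Multiplication Rule: P(A and B) = P(A) * P(B | A).
# Example: drawing two aces in a row from a 52-card deck without replacement.
p_first_ace = 4 / 52                 # P(A): first card is an ace
p_second_ace_given_first = 3 / 51    # P(B | A): 3 aces left among 51 cards
p_both_aces = p_first_ace * p_second_ace_given_first
```

Because the deck changes after the first draw, the events are not independent — which is exactly why the general rule needs P(B | A) rather than plain P(B).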

Probability Rules

Basic Probability Rules In the previous section we considered situations in which all the possible outcomes of a random experiment are equally likely, and learned a simple way to find the probability of any event in this special case. We are now moving on to learn how to find the probability of events in the general case (when the possible outcomes are not necessarily equally likely), using five basic probability rules. Fortunately, these basic rules of probability are very intuitive, and as long as they are applied systematically, they will let us solve more complicated problems; in particular, those problems … Continue reading Probability Rules
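A taste of two of those rules in Python — the Complement Rule and the Addition Rule for disjoint events — using a hypothetical blood-type distribution (the probabilities below are illustrative, not from the post):

```python
def complement(p_a):
    """Complement Rule: P(not A) = 1 - P(A)."""
    return 1 - p_a

def addition_disjoint(p_a, p_b):
    """Addition Rule for disjoint events: P(A or B) = P(A) + P(B)."""
    return p_a + p_b

# Hypothetical blood-type probabilities for one randomly chosen person.
# A person has exactly one blood type, so these events are disjoint.
p = {"O": 0.44, "A": 0.42, "B": 0.10, "AB": 0.04}

p_o_or_b = addition_disjoint(p["O"], p["B"])  # P(type O or type B)
p_not_o = complement(p["O"])                  # P(not type O)
```

Note that the four probabilities sum to 1 — another of the basic rules, since the outcomes exhaust the sample space.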

How To Distribute Sample

Sampling Distributions Introduction Already on several occasions we have pointed out the important distinction between a population and a sample. In Exploratory Data Analysis, we learned to summarize and display values of a variable for a sample, such as displaying the blood types of 100 randomly chosen U.S. adults using a pie chart, or displaying the heights of 150 males using a histogram and supplementing it with the sample mean (X¯) and sample standard deviation (S). In our study of Probability and Random Variables, we discussed the long-run behavior of a variable, considering the population of all possible values taken by that variable. For example, we … Continue reading How To Distribute Sample
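The long-run behavior of the sample mean can be previewed with a small simulation — a sketch under assumed numbers (a made-up population of male heights, not data from the course):

```python
import random
import statistics

random.seed(0)  # make the simulation reproducible

def sample_means(population, n, repetitions):
    """Draw many random samples of size n, recording each sample mean."""
    return [statistics.mean(random.sample(population, n))
            for _ in range(repetitions)]

# Hypothetical population of 1000 male heights (inches), mean ~69, SD ~3.
population = [random.gauss(69, 3) for _ in range(1000)]

# Repeat the "sample 150 males" experiment 500 times.
means = sample_means(population, n=150, repetitions=500)
```

The simulated means cluster tightly around the population mean, with far less spread than the individual heights — the pattern the sampling-distribution sections make precise.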

TensorFlow Machine Learning on the Amazon Deep Learning AMI

TensorFlow is a popular framework used for machine learning. The Amazon Deep Learning AMI comes bundled with everything you need to start using TensorFlow from development through to production. In this post, you will develop, visualize, serve, and consume a TensorFlow machine learning model using the Amazon Deep Learning AMI.  Objectives Upon completion of this post you will be able to: Create machine learning models in TensorFlow Visualize TensorFlow graphs and the learning process in TensorBoard Serve trained TensorFlow models with TensorFlow Serving Create clients that consume served TensorFlow models, all with the Amazon Deep Learning AMI Prerequisites You should be familiar … Continue reading TensorFlow Machine Learning on the Amazon Deep Learning AMI

Probability: A Short Story

Sample Spaces As we saw in the previous section, probability questions arise when we are faced with a situation that involves uncertainty. Such a situation is called a random experiment, an experiment that produces an outcome that cannot be predicted in advance (hence the uncertainty). Here are a few examples of random experiments: Toss a coin once and record whether you get heads (H) or tails (T). The possible outcomes that this random experiment can produce are: {H, T}. Toss a coin twice. The possible outcomes that this random experiment can produce are: {HH, HT, TH, TT}. Toss a coin 3 … Continue reading Probability: A Short Story
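The coin-toss sample spaces listed above can be generated programmatically; a minimal Python sketch using itertools.product:

```python
from itertools import product

def coin_sample_space(tosses):
    """All possible outcomes of tossing a coin `tosses` times."""
    return ["".join(outcome) for outcome in product("HT", repeat=tosses)]

# One toss:  ['H', 'T']
# Two tosses: ['HH', 'HT', 'TH', 'TT']
# Each extra toss doubles the sample space: 2**tosses outcomes in total.
```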