# Producing Data From Population

Recall “The Big Picture,” the four-step process that encompasses statistics: data production, exploratory data analysis, probability, and inference. In the previous posts, we considered exploratory data analysis, the discovery of patterns in the raw data. Producing data takes two stages. First, we need to choose the individuals from the population that will be included in the sample. Then, once we have chosen the individuals, we need to collect data from them. The first stage is called sampling, and the second stage is called study design. As we have seen, exploratory data analysis seeks to illuminate patterns in the data by summarizing the distributions of quantitative or categorical variables, …
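The sampling stage described above can be illustrated with a short sketch. It assumes a simple random sample, which is only one of several sampling schemes (the excerpt does not name one), and a hypothetical population of individuals identified by made-up ID numbers. The function name `simple_random_sample` is likewise an illustration, not something from the original post.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Stage 1 (sampling): pick n distinct individuals, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: individuals identified by the IDs 1..1000.
population = list(range(1, 1001))

# Choose 50 individuals; the seed only makes the sketch reproducible.
sample = simple_random_sample(population, 50, seed=0)

# Stage 2 (study design) would then collect data from each chosen individual.
print(len(sample), "individuals chosen, for example:", sorted(sample)[:5])
```

Because every individual has the same chance of being chosen, a sample drawn this way does not systematically favor any part of the population.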

# Causation and Lurking Variables With Simpson’s Paradox

The cardinal rule in statistics is this principle: association does not imply causation! The scatterplot below illustrates how the number of firefighters sent to fires (X) is related to the amount of damage caused by fires (Y) in a certain city. The scatterplot clearly displays a fairly strong (slightly curved) positive relationship between the two variables. Would it, then, be reasonable to conclude that sending more firefighters to a fire causes more damage, or that the city should send fewer firefighters to a fire in order to decrease the amount of damage done by the fire? Of course not! So what is going …

# Relations: A Statistical Approach

In most studies involving two variables, each of the variables has a role. We distinguish between: the explanatory variable (also commonly referred to as the independent variable), which is the variable that claims to explain, predict, or affect the response; and the response variable (also commonly referred to …

# Statistical Measures: An Introduction

Boxplot: The Five-Number Summary. Introduction: Before we move on to the third measure of spread (standard deviation), we’ll summarize what we’ve learned so far about measuring spread and use it to introduce another graphical display of the distribution of …
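The five-number summary named above consists of the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum. Below is a minimal sketch of how it might be computed. It assumes one common textbook convention for quartiles (medians of the lower and upper halves of the sorted data); other conventions exist and can give slightly different values. The function name and the data are made up for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a non-empty list of numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    excluding the middle observation when the count is odd.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one observation")
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    # With a single observation both halves are empty; fall back to that value.
    q1 = median(lower) if lower else data[0]
    q3 = median(upper) if upper else data[-1]
    return data[0], q1, median(data), q3, data[-1]

# Hypothetical data, made up for illustration only.
scores = [7, 15, 36, 39, 40, 41]
low, q1, med, q3, high = five_number_summary(scores)
iqr = q3 - q1  # interquartile range: the spread of the middle half of the data
print(low, q1, med, q3, high, "IQR =", iqr)
```

A boxplot draws these same five numbers: the box spans Q1 to Q3 with a line at the median, and the whiskers reach toward the minimum and maximum.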

# The Big Picture of Statistics

This post is inspired by stanford.edu. The process of statistics starts when we identify what group we want to study or learn something about. We call this group the population. Note that the word population here (and in the entire course) does not …

# Data Visualization With ggplot2: Understanding the Grammar and Practical Approach

Data visualization is by far the most important thing in your data science or data analytics journey. It is the visualization that attracts viewers to your work, impresses shareholders enough to invest, and persuades the authorities to review your work positively. But representing data correctly is not that simple. You need a solid foundation in visualization tools, and you also need to keep an eye on the variables you use, understand the relationships between them, and, foremost, understand the visualization graphics that establish your findings. In this post I am going to cover …

# Understanding Basic to Advanced Data Structures in R to Use Them Efficiently

Data structures: You’ve probably used many (if not all) of them before, but you may not have thought deeply about how they are interrelated. In this brief overview, I’ll show you how they fit together as a whole. If you need more details, you can find them in R’s documentation. R’s base data structures can be organised by their dimensionality (1d, 2d, or nd) and whether they’re homogeneous (all contents must be of the same type) or heterogeneous (the contents can be of different types). This gives rise …