Producing Data From Population

Recall “The Big Picture,” the four-step process that encompasses statistics: data production, exploratory data analysis, probability, and inference. In the previous posts, we considered exploratory data analysis: the discovery of patterns in the raw data. Producing data happens in two stages. First, we need to choose the individuals from the population that will be included in the sample. Then, once we have chosen the individuals, we need to collect data from them. The first stage is called sampling, and the second stage is called study design. As we have seen, exploratory data analysis seeks to illuminate patterns in the data by summarizing the distributions of quantitative or categorical variables, …

Causation and Lurking Variables With Simpson’s Paradox

One principle to keep in mind throughout statistics is: association does not imply causation! The scatterplot below illustrates how the number of firefighters sent to fires (X) is related to the amount of damage caused by fires (Y) in a certain city. The scatterplot clearly displays a fairly strong (slightly curved) positive relationship between the two variables. Would it, then, be reasonable to conclude that sending more firefighters to a fire causes more damage, or that the city should send fewer firefighters to a fire in order to decrease the amount of damage done by the fire? Of course not! So what is going …

Data Visualization With ggplot2: Understanding the Grammar and Practical Approach

Data visualization is one of the most important parts of your data science or data analytics journey. It is the visualization that attracts viewers to your work, impresses shareholders enough to invest, and moves the authorities to review your work positively. But representing data correctly is not simple. You need a solid foundation in visualization tools, and you also need to keep an eye on the variables you use, understand the relationships between them, and, foremost, understand the visualization graphics you rely on to establish your findings. In this post I am going to cover …

Understanding Basic to Advanced Data Structures in R to Use Them Efficiently

You’ve probably used many (if not all) of R’s base data structures before, but you may not have thought deeply about how they are interrelated. In this brief overview, I’ll show you how they fit together as a whole. If you need more details, you can find them in R’s documentation. R’s base data structures can be organised by their dimensionality (1d, 2d, or nd) and whether they’re homogeneous (all contents must be of the same type) or heterogeneous (the contents can be of different types). This gives rise …