How To Think Like A Data Scientist

A data scientist is someone who takes your data, does some data work, and shows you insights or, to be fair, the relationships between the different parameters of your data. You can then use these insights to make predictions by applying ML models. That, at a very high level, is what a data scientist does. Now, what are the different aspects of data work in an organization? At the very least, your data team will need a data engineer who builds the data pipeline and a data scientist who works on finding insights. So without further ado, let's build one simple project with an intuitive approach.

For this tutorial we are choosing the Titanic dataset, a very popular dataset on Kaggle. As a data scientist, you will predict whether a person travelling on the Titanic survives or not, given all the information about that person. The competition page looks like this:

But we will not just work on a competition; we will work for our own start-up named “Life Saviour”, with the tagline “God can save your life, but we can predict it”. And eureka, on the very first day you get a client: a travel agency with the Titanic dataset, which wants to know how likely passengers are to survive so that it can improve its service. Like other virtual companies, you have already taken the money in advance and hired me into your data science team, so now it's my job to please you in this post.

For a very baseline intuition, just from understanding the scenario of the problem, I told you that people who are poor, male, or aged are unlikely to survive the disaster. But as a data CEO you gave me a hard time and asked for a proof of concept, so to show you my skills I did the following:

At a very high level, this is what our data looks like:

To get a feel for the parameters, please visit this page.

We can plot the intuition described above in the single plot below:

See, the biggest bar is for passenger class 3, which is the poor people, and the male section. So our intuition was right, and now we will explore the data further.
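The numbers behind such a bar chart can be sketched in a few lines of pandas. This is only a rough sketch, assuming the standard Kaggle `train.csv` column names (`Survived`, `Pclass`, `Sex`); the helper name `deaths_by_class_and_sex` and the toy rows are made up for illustration and are not the real data.

```python
import pandas as pd

# Made-up toy rows for illustration only; with the real data you would
# use something like pd.read_csv("train.csv") instead (assumed file name).
toy = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3, 3, 3, 3],
    "Sex": ["female", "male", "male", "male", "male", "male", "female", "female"],
    "Survived": [1, 0, 0, 0, 0, 1, 1, 0],
})

def deaths_by_class_and_sex(df):
    """Count passengers who did not survive, per (Pclass, Sex) pair."""
    died = df[df["Survived"] == 0]
    # Largest group first, so the "biggest bar" is the first row.
    return died.groupby(["Pclass", "Sex"]).size().sort_values(ascending=False)

counts = deaths_by_class_and_sex(toy)
print(counts)
```

On the toy rows the largest group is third-class males, which is the same shape the plot shows for the real dataset.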

If we do a target analysis, we get a plot like this:

This shows that the majority of the people didn't survive, which is why it is a disaster. A clearer picture can be seen in this plot:

There is roughly a 60-40 split between those who died and those who were saved, but is it uniform throughout the Titanic, or does the 60% come from one class and the 40% from another? Let's explore more.
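A target analysis like this boils down to counting the values of the target column. Here is a minimal sketch, assuming the target is the Kaggle `Survived` column (0 = died, 1 = survived); the function name `survival_split` and the toy column are made up for illustration.

```python
import pandas as pd

# Made-up toy target column for illustration; the real one comes from the dataset.
toy = pd.DataFrame({"Survived": [0, 0, 0, 1, 1]})

def survival_split(df):
    """Return the share of passengers who died vs. survived."""
    labels = df["Survived"].map({0: "died", 1: "survived"})
    # normalize=True turns raw counts into fractions that sum to 1.
    return labels.value_counts(normalize=True)

split = survival_split(toy)
print(split)
```

The same call with `groupby("Pclass")` in front would answer the follow-up question of whether the split is uniform across classes.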

If we look at the age distribution according to survival, the picture is not so clear; it more or less tries to portray a uniform observation:

See, the age distribution of the survived group is roughly normal, but the group that died has a right-skewed distribution. That means the younger generation died in much larger numbers than the older one, which is plausible since the younger generation travels more.
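If you prefer numbers to eyeballing a histogram, the skew can be checked directly. This is a hedged sketch, assuming the Kaggle `Age` and `Survived` columns; the helper `age_summary_by_survival` and the toy ages are invented for illustration.

```python
import pandas as pd

# Made-up toy ages for illustration only, not the real passenger data.
toy = pd.DataFrame({
    "Survived": [0, 0, 0, 0, 1, 1, 1],
    "Age": [20, 22, 24, 60, 10, 30, 50],
})

def age_summary_by_survival(df):
    """Summarise the age distribution separately for each survival outcome."""
    # A positive skew means a long tail towards older ages (right-skewed).
    # Missing ages are ignored by these aggregations.
    return df.groupby("Survived")["Age"].agg(["count", "median", "skew"])

summary = age_summary_by_survival(toy)
print(summary)
```

A clearly positive `skew` for the `Survived == 0` row would back up the right-skew reading of the plot.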

Now if we look at the combination of age and money against survival, it reveals quite an interesting story:

The number of survivors is higher in the rich section and more or less uniform across ages, while the number of deaths is higher in the poor section, and young poor people died the most in this mishap. If you are young and have little money in your pocket, our team would recommend that you travel by some alternative and not the Titanic.

Now we want to look at gender bias in survival: do females really get a better chance of survival in all classes?

This picture clearly states that if you are a woman with money, then and only then do you have a high chance of survival. If you travel in 3rd class, your chance is definitely better than your male companion's, but not as good as your rich friends'. This shows how we actually discriminate against people based on their assets and then post fake motivation on social media.
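The comparison in this plot is a survival rate per sex and per class, which a pivot table gives directly. A small sketch, assuming the Kaggle columns `Sex`, `Pclass`, and `Survived`; the function name and the toy rows are made up for illustration.

```python
import pandas as pd

# Made-up toy rows for illustration only.
toy = pd.DataFrame({
    "Sex": ["female"] * 4 + ["male"] * 4,
    "Pclass": [1, 1, 3, 3, 1, 1, 3, 3],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

def survival_rate_by_sex_and_class(df):
    """Survival rate with one row per sex and one column per passenger class."""
    # The mean of a 0/1 column is the fraction of ones, i.e. the survival rate.
    return df.pivot_table(index="Sex", columns="Pclass",
                          values="Survived", aggfunc="mean")

rates = survival_rate_by_sex_and_class(toy)
print(rates)
```

Reading across a row shows the effect of class for one sex; reading down a column shows the effect of sex within one class.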

When everything revolves around money, let's find out how money is distributed across different ages:

You can see a clear trend in the boxplot: to save your life, no matter your age group, you have to spend more money than your fellows. One interesting fact is that this does not hold for children, probably because, if you have seen the movie, even after getting on the lifeboats they still had to survive until help reached the place. Mature people can spend more money on tickets, so make a friend in the 40-60 age group if you want to travel and survive. But can you see that there are some rich kids, and not a mature person, who spent the most money on tickets?
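The boxplot compares ticket money per age group and survival outcome, and the medians behind it can be computed as below. This is only a sketch: it assumes the Kaggle `Age`, `Fare`, and `Survived` columns, and the age bin edges and labels are my own guess at sensible groups, not the ones used for the original plot. The toy rows are made up.

```python
import pandas as pd

# Made-up toy rows for illustration only.
toy = pd.DataFrame({
    "Age": [5, 8, 30, 35, 50, 55],
    "Survived": [1, 0, 1, 0, 1, 0],
    "Fare": [20, 30, 80, 10, 100, 25],
})

# Hypothetical age bins; adjust to whatever grouping your plot uses.
AGE_BINS = [0, 12, 20, 40, 60, 80]
AGE_LABELS = ["child", "teen", "adult", "40-60", "60+"]

def median_fare_by_age_group(df):
    """Median fare per age group (rows) and survival outcome (columns)."""
    grouped = df.assign(AgeGroup=pd.cut(df["Age"], bins=AGE_BINS, labels=AGE_LABELS))
    # observed=True keeps only the age groups that actually occur in the data.
    medians = grouped.groupby(["AgeGroup", "Survived"], observed=True)["Fare"].median()
    return medians.unstack("Survived")

table = median_fare_by_age_group(toy)
print(table)
```

In each row, comparing column `1` against column `0` shows whether survivors in that age group paid more.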

Now if we want to understand the same money distribution according to gender, we get a plot like this:

See, in almost every section women are the clear majority in terms of spending, whether they survived or not. And the young people who spent the most money on tickets are likely to be couples, if you look at the young section here.

Now in terms of passenger class, obviously pclass 1 is likely to bring in more money than the others:

Did you get a vibe by now that you are poor? I definitely am, and now I hate exploring data. I will become a front-end developer, always happy and Angular.

Now according to the data, the overall number of survivors in class 1 and class 3 is mostly the same, but the difference is in the total number of people in each class. If you are in class 1 your survival chance is a little over 60%, in class 2 it decreases to just under 50%, and in class 3 it dips down to around 25%. This can be our basic model, based only on the class you are travelling in. It is more or less the same as a bike accident, where the comparison is whether you wore a helmet.
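A class-only model like this can be written down in a few lines. The sketch below is one possible reading of that baseline, not the notebook's actual code: it learns the survival rate per class and predicts survival whenever that rate is above a threshold. The function names, the 0.5 threshold, and the toy rows are all assumptions made for illustration.

```python
import pandas as pd

# Made-up toy rows for illustration only.
toy = pd.DataFrame({
    "Pclass": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 0, 1],
})

def fit_class_baseline(df):
    """Learn the survival rate of each passenger class."""
    return df.groupby("Pclass")["Survived"].mean()

def predict_class_baseline(df, class_rates, threshold=0.5):
    """Predict 1 (survived) when the passenger's class rate exceeds the threshold."""
    return (df["Pclass"].map(class_rates) > threshold).astype(int)

class_rates = fit_class_baseline(toy)
preds = predict_class_baseline(toy, class_rates)
accuracy = (preds == toy["Survived"]).mean()
print(class_rates)
print("accuracy on the toy rows:", accuracy)
```

Any fancier model you train later should at least beat the accuracy of this one-feature baseline.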

Now, never travel with a companion who is ultra rich when there is a shortage of lifeboats. See the figure: our ultra-rich youth survived the mishap. And to those who say money can't buy you happiness, come and see the plot, and donate me some if you still believe the old saying.

In general, what can we infer from the data? If you are male, don't travel on this kind of trip, and if you are female and poor, don't travel either. You think I am creating a biased chart here? See to believe, and stay at home or travel on your bicycle, but never on the Titanic.

And at last, one interesting but true fact: single people tend to travel more than couples, and they say couples are the loneliest people. See to believe:
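One way to check this is to count passengers with no relatives on board. This is a rough sketch under an assumption: it treats a passenger as "single" when the Kaggle columns `SibSp` (siblings or spouses aboard) and `Parch` (parents or children aboard) are both zero, which may not be exactly how the original plot was built. The helper name and the toy rows are made up.

```python
import pandas as pd

# Made-up toy rows for illustration only.
toy = pd.DataFrame({
    "SibSp": [0, 0, 0, 1, 1],
    "Parch": [0, 0, 0, 0, 2],
})

def alone_vs_family_counts(df):
    """Count passengers travelling alone vs. with at least one relative."""
    # Assumption: "alone" means no siblings, spouse, parents, or children aboard.
    travels_alone = (df["SibSp"] + df["Parch"]) == 0
    return travels_alone.map({True: "alone", False: "with family"}).value_counts()

counts = alone_vs_family_counts(toy)
print(counts)
```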

That's it for today. We will be back with more interesting stories and facts.

To play with the data or download the notebook:

