Machine Learning Applications for Big Data (Regression):

Machine Learning:

Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

Big Data:

Big Data is a phrase used to mean a massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques. In most enterprise scenarios, the volume of data is too big, it moves too fast, or it exceeds current processing capacity.

So basically, machine learning is learning from data, and big data is data in its Shiva (huge) avatar. Apart from toy projects, in real life these two technologies complement each other. When you have huge data, some sort of automated learning is needed to recognise the patterns in it, and to learn those patterns reliably you need big data. We will discuss this here.

So the question is: why ML?

Different Types of Learning:

ML at Scale:

The most urgent reason:

Overfitting is common with small datasets.

Let’s discuss this with an example. Suppose you, as a novice in machine learning, know only linear regression. You can answer every question about linear regression, from R² to RMSE, because your training covered only that topic (AKA regression data points). Now you sit a certification exam that asks questions about clustering. What will happen? You will fail: you have only been introduced to RMSE, so when a question like “find the most suitable value of k” pops up, you will lose your cool, because you have overfit to what you studied. To get a good score you need to study a lot, AKA big data.
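Here is a minimal sketch of what overfitting looks like on a tiny dataset, using made-up numbers (points on y = 2x + 1 with alternating ±0.5 noise). The “memorising” model is a Lagrange polynomial that passes through every training point; the simple model is an ordinary least-squares line. Both the data and the choice of models are assumptions for illustration only.

```python
# Made-up training data: y = 2x + 1 plus alternating noise of +/-0.5.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
noise = [0.5, -0.5, 0.5, -0.5, 0.5]
train_y = [2 * x + 1 + e for x, e in zip(train_x, noise)]

# Unseen points, scored against the noise-free line y = 2x + 1.
test_x = [0.5, 1.5, 2.5, 3.5]
test_y = [2 * x + 1 for x in test_x]


def interpolate(x, xs, ys):
    """Lagrange polynomial through every training point: it memorises the noise."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fit_line(xs, ys):
    """Ordinary least-squares straight line: the simple model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def mse(pred, actual):
    """Mean squared error."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


slope, intercept = fit_line(train_x, train_y)


def line(x):
    return slope * x + intercept


def poly(x):
    return interpolate(x, train_x, train_y)


poly_train = mse([poly(x) for x in train_x], train_y)
poly_test = mse([poly(x) for x in test_x], test_y)
line_train = mse([line(x) for x in train_x], train_y)
line_test = mse([line(x) for x in test_x], test_y)

print(f"polynomial: train MSE {poly_train:.3f}, test MSE {poly_test:.3f}")
print(f"line:       train MSE {line_train:.3f}, test MSE {line_test:.3f}")
```

The polynomial scores a perfect 0 on the training points but about 0.35 on the unseen ones, while the line scores about 0.24 on training and 0.01 on the unseen points: the flexible model has learnt the noise, just like the student who only memorised RMSE.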

The next reason is automated feature selection:

Simple model vs. complex model: what will the price of gold be next week? A simple model built on past prices will tell you it stays the same most of the time. But when you want to build a robust model in production, features such as location, datetime, population, share market, demand, and production make your simple model much more complicated. To learn these features automatically you would use deep networks, which in turn need big data.

So what will we talk about today? Spark MLlib, with examples, for big-data ML, which is a real problem to solve. If you are not using a cloud service and want to manipulate your data and build a model on data that does not fit on a single machine, you need a cluster, and Spark MLlib is the natural choice for ML there.

Do you need to worry? Of course not: MLlib’s pipeline API is inspired by scikit-learn, so its concepts will feel familiar.

One important concept in MLlib is the Transformer: at a high level, it transforms one DataFrame into another, for example by adding a column of features or predictions.

An example:
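A minimal sketch of the Transformer idea in plain Python, assuming a list of dicts stands in for a Spark DataFrame. The class `AssembleFeatures` is hypothetical; it is loosely modelled on PySpark’s `pyspark.ml.feature.VectorAssembler`, which gathers several columns into one `features` vector column. The gold-price numbers are made up.

```python
from typing import Dict, List

Row = Dict[str, float]  # one row of our stand-in "DataFrame"


class AssembleFeatures:
    """Hypothetical Transformer: copies the named input columns into one
    list-valued output column, similar in spirit to VectorAssembler."""

    def __init__(self, input_cols: List[str], output_col: str = "features"):
        self.input_cols = input_cols
        self.output_col = output_col

    def transform(self, rows: List[Row]) -> List[Row]:
        # Build new rows; the input "DataFrame" is left unchanged.
        return [
            {**row, self.output_col: [float(row[c]) for c in self.input_cols]}
            for row in rows
        ]


# Made-up gold-price rows with two of the features mentioned above.
df = [
    {"demand": 120, "production": 95, "price": 1800.0},
    {"demand": 130, "production": 90, "price": 1850.0},
]
assembled = AssembleFeatures(["demand", "production"]).transform(df)
print(assembled[0]["features"])  # [120.0, 95.0]
```

Note that `transform` needs no training: it is a fixed function from one DataFrame to another.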

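Another important concept is the Estimator, which must be trained on data before it can be used.

Its job is to fit: it takes an input DataFrame and outputs a Transformer. For example, a regression algorithm is an Estimator, and the model returned by its fit() is a Transformer.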
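A minimal sketch of the Estimator/Transformer split, again with a list of dicts standing in for a DataFrame. `SimpleLinearRegression` and `LinearModel` are hypothetical names; in PySpark the corresponding pair is `pyspark.ml.regression.LinearRegression` (the Estimator) and the `LinearRegressionModel` returned by its `fit()` (a Transformer). The sketch fits a single feature by ordinary least squares.

```python
from typing import Dict, List

Row = Dict[str, float]


class LinearModel:
    """Result of fitting (hypothetical): itself a Transformer that adds a 'prediction' column."""

    def __init__(self, slope: float, intercept: float, input_col: str):
        self.slope, self.intercept, self.input_col = slope, intercept, input_col

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, "prediction": self.slope * r[self.input_col] + self.intercept} for r in rows]


class SimpleLinearRegression:
    """Hypothetical Estimator: fit() learns from data and returns a Transformer."""

    def __init__(self, input_col: str, label_col: str):
        self.input_col, self.label_col = input_col, label_col

    def fit(self, rows: List[Row]) -> LinearModel:
        xs = [r[self.input_col] for r in rows]
        ys = [r[self.label_col] for r in rows]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        # Ordinary least squares for one feature: slope = cov(x, y) / var(x).
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
        return LinearModel(slope, my - slope * mx, self.input_col)


# Made-up training data on the line y = 2x + 1.
train = [{"x": 1.0, "y": 3.0}, {"x": 2.0, "y": 5.0}, {"x": 3.0, "y": 7.0}]
model = SimpleLinearRegression("x", "y").fit(train)    # Estimator -> Transformer
print(model.transform([{"x": 4.0}])[0]["prediction"])  # 9.0
```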

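Another important concept is the Pipeline, which chains Transformers and Estimators into a single workflow. A Pipeline is itself an Estimator: calling fit() runs the stages in order and returns a PipelineModel, which is a Transformer.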
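A minimal sketch of how a pipeline chains stages, in plain Python with a list of dicts standing in for a DataFrame. Every class here (`AddSquare`, `LinearRegressionEstimator`, `LinearModel`, `Pipeline`, `PipelineModel`) is a hypothetical stand-in; in PySpark you would build `pyspark.ml.Pipeline(stages=[...])` from real stages instead. The pipeline first adds an `x_sq` feature, then fits a one-feature least-squares line on it.

```python
from typing import Dict, List

Row = Dict[str, float]


class AddSquare:
    """Hypothetical Transformer: adds an 'x_sq' feature column."""

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, "x_sq": r["x"] ** 2} for r in rows]


class LinearModel:
    """Fitted model, itself a Transformer: adds a 'prediction' column."""

    def __init__(self, slope: float, intercept: float, col: str):
        self.slope, self.intercept, self.col = slope, intercept, col

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, "prediction": self.slope * r[self.col] + self.intercept} for r in rows]


class LinearRegressionEstimator:
    """Hypothetical Estimator: one-feature least squares; fit() returns a LinearModel."""

    def __init__(self, col: str, label: str):
        self.col, self.label = col, label

    def fit(self, rows: List[Row]) -> LinearModel:
        xs, ys = [r[self.col] for r in rows], [r[self.label] for r in rows]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
        return LinearModel(slope, my - slope * mx, self.col)


class PipelineModel:
    """A fitted pipeline: every stage is now a Transformer, applied in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """The pipeline is an Estimator: fit() trains Estimator stages and applies each stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows: List[Row]) -> PipelineModel:
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):  # Estimator: train it on the data reaching this stage
                stage = stage.fit(rows)
            rows = stage.transform(rows)  # feed this stage's output to the next stage
            fitted.append(stage)
        return PipelineModel(fitted)


# Made-up training data from y = 2 * x**2 + 1.
train = [{"x": x, "y": 2 * x ** 2 + 1} for x in [0.0, 1.0, 2.0, 3.0]]
fitted = Pipeline([AddSquare(), LinearRegressionEstimator("x_sq", "y")]).fit(train)
pred = fitted.transform([{"x": 4.0}])
print(pred[0]["prediction"])  # 33.0
```

The point of the design is that the same `fitted` object applies the feature step and the model together, so new data goes through exactly the transformations the training data did.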


Now, to get the best fit (the optimal regression solution), we can use three approaches:

1. Analytical Solution:
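For linear regression, the analytical (closed-form) solution is the normal equation θ = (XᵀX)⁻¹Xᵀy, where X is the design matrix with a leading column of ones for the intercept. Below is a minimal pure-Python sketch for small problems; it solves the system (XᵀX)θ = Xᵀy by Gaussian elimination instead of explicitly inverting the matrix. The helper names and the noise-free data are assumptions for illustration.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def normal_equation(features: List[List[float]], y: List[float]) -> List[float]:
    """Return theta solving (X^T X) theta = X^T y; theta[0] is the intercept."""
    X = [[1.0] + row for row in features]  # prepend the bias column of ones
    d = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(d)]
    return solve(xtx, xty)


# Made-up, noise-free data generated from y = 1 + 2*x1 + 3*x2.
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1 + 2 * a + 3 * b for a, b in features]
theta = normal_equation(features, y)
print([round(t, 6) for t in theta])  # [1.0, 2.0, 3.0]
```

XᵀX is only d × d (d = number of features plus one), so the rows can be summed up in one pass even when there are very many of them; the cost that grows quickly is solving the d × d system as the number of features increases. Spark’s `LinearRegression` exposes a closed-form option through its `solver` parameter (`"normal"`).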

2. Gradient Descent:
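A minimal sketch of batch gradient descent for one-feature linear regression. It minimises the cost J(θ₀, θ₁) = (1/2n) Σ (θ₀ + θ₁x − y)² by repeatedly stepping against the gradient, using every data point at each step. The learning rate `alpha=0.1`, the 2000 epochs, and the data are assumptions chosen for this tiny example.

```python
def gradient_descent(xs, ys, alpha=0.1, epochs=2000):
    """Batch gradient descent for y ~ theta0 + theta1 * x, minimising
    J = (1 / 2n) * sum((theta0 + theta1 * x - y) ** 2)."""
    n = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(epochs):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of J with respect to theta0 and theta1.
        grad0 = sum(errors) / n
        grad1 = sum(e * x for e, x in zip(errors, xs)) / n
        # Step against the gradient (both parameters updated together).
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # made-up data on the line y = 1 + 2x
theta0, theta1 = gradient_descent(xs, ys)
print(round(theta0, 3), round(theta1, 3))  # 1.0 2.0
```

Unlike the analytical solution, no matrix has to be solved, and each gradient is just a sum over rows, so on a cluster every worker can compute its partial sum over its own partition of the data. Too large a learning rate makes the updates diverge instead of converging.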

We will cover the third approach, stochastic gradient descent, later.
