End To End Scalable Machine Learning Project On Google Cloud With Beautiful Front End with Big Data

As the title suggests, we will be using big data (structured) as our resource, GCP as our platform, Python as our saviour, and Kubernetes for deployment. By the end, we will have a working machine learning project of our own.

Note: The code and concepts are taken from the Google Cloud tutorial on Coursera.

So, without wasting any more of your time, let's start:

This is what you will use

So why are data science projects or ML models with big data different from what we used to build on our GitHub to showcase? It's the memory. You got it right again: memory has been the key enemy of our growth since childhood. If our data fits in memory, then almost any model can work, but if you have huge data, like the bank balance of Bill Gates or the possible combinations of a quantum computer, then you need a sophisticated framework, and that is why you need TensorFlow (used not only in deep learning but also in machine learning at scale).

What you require to have an effective machine learning system:

(Source: Coursera)

Where companies often make mistakes while performing machine learning at scale:

  • They tend to add more GPUs and memory to a single machine, but regret it later: scaling up can never be a solution to big data, only scaling out can.
  • They tend to build models based on whatever data fits into their 15-inch banged-up MacBook Pro with shitty stickers on it and only 4 gigs of RAM (taken from the mouth of Gilfoyle of HBO's famous tech sitcom Silicon Valley). They tend to forget the graph below.
Graph of data size vs. model performance (image: Hacker Noon)

So why do we use TensorFlow? Because:

  • It can scale to big data (see the sketch after this list).
  • It captures many types of feature transformations.
  • It implements many kinds of model architectures.
Full ML lifecycle
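
To make the first point concrete, here is a minimal sketch of how TensorFlow's tf.data API can stream training data from Cloud Storage in batches instead of loading everything into memory. This is not part of the Coursera notebook; the file pattern is a hypothetical placeholder, and it assumes a TensorFlow version recent enough to have tf.data.experimental.make_csv_dataset.

import tensorflow as tf

# Hypothetical CSV shards exported to Cloud Storage; adjust the pattern
# to point at your own data.
CSV_PATTERN = 'gs://my-bucket/natality/train-*.csv'
CSV_COLUMNS = ['weight_pounds', 'is_male', 'mother_age',
               'plurality', 'gestation_weeks']

# make_csv_dataset reads the shards lazily and yields batches, so the
# full dataset never has to fit into the machine's memory.
dataset = tf.data.experimental.make_csv_dataset(
    CSV_PATTERN,
    batch_size=512,
    column_names=CSV_COLUMNS,
    header=False,              # assume the shards have no header row
    label_name='weight_pounds',
    num_epochs=1)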

Our project will consist of:

(Source: Coursera)

Explore The Data:

Why did we choose structured data? The reason is:

(Source: Coursera)

What we will build in this blog post:

How our Front End should look:

What data should we use:

What our dataset consists of:

I assume that you have a free subscription on GCP. Let's start by exploring the dataset. By now you will have understood that this is a regression problem; keep that in mind before you start to build. Go to your Datalab instance and start typing.

First, set your bucket, project ID and region. It will look like this:

# change these to try this notebook out
BUCKET = 'cloud-training-demos-ml'
PROJECT = 'cloud-training-demos'
REGION = 'us-central1'

Then export them as environment variables, so that bash cells in the notebook can use them too:

import os
os.environ['BUCKET'] = BUCKET
os.environ['PROJECT'] = PROJECT
os.environ['REGION'] = REGION

Now it's time to explore the data and gain insights. We will write a SQL query that runs from Datalab; if you are in the BigQuery console, you can run the same query and export the result as a table for further analysis.

On Datalab it will look like this:

In [1]:

# Create SQL query using natality data after the year 2000
query = """
SELECT
  weight_pounds,
  is_male,
  mother_age,
  plurality,
  gestation_weeks,
  ABS(FARM_FINGERPRINT(CONCAT(CAST(YEAR AS STRING), CAST(month AS STRING)))) AS hashmonth
FROM
  publicdata.samples.natality
WHERE year > 2000
"""

On BigQuery it will look like this:

To use this query in Datalab, we call the BigQuery service through the google.datalab.bigquery module, execute the query, and store the result as a DataFrame (feeling pandas?) to visualize it further.
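
A minimal sketch of that cell, assuming the query string defined above (the LIMIT is only there to keep the sample small):

import google.datalab.bigquery as bq

# Run the query defined above and pull the result into a pandas DataFrame.
df = bq.Query(query + " LIMIT 100").execute().result().to_dataframe()
df.head()

The end result will look like this: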

   weight_pounds  is_male  mother_age  plurality  gestation_weeks            hashmonth
0       3.562670     True          25          1               30  1403073183891835564
1       3.999185    False          30          1               32  7146494315947640619
2       7.438397     True          13          1               34  7146494315947640619
3       4.806077     True          19          1               34  8904940584331855459
4       4.812691     True          22          3               34  2126480030009879160

Now, let's write a query to find the unique values of each of the columns and the count of those values. This is important to ensure that we have enough examples of each data value, and to verify our hunch that the parameter has predictive value.

def get_distinct_values(column_name):
  sql = """
SELECT
  {0},
  COUNT(1) AS num_babies,
  AVG(weight_pounds) AS avg_wt
FROM
  publicdata.samples.natality
WHERE
  year > 2000
GROUP BY
  {0}
  """.format(column_name)
  return bq.Query(sql).execute().result().to_dataframe()
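
For example, calling it on the is_male column returns one row per value, together with the number of babies and the average weight:

# One row per distinct value of is_male, with its count and average weight
get_distinct_values('is_male')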

Now let's do some visualization:
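
For instance, a quick way to check whether mother_age carries predictive signal is to plot the number of babies and the average weight for each age, in the spirit of the Coursera notebook:

# Distribution of births and average baby weight by mother's age
df = get_distinct_values('mother_age')
df = df.sort_values('mother_age')
df.plot(x='mother_age', y='num_babies')
df.plot(x='mother_age', y='avg_wt')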

In the next blog post I will build an ML model on the sample data we just collected, and in the post after that we will have our solution ready to use. Till then, follow the blog to get notifications, or check out my Medium page for some beautiful ML posts here.

If you like the post and want your colleagues or friends to learn the same, hit the like button and share it on LinkedIn, Facebook and Twitter. Let's grow together for a bias-free machine-human compiled future.
