Running Spark on Azure Databricks

Who Should Read This Blog Post?

This post is intended for people who want to run their analytics and machine learning workloads on Databricks and have chosen Azure as their cloud provider.

Skills Required To Follow:

  • SQL
  • Machine Learning

Why do we need both Databricks and Azure services? Because we can train a model on Databricks and then deploy it using Azure services.

Before we can run a Spark job we need to create a compute cluster, and before we can create a compute cluster we need to create a Databricks workspace. Not all Databricks work has to happen within Azure; we need Azure only when we import data from Azure services or want to deploy our models there.

Creating a Databricks account in Azure looks like this:

And if you are using a standalone Databricks account, it looks like this:

The login page in Azure looks like this:

And in standalone Databricks it looks like this:

First, create a cluster for Spark: in the sidebar on the left, click Clusters, then click Create Cluster, and fill in the required fields. A snapshot is shown here:

Now that we have a cluster, we can start working with it right away. We will use a Databricks notebook, which is similar to a Jupyter notebook or Datalab on GCP. Notebooks live in the workspace: go there, click the drop-down menu, and choose the Create Notebook option.

Name it test, select SQL as the language, and click Create.

As the name suggests, Azure Databricks integrates with the other data services Azure provides, so if your data resides in one of those services you can use it directly from Databricks.
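
For example, a table can be created directly over a file in Azure Blob Storage. The snippet below is only a minimal sketch: the container mydata, storage account mystorageacct, and file sales.csv are hypothetical names, and it assumes the storage account key has already been added to the cluster's Spark configuration (under fs.azure.account.key.<account>.blob.core.windows.net):

-- Minimal sketch: a table backed by a CSV file in Azure Blob Storage.
-- "mydata", "mystorageacct", and "sales.csv" are hypothetical names;
-- the storage account key must already be set in the cluster's Spark config.
CREATE TABLE azure_sales
  USING csv
  OPTIONS (
    path "wasbs://mydata@mystorageacct.blob.core.windows.net/sales.csv",
    header "true",
    inferSchema "true"
  )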

Since this is a demo tutorial, we will use a sample file that ships with Databricks. To find it, type %fs ls. Here %fs tells the notebook to run a Databricks file-system command (a shorthand for dbutils.fs, not a bash script), and ls lists the contents of the file system.

The commands you need to run to create a table you can query with SQL are given below. The first lists the root of the Databricks File System (DBFS), the second lists the bundled sample datasets, and the third prints the first 1,000 bytes of the CSV file we are about to load:

%fs ls
%fs ls databricks-datasets
%fs head --maxBytes=1000 dbfs:/databricks-datasets/Rdatasets/data-001/csv/Ecdat/Computers.csv

-- Remove any previous version of the table
DROP TABLE IF EXISTS computers;

-- Create a table backed by the sample CSV, using the header row for
-- column names and inferring column types from the data
CREATE TABLE computers
  USING csv
  OPTIONS (path "/databricks-datasets/Rdatasets/data-001/csv/Ecdat/Computers.csv", header "true", inferSchema "true")

Now run some SQL queries against the table to sharpen your SQL skills.
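
As a starting point, here is a simple aggregate query over the table we just created. It assumes the price and ram columns, which the Ecdat Computers dataset provides:

-- Average computer price for each RAM size in the sample data
SELECT ram,
       COUNT(*)   AS num_computers,
       AVG(price) AS avg_price
FROM computers
GROUP BY ram
ORDER BY ram;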

Now, to build a machine learning model, visit the link.

You can also schedule jobs from the Jobs portal to run at regular intervals, for example to watch for changes in your database.
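
For instance, a scheduled job could re-run a notebook containing a small monitoring query, like this sketch against the computers table created earlier:

-- A query a scheduled job might re-run periodically to watch the table
SELECT COUNT(*)   AS row_count,
       MAX(price) AS max_price
FROM computers;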
