How Azure Machine Learning works: Architecture and concepts

How Azure Machine Learning works: Architecture and concepts

Azure Machine Learning architecture and workflow

Workflow

The machine learning model workflow generally follows this sequence:

  1. Train
    • Develop machine learning training scripts in Python or with the visual designer.
    • Create and configure a compute target.
    • Submit the scripts to the configured compute target to run in that environment. During training, the scripts can read from or write to datastore. And the records of execution are saved as runs in the workspace and grouped under experiments.
  2. Package – After a satisfactory run is found, register the persisted model in the model registry.
  3. Validate – Query the experiment for logged metrics from the current and past runs. If the metrics don’t indicate a desired outcome, loop back to step 1 and iterate on your scripts.
  4. Deploy – Develop a scoring script that uses the model and Deploy the model as a web service in Azure, or to an IoT Edge device.
  5. Monitor – Monitor for data drift between the training dataset and inference data of a deployed model. When necessary, loop back to step 1 to retrain the model with new training data.

Tools for Azure Machine Learning

Use these tools for Azure Machine Learning:

Activities

An activity represents a long running operation. The following operations are examples of activities:

  • Creating or deleting a compute target
  • Running a script on a compute target

Activities can provide notifications through the SDK or the web UI so that you can easily monitor the progress of these operations.

An Azure Machine Learning compute instance (formerly Notebook VM) is a fully managed cloud-based workstation that includes multiple tools and environments installed for machine learning. Compute instances can be used as a compute target for training and inferencing jobs.

Compute targets

compute target lets you specify the compute resource where you run your training script or host your service deployment. This location may be your local machine or a cloud-based compute resource.

Azure Machine Learning Datasets (preview) make it easier to access and work with your data. Datasets manage data in various scenarios such as model training and pipeline creation. Using the Azure Machine Learning SDK, you can access underlying storage, explore data, and manage the life cycle of different Dataset definitions.

Datasets provide methods for working with data in popular formats, such as using from_delimited_files() or to_pandas_dataframe().

datastore is a storage abstraction over an Azure storage account. The datastore can use either an Azure blob container or an Azure file share as the back-end storage. Each workspace has a default datastore, and you can register additional datastores. Use the Python SDK API or the Azure Machine Learning CLI to store and retrieve files from the datastore.

Endpoints

An endpoint is an instantiation of your model into either a web service that can be hosted in the cloud or an IoT module for integrated device deployments.

Web service endpoint

When deploying a model as a web service the endpoint can be deployed on Azure Container Instances, Azure Kubernetes Service, or FPGAs. You create the service from your model, script, and associated files. These are placed into a base container image which contains the execution environment for the model. The image has a load-balanced, HTTP endpoint that receives scoring requests that are sent to the web service.

Azure helps you monitor your web service by collecting Application Insights telemetry or model telemetry, if you’ve chosen to enable this feature. The telemetry data is accessible only to you, and it’s stored in your Application Insights and storage account instances.

IoT module endpoints

A deployed IoT module endpoint is a Docker container that includes your model and associated script or application and any additional dependencies. You deploy these modules by using Azure IoT Edge on edge devices.

If you’ve enabled monitoring, Azure collects telemetry data from the model inside the Azure IoT Edge module. The telemetry data is accessible only to you, and it’s stored in your storage account instance.

Azure IoT Edge ensures that your module is running, and it monitors the device that’s hosting it.

Environments

Azure ML Environments are used to specify the configuration (Docker / Python / Spark / etc.) used to create a reproducible environment for data preparation, model training and model serving. They are managed and versioned entities within your Azure Machine Learning workspace that enable reproducible, auditable, and portable machine learning workflows across different compute targets.

You can use an environment object on your local compute to develop your training script, reuse that same environment on Azure Machine Learning Compute for model training at scale, and even deploy your model with that same environment.

ML Pipelines

You use machine learning pipelines to create and manage workflows that stitch together machine learning phases. For example, a pipeline might include data preparation, model training, model deployment, and inference/scoring phases. Each phase can encompass multiple steps, each of which can run unattended in various compute targets.

Pipeline steps are reusable, and can be run without rerunning subsequent steps if the output of that step hasn’t changed. For example, you can retrain a model without rerunning costly data preparation steps if the data hasn’t changed. Pipelines also allow data scientists to collaborate while working on separate areas of a machine learning workflow.

Runs

A run is a single execution of a training script. Azure Machine Learning records all runs and stores the following information:

  • Metadata about the run (timestamp, duration, and so on)
  • Metrics that are logged by your script
  • Output files that are autocollected by the experiment or explicitly uploaded by you
  • A snapshot of the directory that contains your scripts, prior to the run

You produce a run when you submit a script to train a model. A run can have zero or more child runs. For example, the top-level run might have two child runs, each of which might have its own child run.

Run configurations

A run configuration is a set of instructions that defines how a script should be run in a specified compute target. The configuration includes a wide set of behavior definitions, such as whether to use an existing Python environment or to use a Conda environment that’s built from a specification.

A run configuration can be persisted into a file inside the directory that contains your training script, or it can be constructed as an in-memory object and used to submit a run.

Snapshots

When you submit a run, Azure Machine Learning compresses the directory that contains the script as a zip file and sends it to the compute target. The zip file is then extracted, and the script is run there. Azure Machine Learning also stores the zip file as a snapshot as part of the run record. Anyone with access to the workspace can browse a run record and download the snapshot.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s