How Azure Machine Learning works: Architecture and concepts
The machine learning model workflow generally follows this sequence:
1. Train – Develop machine learning training scripts in Python or with the visual designer, create and configure a compute target, and submit the scripts to the configured compute target to run in that environment. During training, the scripts can read from or write to a datastore, and the records of execution are saved as runs in the workspace and grouped under experiments.
2. Package – After a satisfactory run is found, register the persisted model in the model registry.
3. Validate – Query the experiment for logged metrics from the current and past runs. If the metrics don't indicate a desired outcome, loop back to step 1 and iterate on your scripts.
4. Deploy – Develop a scoring script that uses the model, then deploy the model as a web service in Azure or to an IoT Edge device.
5. Monitor – Monitor for data drift between the training dataset and the inference data of a deployed model. When necessary, loop back to step 1 to retrain the model with new training data.
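With the Python SDK, the train/validate/package portion of this loop can be sketched as follows. This is a minimal sketch, assuming `azureml-core` is installed, a workspace `config.json` is present locally, and a compute target named `cpu-cluster` already exists; the script, experiment, and model names are illustrative:

```python
from azureml.core import Environment, Experiment, ScriptRunConfig, Workspace

# Connect to the workspace described by a local config.json (assumption).
ws = Workspace.from_config()

# Reproducible execution environment built from a Conda specification file.
env = Environment.from_conda_specification(name="train-env",
                                           file_path="environment.yml")

# Train: submit train.py (hypothetical script) to an existing compute target.
src = ScriptRunConfig(source_directory=".",
                      script="train.py",
                      compute_target="cpu-cluster",
                      environment=env)
run = Experiment(ws, "my-experiment").submit(src)
run.wait_for_completion(show_output=True)

# Validate: inspect the metrics the script logged during the run.
print(run.get_metrics())

# Package: register the persisted model from the run's outputs.
model = run.register_model(model_name="my-model",
                           model_path="outputs/model.pkl")
```

Deployment and monitoring are covered by the endpoint concepts later in this article.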
Tools for Azure Machine Learning
Use these tools for Azure Machine Learning:
- Interact with the service in any Python environment with the Azure Machine Learning SDK for Python.
- Interact with the service in any R environment with the Azure Machine Learning SDK for R.
- Automate your machine learning activities with the Azure Machine Learning CLI.
- Use Azure Machine Learning designer (preview) to perform the workflow steps without writing code.
An activity represents a long-running operation. The following operations are examples of activities:
- Creating or deleting a compute target
- Running a script on a compute target
Activities can provide notifications through the SDK or the web UI so that you can easily monitor the progress of these operations.
An Azure Machine Learning compute instance (formerly Notebook VM) is a fully managed cloud-based workstation that includes multiple tools and environments installed for machine learning. Compute instances can be used as a compute target for training and inferencing jobs.
A compute target lets you specify the compute resource where you run your training script or host your service deployment. This location may be your local machine or a cloud-based compute resource.
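Cloud-based compute targets can be created from the SDK. The sketch below provisions a managed Azure Machine Learning Compute cluster; it assumes a workspace `config.json` is present, and the cluster name and VM size are illustrative:

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()

# Provision a managed training cluster that autoscales between 0 and 4 nodes.
config = AmlCompute.provisioning_configuration(vm_size="STANDARD_DS3_V2",
                                               min_nodes=0,
                                               max_nodes=4)
cluster = ComputeTarget.create(ws, "cpu-cluster", config)
cluster.wait_for_completion(show_output=True)
```

With `min_nodes=0`, the cluster scales down to zero nodes when idle, so you pay only while jobs are running.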
Azure Machine Learning Datasets (preview) make it easier to access and work with your data. Datasets manage data in various scenarios such as model training and pipeline creation. Using the Azure Machine Learning SDK, you can access underlying storage, explore data, and manage the life cycle of different Dataset definitions.
Datasets provide methods for working with data in popular formats, such as creating a tabular dataset from delimited files and loading it into a pandas DataFrame.
A datastore is a storage abstraction over an Azure storage account. The datastore can use either an Azure blob container or an Azure file share as the back-end storage. Each workspace has a default datastore, and you can register additional datastores. Use the Python SDK or the Azure Machine Learning CLI to store and retrieve files from the datastore.
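The two concepts work together: you move files through a datastore and then reference them through a dataset. A minimal sketch, assuming a workspace `config.json` and a local `./data/train.csv` file (both names illustrative):

```python
from azureml.core import Dataset, Workspace

ws = Workspace.from_config()

# Every workspace has a default datastore (blob-backed by default).
datastore = ws.get_default_datastore()

# Store files: upload a local folder into the datastore.
datastore.upload(src_dir="./data", target_path="data", overwrite=True)

# Reference the data: create a tabular dataset over the uploaded file.
dataset = Dataset.Tabular.from_delimited_files(path=(datastore, "data/train.csv"))

# Explore the data locally as a pandas DataFrame.
df = dataset.to_pandas_dataframe()
print(df.head())
```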
An endpoint is an instantiation of your model into either a web service that can be hosted in the cloud or an IoT module for integrated device deployments.
Web service endpoint
When you deploy a model as a web service, the endpoint can be hosted on Azure Container Instances, Azure Kubernetes Service, or FPGAs. You create the service from your model, script, and associated files. These are placed into a base container image that contains the execution environment for the model. The image has a load-balanced HTTP endpoint that receives scoring requests sent to the web service.
Azure helps you monitor your web service by collecting Application Insights telemetry or model telemetry, if you’ve chosen to enable this feature. The telemetry data is accessible only to you, and it’s stored in your Application Insights and storage account instances.
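Deploying a registered model as a web service endpoint might look like the following sketch. It assumes a previously registered model named `my-model`, a `score.py` entry script that defines `init()` and `run()`, and a Conda specification file; all names are illustrative:

```python
from azureml.core import Environment, Workspace
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()
model = Model(ws, name="my-model")  # previously registered model (assumption)

# Scoring environment plus the entry script that loads and runs the model.
env = Environment.from_conda_specification("serve-env", "environment.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# Host the endpoint on Azure Container Instances for dev/test workloads,
# with Application Insights telemetry collection turned on.
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1,
                                                       memory_gb=1,
                                                       enable_app_insights=True)

service = Model.deploy(ws, "my-service", [model],
                       inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)
```

For production workloads, the same `Model.deploy` call can target an Azure Kubernetes Service cluster instead of Container Instances.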
IoT module endpoints
A deployed IoT module endpoint is a Docker container that includes your model and associated script or application and any additional dependencies. You deploy these modules by using Azure IoT Edge on edge devices.
If you’ve enabled monitoring, Azure collects telemetry data from the model inside the Azure IoT Edge module. The telemetry data is accessible only to you, and it’s stored in your storage account instance.
Azure IoT Edge ensures that your module is running, and it monitors the device that’s hosting it.
Azure Machine Learning environments specify the configuration (Docker, Python, Spark, and so on) used to create a reproducible environment for data preparation, model training, and model serving. They are managed and versioned entities within your Azure Machine Learning workspace that enable reproducible, auditable, and portable machine learning workflows across different compute targets.
You can use an environment object on your local compute to develop your training script, reuse that same environment on Azure Machine Learning Compute for model training at scale, and even deploy your model with that same environment.
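Defining and registering an environment might look like this sketch; the environment name, Conda file, and base image are illustrative:

```python
from azureml.core import Environment

# Define the environment once from a Conda specification (file name is
# an assumption; it lists the Python packages the scripts need).
env = Environment.from_conda_specification(name="my-env",
                                           file_path="environment.yml")

# Optionally pin a base Docker image for the container build.
env.docker.base_image = "mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04"

# Registering the environment in a workspace (ws) versions it, so the same
# environment can be reused for local development, remote training at
# scale, and deployment:
# env.register(workspace=ws)
```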
You use machine learning pipelines to create and manage workflows that stitch together machine learning phases. For example, a pipeline might include data preparation, model training, model deployment, and inference/scoring phases. Each phase can encompass multiple steps, each of which can run unattended in various compute targets.
Pipeline steps are reusable: if a step's scripts and inputs haven't changed, its earlier output can be reused instead of rerunning the step. For example, you can retrain a model without rerunning costly data preparation steps if the data hasn't changed. Pipelines also allow data scientists to collaborate while working on separate areas of a machine learning workflow.
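A two-phase pipeline might be sketched as follows. This assumes the `azureml-pipeline` package, a workspace `config.json`, a compute target named `cpu-cluster`, and hypothetical `prep.py` and `train.py` scripts:

```python
from azureml.core import Workspace
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()

# Phase 1: data preparation. allow_reuse=True lets the service skip this
# step on later submissions if its scripts and inputs are unchanged.
prep = PythonScriptStep(name="prepare data",
                        script_name="prep.py",
                        source_directory="./prep",
                        compute_target="cpu-cluster",
                        allow_reuse=True)

# Phase 2: training, ordered to run after the preparation step.
train = PythonScriptStep(name="train model",
                         script_name="train.py",
                         source_directory="./train",
                         compute_target="cpu-cluster")
train.run_after(prep)

pipeline = Pipeline(workspace=ws, steps=[train])
run = pipeline.submit(experiment_name="my-pipeline")
```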
A run is a single execution of a training script. Azure Machine Learning records all runs and stores the following information:
- Metadata about the run (timestamp, duration, and so on)
- Metrics that are logged by your script
- Output files that are autocollected by the experiment or explicitly uploaded by you
- A snapshot of the directory that contains your scripts, prior to the run
You produce a run when you submit a script to train a model. A run can have zero or more child runs. For example, the top-level run might have two child runs, each of which might have its own child run.
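Inside a training script, the run record described above is populated through the run context. A minimal sketch (metric names and values are illustrative):

```python
from azureml.core import Run

# Get the run context that the service injects into a submitted script.
run = Run.get_context()

# Metrics logged here are stored in the run record alongside the
# metadata, output files, and snapshot listed above.
run.log("alpha", 0.03)
run.log("accuracy", 0.92)

# Files written to ./outputs are autocollected; anything else can be
# uploaded explicitly.
run.upload_file(name="outputs/notes.txt", path_or_stream="notes.txt")

run.complete()
```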
A run configuration is a set of instructions that defines how a script should be run in a specified compute target. The configuration includes a wide set of behavior definitions, such as whether to use an existing Python environment or to use a Conda environment that’s built from a specification.
A run configuration can be persisted into a file inside the directory that contains your training script, or it can be constructed as an in-memory object and used to submit a run.
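Both styles might look like the following sketch, assuming a workspace `config.json` and an existing compute target named `cpu-cluster`:

```python
from azureml.core import Experiment, ScriptRunConfig, Workspace
from azureml.core.runconfig import RunConfiguration

# Build the configuration as an in-memory object...
run_config = RunConfiguration()
run_config.target = "cpu-cluster"  # existing compute target (assumption)

# ...or persist it alongside the training script and load it later.
run_config.save(name="my-run-config", path=".")
loaded = RunConfiguration.load(path=".", name="my-run-config")

# Use the loaded configuration to submit a run.
ws = Workspace.from_config()
src = ScriptRunConfig(source_directory=".", script="train.py")
src.run_config = loaded
run = Experiment(ws, "my-experiment").submit(src)
```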
When you submit a run, Azure Machine Learning compresses the directory that contains the script as a zip file and sends it to the compute target. The zip file is then extracted, and the script is run there. Azure Machine Learning also stores the zip file as a snapshot as part of the run record. Anyone with access to the workspace can browse a run record and download the snapshot.
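Conceptually, the snapshot step works like the plain-Python illustration below (this is not the service's actual code, just the compress-and-record idea):

```python
import os
import tempfile
import zipfile

def snapshot_directory(source_dir: str, zip_path: str) -> list:
    """Compress every file under source_dir into zip_path and return
    the archived file names (paths relative to source_dir)."""
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full = os.path.join(root, name)
                rel = os.path.relpath(full, source_dir)
                zf.write(full, arcname=rel)
                archived.append(rel)
    return archived

# Example: snapshot a scratch directory containing one training script.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    with open(os.path.join(src, "train.py"), "w") as f:
        f.write("print('training')\n")
    zip_file = os.path.join(dst, "snapshot.zip")
    names = snapshot_directory(src, zip_file)
    print(names)  # ['train.py']
```

The service additionally stores the resulting archive with the run record, which is what makes the snapshot browsable and downloadable later.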