Using Azure AutoML and AML for Assessing Multiple Models and Deployment

In this post we’ll explore how to use Azure AutoML in the cloud to assess the performance of multiple regression models in parallel and then deploy the best-performing model.

The steps we’ll go through in this post are as follows:

  • Create Workspace
  • Create multi-node Azure Machine Learning compute cluster
  • Use AutoML to train multiple Regression Models in parallel
  • Validate our highest performing model
  • Deploy model in Azure Machine Learning Services

In this post we’ll be using the Boston House Price data that’s bundled with scikit-learn to predict the house prices of different neighbourhoods of Boston from different attributes of the neighbourhoods.

Quick disclaimer: at the time of writing, I am a Microsoft employee.

Create Workspace

First we create a workspace, which will contain all the components of our Azure Machine Learning Services workflow: our training compute, our trained model, our model API image, and our model API container and its associated compute.

In [1]:
from azureml.core.workspace import Workspace

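ws = Workspace.create(name='myworkspace',
                      subscription_id='{subscription_id}',  # replace with your own subscription ID
                      resource_group='myresourcegroup',
                      location='westeurope')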
UserWarning: The resource group doesn't exist or was not provided. AzureML SDK is creating a resource group=myresourcegroup in location=westeurope using subscription={subscription_id}.

Create Multi-Node Azure Machine Learning Compute Context

We’ll now create the compute cluster in which our Azure Machine Learning AutoML training will execute.

We’ll go for 4 two-core nodes (this will start by provisioning 1 node and scale up to 4 as required).

In [2]:
from azureml.core.compute import AmlCompute
from azureml.core.compute_target import ComputeTargetException

aml_name = 'myamlcompute'
try:
    # Reuse the cluster if it already exists in the workspace
    aml_compute = AmlCompute(ws, aml_name)
    print('Found existing AML compute context.')
except ComputeTargetException:
    # Otherwise provision a cluster that scales between 1 and 4 nodes
    print('Creating new AML compute context.')
    aml_config = AmlCompute.provisioning_configuration(vm_size = "Standard_D2_v2", min_nodes=1, max_nodes=4)
    aml_compute = AmlCompute.create(ws, name = aml_name, provisioning_configuration = aml_config)
    aml_compute.wait_for_completion(show_output = True)
Creating new AML compute context.
AmlCompute wait for completion finished
Minimum number of nodes requested have been provisioned

Use AutoML to train multiple Regression Models in parallel

We first need to create a file that will allow our compute nodes to retrieve our training data.

As mentioned above, this data is bundled with scikit-learn, so what we’ll do is load the data using scikit-learn and then split the data into a training and test set, passing in the random_state seed so that we can reliably split the data in the same way across our nodes and later on for testing.

In [3]:

from sklearn import datasets
from sklearn.model_selection import train_test_split
from scipy import sparse
import numpy as np

def get_data():
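    # Note: load_boston was removed in scikit-learn 1.2, so this needs an older release
    boston = datasets.load_boston()
    X = boston.data
    y = boston.target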
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

    return {'X': X_train, 'y': y_train}
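
To see why the fixed seed matters, here is a minimal sketch in plain Python. It is not scikit-learn's implementation, and the function name seeded_split is made up for illustration; it only shows that shuffling with the same seed gives every caller the same split.

```python
import math
import random

def seeded_split(n_rows, test_size=0.25, seed=1):
    """Shuffle row indices with a fixed seed and split them into train/test."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)      # same seed -> same shuffle order
    n_test = math.ceil(n_rows * test_size)    # round the test share up
    return indices[n_test:], indices[:n_test]

# Two independent calls (e.g. on two different nodes) agree exactly
train_a, test_a = seeded_split(506)
train_b, test_b = seeded_split(506)
print(train_a == train_b and test_a == test_b)  # True
```

Without the seed, each node would shuffle differently and the models would be trained and tested on different rows.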

Now that we’ve written our file, we can pass it into our AutoML configuration.

The configuration has 3 potential tasks: classification, regression, or forecasting depending on the type of problem we’re trying to solve.

As we’re trying to predict a continuous variable in house prices, we’ll be doing a regression. This configuration informs AutoML as to which models it should be trying, e.g. it doesn’t make sense to try a pure classifier such as logistic regression on a regression problem.

We need to inform AutoML as to which metric we want to optimise. In this case we’ll go for the $R^2$ score, and we’ll be using K-fold cross validation with a K of 5.
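
As a rough illustration of what K-fold cross validation with a K of 5 means, the sketch below splits row indices into five folds, each of which takes one turn as the validation set. This is a plain-Python sketch rather than what AutoML runs internally; the helper name k_fold_indices and the row count of 379 are assumptions made for the example.

```python
def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds."""
    base, extra = divmod(n_rows, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)   # spread the remainder over the first folds
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, valid
        start += size

# Assuming 379 training rows
fold_sizes = [len(valid) for _, valid in k_fold_indices(379, k=5)]
print(fold_sizes)  # [76, 76, 76, 76, 75]
```

In standard K-fold, the model is fitted five times, each time holding out one fold, and the five validation scores are averaged.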

The number of iterations is the number of different modelling configurations AutoML will try, and we set a timeout of 15 minutes for each iteration.

By default the maximum number of concurrent iterations is 1, so we override this to allow us to make use of our cluster.

We’ll also get any error logging sent to a file automl_errors.log.

In [4]:
import logging
import os
import time
from azureml.train.automl import AutoMLConfig

automl_settings = {
    "name": "AutoML_Demo_Experiment",
    "iteration_timeout_minutes": 15,
    "iterations": 25,
    "n_cross_validations": 5,
    "primary_metric": 'r2_score',
    "preprocess": False,
    "max_concurrent_iterations": 8,
    "verbosity": logging.INFO
}

automl_config = AutoMLConfig(task='regression',
                             compute_target = aml_compute,
                             debug_log = 'automl_errors.log',
                             # the argument pointing at the data file is missing from the source
                             **automl_settings)

Now that our configuration is all set up we can run our experiment, which runs through the 25 iterations we have requested. The first time this is run on a node cluster, it can take a while to set up. It then takes a little more time to scale up.

We get information about which iteration is being evaluated, the pipeline (which normalisation algorithm is being used, which model is being used), how long the training took and the $R^2$ value for the iteration, as well as the $R^2$ for the best iteration so far. The higher the $R^2$, the better.

Note that the iterations are not in order. This is because the iterations are carried out in parallel, so they are reported in the order in which they finish rather than the order in which they started.
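
The same effect can be reproduced locally. The sketch below uses Python's concurrent.futures with made-up "training" times (the durations are assumptions, and time.sleep stands in for fitting a pipeline) to show jobs finishing in a different order from the one in which they were submitted.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical per-iteration training times in seconds
durations = {0: 0.3, 1: 0.05, 2: 0.15}

def train(iteration):
    time.sleep(durations[iteration])   # stand-in for fitting a pipeline
    return iteration

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(train, i) for i in durations]
    # as_completed yields each job as soon as it finishes
    finished = [f.result() for f in as_completed(futures)]

print(finished)  # completion order, not submission order
```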

In [5]:
from azureml.core.experiment import Experiment
experiment=Experiment(ws, 'automl_remote')
remote_run = experiment.submit(automl_config, show_output=True)
Running on remote compute: myamlcompute
Parent Run ID: AutoML_4aa3b61f-6ec1-4e9f-ad09-8809661c61f8
ITERATION: The iteration being evaluated.
PIPELINE: A summary description of the pipeline being evaluated.
TRAINFRAC: Fraction of the training data to train on.
DURATION: Time taken for the current iteration.
METRIC: The result of computing score on the fitted pipeline.
BEST: The best observed score thus far.

 ITERATION   PIPELINE                                       TRAINFRAC  DURATION      METRIC      BEST
         0   RobustScaler ExtremeRandomTrees                1          0:00:46       0.8308    0.8308
         1   RobustScaler ElasticNet                        1          0:01:36       0.6698    0.8308
         2   RobustScaler ElasticNet                        1          0:01:31       0.6698    0.8308
         3   StandardScalerWrapper RandomForest             1          0:01:52       0.7942    0.8308
         4   StandardScalerWrapper LightGBM                 1          0:02:20       0.7800    0.8308
         5   MaxAbsScaler RandomForest                      1          0:02:40       0.7664    0.8308
         6   RobustScaler KNN                               1          0:03:01       0.5206    0.8308
         9   StandardScalerWrapper LightGBM                 1          0:02:30       0.7470    0.8308
        10   MinMaxScaler DecisionTree                      1          0:02:35       0.7366    0.8308
        11   StandardScalerWrapper LightGBM                 1          0:02:46       0.8503    0.8503
        13   StandardScalerWrapper RandomForest             1          0:02:20       0.8142    0.8503
        14   MaxAbsScaler RandomForest                      1          0:02:28       0.7224    0.8503
        15   MinMaxScaler RandomForest                      1          0:02:36       0.8143    0.8503
        16   MaxAbsScaler ExtremeRandomTrees                1          0:02:30       0.5377    0.8503
        17   MinMaxScaler LightGBM                          1          0:02:30       0.8635    0.8635
        18   SparseNormalizer RandomForest                  1          0:02:40       0.7901    0.8635
         7   RobustScaler LightGBM                          1          0:07:41       0.8622    0.8635
         8   MaxAbsScaler ExtremeRandomTrees                1          0:07:42       0.7280    0.8635
        19   MaxAbsScaler ElasticNet                        1          0:02:33       0.5470    0.8635
        20   MaxAbsScaler RandomForest                      1          0:02:06       0.8288    0.8635
        21   RobustScaler ExtremeRandomTrees                1          0:02:06       0.7744    0.8635
        22   MinMaxScaler GradientBoosting                  1          0:01:38       0.8517    0.8635
        23   StandardScalerWrapper ExtremeRandomTrees       1          0:00:52       0.7542    0.8635
        24   MaxAbsScaler RandomForest                      1          0:01:06       0.7676    0.8635
        12   StandardScalerWrapper RandomForest             1          0:07:20       0.7470    0.8635
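
The BEST column is simply a running maximum of the METRIC column in the order the iterations were reported. A minimal sketch, using a handful of rows from the log above (iterations 0, 1, 11, 13 and 17):

```python
from itertools import accumulate

def running_best(scores):
    """Return the highest score seen so far after each reported iteration."""
    return list(accumulate(scores, max))

# METRIC values for iterations 0, 1, 11, 13 and 17, in reporting order
metric = [0.8308, 0.6698, 0.8503, 0.8142, 0.8635]
print(running_best(metric))  # [0.8308, 0.8308, 0.8503, 0.8503, 0.8635]
```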

We can view more details by running this cell:

In [6]:
from azureml.widgets import RunDetails
RunDetails(remote_run).show()

Validate our highest performing model

To retrieve our best model based on our primary metric, we can run the get_output() method.

If we wanted to pick the model from any other iteration, e.g. the “RobustScaler ElasticNet” pipeline from iteration 2, we could run get_output(iteration=2). If, for example, we instead wanted the best model for a different metric, say RMSE, we could run get_output(metric="root_mean_squared_error").

In [7]:
best_run, fitted_model = remote_run.get_output()
Run(Experiment: automl_remote,
Id: AutoML_4aa3b61f-6ec1-4e9f-ad09-8809661c61f8_17,
Type: azureml.scriptrun,
Status: Completed)
     steps=[('MinMaxScaler', MinMaxScaler(copy=True, feature_range=(0, 1))), ('LightGBMRegressor', <automl.client.core.common.model_wrappers.LightGBMRegressor object at 0x000001A1C945B780>)])
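
One thing to keep in mind when picking by a different metric is that “best” depends on the metric's direction: a higher $R^2$ is better, but a lower RMSE is better. The sketch below is not how AutoML implements get_output; it just illustrates the idea. The $R^2$ values are copied from the log above, while the RMSE values are invented purely for illustration.

```python
# R^2 values come from the log; the RMSE values are made up for this example
runs = [
    {"iteration": 0,  "r2_score": 0.8308, "root_mean_squared_error": 3.9},
    {"iteration": 7,  "r2_score": 0.8622, "root_mean_squared_error": 3.4},
    {"iteration": 17, "r2_score": 0.8635, "root_mean_squared_error": 3.5},
]

# Whether a larger value means a better model
HIGHER_IS_BETTER = {"r2_score": True, "root_mean_squared_error": False}

def best_iteration(runs, metric):
    """Return the iteration number with the best value for the given metric."""
    pick = max if HIGHER_IS_BETTER[metric] else min
    return pick(runs, key=lambda run: run[metric])["iteration"]

print(best_iteration(runs, "r2_score"))                 # 17
print(best_iteration(runs, "root_mean_squared_error"))  # 7
```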

To save money on our Azure compute costs, we can now delete our compute cluster.

In [8]:
aml_compute.delete()

In training our model, we only used the X_train and y_train data.

We can now use the test set of data to see how well our model performs on the unseen set of test data. We’ll plot up our predictions and view our $R^2$ value.

In [9]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

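# Note: load_boston was removed in scikit-learn 1.2, so this needs an older release
boston = datasets.load_boston()
X = boston.data
y = boston.target
# Recreate the same split used for training (same random_state)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
# Make predictions on unseen data
y_hat = fitted_model.predict(X_test)

r2 = r2_score(y_test, y_hat)

plt.style.use('ggplot')
plt.figure(figsize=(10, 7))
plt.scatter(y_test, y_hat)
# Reference line: points on it are perfect predictions
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='lightblue')
plt.title(r"House prices vs House price prediction ($'000s) R^2={}".format(round(r2, 3)))
plt.show()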
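
For reference, the $R^2$ value is one minus the ratio of the residual sum of squares to the total sum of squares. A minimal plain-Python sketch of that formula follows; the function name r_squared and the sample numbers are just for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)              # total variation
    return 1 - ss_res / ss_tot

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(round(r_squared(y_true, y_pred), 3))  # 0.949
```

A perfect set of predictions scores 1, and always predicting the mean of the observed values scores 0.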

Deploy model in Azure Machine Learning Services

Serialising Model

Now that we’ve got our trained model we’ll want to serialise it so it’s ready for deployment. We can use joblib, which pickles objects that carry large NumPy arrays more efficiently than the standard pickle module.

In [10]:
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

model_path = 'house_price_regressor.pkl'

joblib.dump(fitted_model, model_path)
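
The dump/load round trip follows the same pattern as Python's built-in pickle module. Here is a small sketch using pickle and a made-up stand-in for a fitted model (a dictionary of linear coefficients, which is an assumption for the example), just to show that what comes back from disk predicts exactly as the original did.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted model: parameters of a tiny linear model
fitted = {"intercept": 1.0, "coefficients": [2.0, -1.0]}

def predict(params, rows):
    """Apply the linear model to each row of features."""
    return [params["intercept"] + sum(c * x for c, x in zip(params["coefficients"], row))
            for row in rows]

path = os.path.join(tempfile.mkdtemp(), "stub_regressor.pkl")

with open(path, "wb") as f:
    pickle.dump(fitted, f)        # serialise to disk
with open(path, "rb") as f:
    restored = pickle.load(f)     # deserialise again

print(predict(restored, [[1.0, 1.0], [0.0, 2.0]]))  # [2.0, -1.0]
```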

Registering Model

Next we register the model in our workspace so that our API can use it.

In [11]:
from azureml.core.model import Model

model = Model.register(model_path = model_path,
                       model_name = "house_price_regression",
                       tags = {"key": "0.1"},
                       description = "Boston House Price Dataset Regression",
                       workspace = ws)

print(model.name, model.id, model.version, sep = '\n')
Registering model house_price_regression


Next we write the scoring script, which will provide the predictions for the new observations sent to our API.

There are two functions, an init function and a run function.

Our init function is called first; it looks up the path to the model we registered above and deserialises the model using joblib.

The run function then loads the data passed into the API and uses the model to make predictions for the new observations.

The return value is a list of values that can be serialised to JSON for returning from the API.

In [12]:

import json
import numpy as np

import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
from azureml.core.model import Model
import azureml.train.automl

def init():
    global model
    model_path = Model.get_model_path('house_price_regression')
    model = joblib.load(model_path)

def run(raw_data):
    data = np.array(json.loads(raw_data)['data'])
    y_hat = model.predict(data)
    return y_hat.tolist()
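
Before building an image, it can be handy to exercise the init/run contract locally. The sketch below does that with a stub in place of the registered model (StubModel is hypothetical and simply sums each row's features), so it runs without Azure. It only checks the JSON-in, list-out shape, not the real predictions.

```python
import json

class StubModel:
    """Hypothetical stand-in for the registered model: sums each row's features."""
    def predict(self, rows):
        return [sum(row) for row in rows]

def init():
    global model
    model = StubModel()            # the real script loads the registered model here

def run(raw_data):
    data = json.loads(raw_data)["data"]
    return model.predict(data)     # a plain list, so it serialises to JSON

init()
payload = json.dumps({"data": [[1.0, 2.0], [3.0, 4.5]]})
print(json.dumps(run(payload)))  # [3.0, 7.5]
```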

Create Environment File

The environment needs a number of external Python package dependencies in order to run the scoring script; these are added to our conda dependencies in a yml file we’ve named myenv.yml.

In [13]:
from azureml.core.conda_dependencies import CondaDependencies 

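myenv = CondaDependencies()
# Add the packages the scoring script needs to myenv here (the list is not shown in the source)
env_path = 'myenv.yml'

with open(env_path,"w") as f:
    f.write(myenv.serialize_to_string())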
Deploy API

Now that we have our scoring script and environment file, we can build an image and deploy it.

We need to provide the configuration for our image so that it knows to run our scoring script using Python and that it requires the dependencies in myenv.yml.

We also provide a deployment configuration with details of the server we’ll deploy our image to; for this example, we’re just going for 1 CPU core and 1 GB of RAM.

We then use the deploy_from_model method, passing it the model that we registered, the image configuration, the deployment configuration, the workspace we’re working with and a name. This builds the container image and then deploys a container from it.

In [14]:
from azureml.core.webservice import AciWebservice
from azureml.core.webservice import Webservice
from azureml.core.image import ContainerImage

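score_path = ''  # path to the scoring script written above (file name missing from the source)

image_config = ContainerImage.image_configuration(execution_script=score_path,
                                                  runtime="python",
                                                  conda_file=env_path)

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,
                                               memory_gb=1,
                                               tags={"data": "Boston House Prices",  "method" : "sklearn"},
                                               description='Predict House Prices using Ensemble Model')

service = Webservice.deploy_from_model(workspace=ws,
                                       name='automl-model',  # inferred from the image name in the output
                                       deployment_config=aciconfig,
                                       models=[model],
                                       image_config=image_config)

service.wait_for_deployment(show_output=True)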

Creating image
Image creation operation finished for image automl-model:1, operation "Succeeded"
Creating service
Succeeded
ACI service creation operation finished, operation "Succeeded"

Now that our model is deployed we can view the URI for the API:

In [15]:
print(service.scoring_uri)

And we can test the API to ensure we are getting the same results as we were when we were testing our model locally:

In [16]:
import json
test_data = json.dumps({"data": X_test.tolist()})
test_data = bytes(test_data, encoding = 'utf8')

result = service.run(input_data=test_data)

api_r2 = r2_score(y_test, np.array(result))
print(round(api_r2, 3))
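
The API can also be called over plain HTTP by POSTing the same JSON payload to the service's scoring URI. The sketch below only builds the request using the standard library, so it can be run without a live service; the URI is a placeholder and build_scoring_request is a made-up helper name.

```python
import json
import urllib.request

def build_scoring_request(scoring_uri, rows):
    """Build (but do not send) a JSON POST request for the scoring endpoint."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,                                   # a request with a body is sent as POST
        headers={"Content-Type": "application/json"},
    )

# Placeholder URI; the real one is reported by the deployed service
request = build_scoring_request("http://example.invalid/score", [[0.1, 0.2, 0.3]])
print(request.get_method())  # POST

# Sending it would look like this (commented out, since it needs a live service):
# with urllib.request.urlopen(request) as response:
#     predictions = json.loads(response.read())
```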

Remember to delete your resource group when you’re finished to save on money/credits!