Now that we’ve written our get_data.py file, we can pass it into our AutoML configuration.
The configuration supports 3 potential tasks (classification, regression and forecasting), and we pick one depending on the type of problem we’re trying to solve.
As we’re trying to predict a continuous variable in house prices, we’ll be doing a regression. This setting tells AutoML which models it should try, e.g. it doesn’t make sense to try a classifier such as logistic regression on a regression problem.
We also need to tell AutoML which metric we want to optimise. In this case we’ll go for the root mean squared error, and we’ll be using K-fold cross validation with a K of 5.
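To make the metric and the validation scheme concrete, here is a minimal sketch of 5-fold cross-validated RMSE in plain Python. It is not what AutoML runs internally: the "model" is just a mean-price baseline, the prices are made up, and the helper names (`rmse`, `k_fold_indices`, `cross_validated_rmse`) are hypothetical.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle the row indices and deal them into k roughly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_rmse(prices, k=5):
    """Score a mean-price baseline with k-fold cross validation."""
    scores = []
    for held_out in k_fold_indices(len(prices), k):
        held_out_set = set(held_out)
        train = [p for i, p in enumerate(prices) if i not in held_out_set]
        baseline = sum(train) / len(train)  # stand-in model: predict the training mean
        actual = [prices[i] for i in held_out]
        scores.append(rmse(actual, [baseline] * len(actual)))
    return sum(scores) / k  # average the per-fold errors


# Made-up house prices, purely for illustration
prices = [210_000, 340_000, 185_000, 420_000, 275_000,
          310_000, 199_000, 365_000, 250_000, 298_000]
print(round(cross_validated_rmse(prices), 2))
```

Each row is held out exactly once, so the averaged score reflects how the model does on data it hasn’t seen. A lower RMSE is better, and it is expressed in the same units as the house prices.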
The number of iterations is the number of different modelling configurations AutoML will try, and we set a timeout of 15 minutes.
By default the maximum number of concurrent iterations is 1, so we override this to make use of our cluster.
We’ll also have any error logging sent to a file.
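As a rough summary, the choices above can be gathered into a single dictionary of settings. This is only a sketch. The key names are modelled on the v1 Azure ML SDK and should be checked against your installed version. The metric spelling, the reading of the timeout as per-iteration, the log file name and the iteration and concurrency counts in the usage line are all assumptions rather than values from this walkthrough.

```python
def build_automl_settings(iterations, max_concurrent_iterations):
    """Collect the settings described above into keyword arguments."""
    if max_concurrent_iterations < 1:
        raise ValueError("max_concurrent_iterations must be at least 1")
    return {
        "task": "regression",                   # predicting a continuous house price
        "primary_metric": "normalized_root_mean_squared_error",  # assumed spelling
        "n_cross_validations": 5,               # K-fold cross validation with K = 5
        "iterations": iterations,               # number of model configurations to try
        "iteration_timeout_minutes": 15,        # assumes the timeout is per iteration
        "max_concurrent_iterations": max_concurrent_iterations,  # default would be 1
        "debug_log": "automl_errors.log",       # hypothetical log file name
        "data_script": "get_data.py",           # the data script written earlier
    }


# Placeholder counts, not the values used in this walkthrough
settings = build_automl_settings(iterations=20, max_concurrent_iterations=4)
for name, value in settings.items():
    print(f"{name} = {value}")
```

Keeping the settings in a dictionary makes them easy to log or tweak between runs. They could then be unpacked into the configuration object, for example `AutoMLConfig(**settings)`, provided your SDK version accepts these names.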