Tuning keras Models with hyperopt

by Kehang Han on 2020-02-14 | tags: automl keras

In the last post, Tuning sklearn Models with hyperopt, I shared how to use hyperopt to tune hyper-parameters for a traditional sklearn model. Continuing that topic, today's post covers how the same workflow applies to a keras neural network model. I tried to write this post in a similar style to Tuning sklearn Models with hyperopt, so that readers can compare them side by side. In this post, I'll also cover two scenarios: the single-machine (base) scenario and the distributed scenario.

1. Base Scenario

I'd like to build a keras neural network model that gives the best test performance on mnist. To make it more concrete, I have already decided to use a sequential architecture with one hidden layer and one dropout layer. I'd like to find the best values of the following hyper-parameters under these constraints: the hidden layer's activation is either relu or sigmoid, and the dropout rate can be anywhere between 0 and 1.

1.1 Prepare Data

The mnist dataset can be downloaded directly via the tf.keras.datasets API as follows:

import tensorflow as tf

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
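
As a quick sanity check, the arrays should come back as 60,000 training images and 10,000 test images of size 28x28, which matches the Flatten(input_shape=(28, 28)) layer used below:

## confirm the shapes before building the model
print(x_train.shape, y_train.shape)  ## (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)    ## (10000, 28, 28) (10000,)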

1.2 Translate Constraints

Now let's translate the constraints on the hyper-parameters into hyperopt's search space.

from hyperopt import hp

space = {'activation': hp.choice('activation', ['relu', 'sigmoid']),
         'dropout': hp.uniform('dropout', 0, 1)
         }
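
To double-check that the search space is defined as intended, we can draw a random sample from it with hyperopt's stochastic sampler:

from hyperopt.pyll.stochastic import sample

## draws something like {'activation': 'sigmoid', 'dropout': 0.31}
print(sample(space))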

1.3 Define Objective

Our goal is to find the hyper-parameters that achieve the lowest test loss, which can be translated into the following code.

def objective_func(space_sample):

    ## parse the hyper-parameter sample
    activ = space_sample['activation']
    dropout = space_sample['dropout']

    ## build the model based on the hyper-parameters
    model = tf.keras.models.Sequential([
              tf.keras.layers.Flatten(input_shape=(28, 28)),
              tf.keras.layers.Dense(128, activation=activ),
              tf.keras.layers.Dropout(dropout),
              tf.keras.layers.Dense(10, activation='softmax')
    ])

    ## compile the model
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    ## train the model
    model.fit(x_train, y_train, epochs=5)

    ## evaluate test performance
    loss, accuracy = model.evaluate(x_test, y_test, verbose=2)
    return loss
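
Before handing the objective to hyperopt, it can help to smoke-test it on one hand-picked sample (the values below are arbitrary); the call trains for 5 epochs and prints the resulting test loss:

## smoke-test the objective with an arbitrary hyper-parameter sample
print(objective_func({'activation': 'relu', 'dropout': 0.2}))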

1.4 Start Tuning

Now that we have the data, constraints, and objective ready, it's time to start tuning for the best hyper-parameters.

from hyperopt import tpe, fmin

best_classifier = fmin(objective_func, space, algo=tpe.suggest, max_evals=10)
print(best_classifier)

After 10 iterations, hyperopt comes back with a pretty good set of hyper-parameters. In my experiment, it ends up with {activation: 0, dropout: 0.5517} (the activation value is index-based, so it is equivalent to {activation: relu, dropout: 0.5517}).
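
Because hp.choice reports the index of the winning option rather than the option itself, hyperopt's space_eval helper can translate the result back into actual values:

from hyperopt import space_eval

## maps the index-based result back to actual values,
## e.g. {'activation': 'relu', 'dropout': 0.5517}
print(space_eval(space, best_classifier))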

2. Distributed Scenario

Here's where hyperopt's interaction with keras neural network models starts to differ from its interaction with sklearn models. If you directly follow the distributed scenario from Tuning sklearn Models with hyperopt, you will run into TypeError: can't pickle _LazyLoader objects, because MongoTrials still has issues working with keras neural network models.
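
For reference, the failing call is roughly the direct port of the sklearn recipe shown below (the exp_key value here is arbitrary):

from hyperopt import tpe, fmin
from hyperopt.mongoexp import MongoTrials

## raises "TypeError: can't pickle _LazyLoader objects" when the
## keras-based objective is serialized for the MongoDB-backed workers
trials = MongoTrials('mongo://localhost:27017/tf_mnist/jobs', exp_key='exp0')
best = fmin(objective_func, space, algo=tpe.suggest,
            max_evals=10, trials=trials)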

In order to run distributed tuning for neural networks, we need to use the hyperas library, which is essentially a thin wrapper that ports hyperopt to keras models.

2.1 Prepare Data

Instead of loading the data directly in the main script, we wrap the loading step in a function as follows.

import tensorflow as tf

def data():
    mnist = tf.keras.datasets.mnist
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0
    return x_train, y_train, x_test, y_test

2.2 Define Objective

One of the neat features of hyperas is that it lets you define the constraints/ranges of the hyper-parameters directly in the objective definition, using its double-brace template notation.

import tensorflow as tf
from hyperas.distributions import choice, uniform

def create_model(x_train, y_train, x_test, y_test):

    ## create model with constraints of hyper-parameters
    model = tf.keras.models.Sequential([
      tf.keras.layers.Flatten(input_shape=(28, 28)),
      tf.keras.layers.Dense(128, activation={{choice(['relu', 'sigmoid'])}}),
      tf.keras.layers.Dropout({{uniform(0, 1)}}),
      tf.keras.layers.Dense(10, activation='softmax')
    ])

    ## compile keras model
    model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

    ## train the model
    model.fit(x_train, y_train, epochs=5)

    ## evaluate the test performance
    loss, accuracy = model.evaluate(x_test, y_test, verbose=2)
    return loss

2.3 Start Tuning

We replace hyperopt.fmin with hyperas.optim.minimize, which takes a MongoTrials object to enable distributed training.

from hyperopt import tpe
from hyperas import optim
from hyperopt.mongoexp import MongoTrials

trials = MongoTrials('mongo://localhost:27017/tf_mnist/jobs',
                     exp_key='exp1')
best_run, best_model = optim.minimize(model=create_model,
      data=data,
      algo=tpe.suggest,
      max_evals=10,
      trials=trials,
      notebook_name="tool1. try out hyperopt", ## if using notebook, put its name
      keep_temp=True)

2.4 Launch Worker

Just like before, the training jobs are only registered in the previous step; we need a group of workers to do the real work. Compared with launching workers previously, the only difference is that we have to copy temp_model.py from the main program's folder over to the worker's folder. This file is automatically generated by the main program.

## create a working directory 
## for the worker
mkdir worker
cd worker

## the additional step needed compared with 
## launching workers previously
cp /path/to/temp_model.py ./

## ideally use the same python environment
## as the main program 
## use a conda environment as an example (a pip environment is also fine)
conda activate tuning_env
hyperopt-mongo-worker --mongo=localhost:27017/tf_mnist --poll-interval=0.1

In the end, you should get the same best hyper-parameters in a shorter period of time.
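
Concretely, best_run again holds the index-based hyper-parameter assignment, just as in the base scenario. Note that best_model may be None here, since create_model returns only the loss:

## inspect the winning hyper-parameter assignment
print(best_run)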

Concluding Words

hyperopt works well with keras models in single-machine tuning mode. When distributed mode is enabled, however, hyperopt throws a TypeError that prevents us from using MongoTrials with hyperopt.fmin.

One workaround is to use hyperas, which is designed for keras models. Its usage is very similar to hyperopt's: wrap the data loading in a function, embed the hyper-parameter ranges directly in the model definition with the double-brace notation, and hand a MongoTrials object to optim.minimize.