In the last post, Tuning sklearn Models with hyperopt, I shared how to use `hyperopt` to tune hyper-parameters for a traditional sklearn model. Continuing this topic, today's post covers how that works for a keras neural network model. I tried to write this post in a similar style to Tuning sklearn Models with hyperopt so that readers can compare them side by side. As before, I'll cover two scenarios: the single-machine (base) scenario and the distributed scenario.

I'd like to build a keras neural network model that gives the best test performance on mnist. To make it concrete, I have already decided to use a sequential architecture with one hidden layer and one dropout layer. I'd like to find the best values of the following hyper-parameters under these constraints:

- hidden layer activation, `activation`: `relu` or `sigmoid`
- dropout rate, `dropout`: from 0 to 1

The `mnist` dataset can be downloaded directly from the `tf.keras.datasets` API as follows:

```python
import tensorflow as tf

# load mnist and scale pixel values to [0, 1]
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
```

Now let's translate the constraints on the hyper-parameters into `hyperopt`'s search space, using its `hp` expressions.

```python
from hyperopt import hp

space = {
    'activation': hp.choice('activation', ['relu', 'sigmoid']),
    'dropout': hp.uniform('dropout', 0, 1),
}
```

The goal is to find the best hyper-parameters that achieve the **lowest test error**, which translates into the following objective function.

```python
def objective_func(space_sample):
    ## parse the hyper-parameter sample
    activ = space_sample['activation']
    dropout = space_sample['dropout']
    ## build the model based on the hyper-parameters
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation=activ),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    ## compile the model
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    ## train the model
    model.fit(x_train, y_train, epochs=5)
    ## evaluate test performance
    loss, accuracy = model.evaluate(x_test, y_test, verbose=2)
    return loss
```

Now that we have the data, constraints, and objective ready, it's time to start tuning for the best parameters.

```python
from hyperopt import tpe, fmin

best_classifier = fmin(objective_func, space,
                       algo=tpe.suggest, max_evals=10)
print(best_classifier)
```

After trying 10 iterations, `hyperopt` comes back with a pretty good set of hyper-parameters. In my experiment, it ends up with `{'activation': 0, 'dropout': 0.5517}` (the choice is index-based, equivalent to `{'activation': 'relu', 'dropout': 0.5517}`).
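Since `hyperopt` reports `choice` parameters by index, it helps to map the result back to readable values. Here is a minimal stdlib sketch, assuming the choice list keeps the same ordering as the search space (`hyperopt` also ships a `space_eval` helper that does this against the space itself):

```python
# must match the order used in the search space definition
ACTIVATIONS = ['relu', 'sigmoid']

def decode_best(best):
    """Map hyperopt's index-based choice result back to readable values."""
    decoded = dict(best)
    decoded['activation'] = ACTIVATIONS[best['activation']]
    return decoded

print(decode_best({'activation': 0, 'dropout': 0.5517}))
# {'activation': 'relu', 'dropout': 0.5517}
```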

Here's where `hyperopt`'s handling of `keras` neural network models starts to differ from `sklearn` models. If you follow the distributed scenario of Tuning sklearn Models with hyperopt directly, you will encounter `TypeError: can't pickle _LazyLoader objects` - `MongoTrials` still has issues working with `keras` neural network models.
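The root cause is that `MongoTrials` serializes the objective with pickle in order to ship it to workers, and module objects (including TensorFlow's lazily loaded submodules) are not picklable. A minimal stdlib illustration, using `math` as a stand-in for the TensorFlow module:

```python
import math
import pickle

# module objects cannot be pickled, so an objective that must carry one
# to a remote worker fails in the same way
try:
    pickle.dumps(math)
except TypeError as err:
    print(f"pickle failed: {err}")
```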

In order to run distributed tuning for neural networks, we need the `hyperas` library, which is essentially a port of `hyperopt` for `keras` models.

Instead of loading data directly in the main script, we wrap it cleanly in a function as follows.

```python
import tensorflow as tf

def data():
    # load and normalize mnist for the workers
    mnist = tf.keras.datasets.mnist
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0
    return x_train, y_train, x_test, y_test
```

One of the neat features of `hyperas` is that it lets users define the constraints/ranges of hyper-parameters directly in the objective definition, via `{{...}}` templates.

```python
import tensorflow as tf
from hyperopt import STATUS_OK
from hyperas.distributions import choice, uniform

def create_model(x_train, y_train, x_test, y_test):
    ## create model with constraints of hyper-parameters
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation={{choice(['relu', 'sigmoid'])}}),
        tf.keras.layers.Dropout({{uniform(0, 1)}}),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    ## compile keras model
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    ## train the model
    model.fit(x_train, y_train, epochs=5)
    ## evaluate the test performance
    loss, accuracy = model.evaluate(x_test, y_test, verbose=2)
    ## hyperas expects a dict with the loss, a status, and the model
    return {'loss': loss, 'status': STATUS_OK, 'model': model}
```

We replace `hyperopt.fmin` with `hyperas.optim.minimize`, which takes a `MongoTrials` object to enable distributed training.

```python
from hyperopt import tpe
from hyperas import optim
from hyperopt.mongoexp import MongoTrials

trials = MongoTrials('mongo://localhost:27017/tf_mnist/jobs', exp_key='exp1')
best_run, best_model = optim.minimize(model=create_model,
                                      data=data,
                                      algo=tpe.suggest,
                                      max_evals=10,
                                      trials=trials,
                                      ## if using a notebook, put its name here
                                      notebook_name="tool1. try out hyperopt",
                                      keep_temp=True)
```

Just like before, the previous step only registers the training jobs; we need a bunch of workers to do the real work. Compared with launching workers previously, the only difference is that we have to copy `temp_model.py` from the main program folder over to the worker folder. This file is automatically generated by the main program.

```shell
## create a working directory for the worker
mkdir worker
cd worker

## the additional step needed compared with
## launching workers previously
cp /path/to/temp_model.py ./

## ideally use the same python environment as the main program
## using a conda environment as an example (a pip environment is also fine)
conda activate tuning_env
hyperopt-mongo-worker --mongo=localhost:27017/tf_mnist --poll-interval=0.1
```

In the end, you should get the same best hyper-parameters in a shorter period of time.

`hyperopt` works well with `keras` models in single-machine tuning mode. When distributed mode is enabled, however, `hyperopt` throws a `TypeError` that prevents us from using `MongoTrials` in `hyperopt.fmin`.

One workaround is to use `hyperas`, which is designed for `keras` models. Its usage is very similar to `hyperopt`, with the following quick steps.

- define a data loading function
- define an objective function that incorporates the constraints of the hyper-parameters directly
- launch the tuning process in the main program
- copy `temp_model.py` over to the worker folder and launch workers there