Hyper-parameter tuning has long been one of the least favorite tasks for data scientists. In the past, we had to rely on either a grid search or a random search strategy. Recently there has been a trend toward Bayesian optimization strategies, e.g., in AutoML. I've been wanting to try out the Python library `hyperopt`, so in this post I'll share how to use it in two scenarios.

I'd like to build a `sklearn` classifier that gives the best test performance. To make it more concrete, I have narrowed the classifier type down to `KNeighborsClassifier`, and I'd like to find the best values of the following hyper-parameters under these constraints:

- number of neighbors, `n_neighbors`: range from 3 to 11
- specific `algorithm`: either `ball_tree` or `kd_tree`
- `leaf_size`: range from 1 to 50
- distance `metric`: one of `euclidean`, `manhattan`, `chebyshev`, `minkowski`
For demo purposes, let's choose the well-known `iris` dataset from `sklearn`.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x = iris.data
y = iris.target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
```

Now let's translate the constraints on the hyper-parameters into `hyperopt`'s search space.

```python
from hyperopt import hp

space = {
    'n_neighbors': hp.choice('n_neighbors', list(range(3, 11))),
    'algorithm': hp.choice('algorithm', ['ball_tree', 'kd_tree']),
    'leaf_size': hp.choice('leaf_size', list(range(1, 50))),
    'metric': hp.choice('metric',
                        ['euclidean', 'manhattan', 'chebyshev', 'minkowski']),
}
```

Our goal is to find the hyper-parameters that achieve the **lowest test error**, which translates into the following objective function.

```python
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsClassifier

def objective_func(space_sample):
    ## parse the hyper-parameter sample
    n_neighbors = space_sample['n_neighbors']
    algorithm = space_sample['algorithm']
    leaf_size = space_sample['leaf_size']
    metric = space_sample['metric']
    ## build the classifier based on the hyper-parameters
    clf = KNeighborsClassifier(n_neighbors=n_neighbors,
                               algorithm=algorithm,
                               leaf_size=leaf_size,
                               metric=metric)
    ## train the classifier
    clf.fit(x_train, y_train)
    ## evaluate test performance
    y_pred_test = clf.predict(x_test)
    loss = mean_squared_error(y_test, y_pred_test)
    return loss
```

Now that we have the data, constraints, and objective ready, it's time to start tuning for the best parameters.

```python
from hyperopt import fmin, tpe

best_classifier = fmin(objective_func, space, algo=tpe.suggest, max_evals=100)
print(best_classifier)
```

After 100 evaluations in less than 10 seconds, `hyperopt` comes back with a pretty good set of hyper-parameters. In my experiment, it ends up with `{n_neighbors: 6, algorithm: 1, leaf_size: 13, metric: 0}` (these are indices into the choices, equivalent to `{n_neighbors: 9, algorithm: kd_tree, leaf_size: 14, metric: euclidean}`).
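Because `hp.choice` reports indices, you have to map them back to the underlying options yourself (or use `hyperopt`'s `space_eval` helper). A quick sketch of the manual mapping:

```python
## the option lists, in the same order as in the search space definition
options = {
    'n_neighbors': list(range(3, 11)),
    'algorithm': ['ball_tree', 'kd_tree'],
    'leaf_size': list(range(1, 50)),
    'metric': ['euclidean', 'manhattan', 'chebyshev', 'minkowski'],
}

## the index-based result returned by fmin
best = {'n_neighbors': 6, 'algorithm': 1, 'leaf_size': 13, 'metric': 0}

## look each index up in its option list
decoded = {name: options[name][idx] for name, idx in best.items()}
print(decoded)
## → {'n_neighbors': 9, 'algorithm': 'kd_tree', 'leaf_size': 14, 'metric': 'euclidean'}
```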

All the previous steps run on a single machine. When each evaluation is computationally expensive and many evaluations are required, such sequential tuning doesn't scale well. The great thing about `hyperopt` is that it also supports tuning in a distributed fashion.

The additional component we need is a work distributor/broker, for which `hyperopt` uses `MongoDB`. The idea is that the main program (which executes `fmin`) spawns training jobs (one job per set of hyper-parameters) and registers them in `MongoDB`. Separately, an array of workers (called `hyperopt-mongo-worker`) can be launched; each worker connects to `MongoDB` to fetch and execute training jobs.

You only need to install MongoDB and launch it; installation guidelines can be found in the official MongoDB documentation.
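Once installed, launching a local instance is a one-liner (the data directory path here is just an example; adjust it to your setup):

```shell
## create a data directory and start mongod on the default port
mkdir -p ./mongo_data
mongod --dbpath ./mongo_data --port 27017
```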

All the data, constraints, and objective are the same as in the base scenario. The only modification needed is to add an object called `trials` to the call of `fmin`. The `trials` object lets the main program register jobs to a designated database and collection in `MongoDB` (here, the `iris` database and the `jobs` collection).

```python
from hyperopt.mongoexp import MongoTrials

trials = MongoTrials('mongo://localhost:27017/iris/jobs', exp_key='exp1')
best_classifier = fmin(objective_func, space, trials=trials,
                       algo=tpe.suggest, max_evals=100)
print(best_classifier)
```

Since the training jobs are only registered at this point, we need a bunch of workers to do the real work. Launching a worker is also easy.

```shell
## create a working directory for the worker
mkdir worker
cd worker

## ideally use the same python environment as the main program;
## a conda environment is used as an example (a pip environment is also fine)
conda activate tuning_env

hyperopt-mongo-worker --mongo=localhost:27017/iris --poll-interval=0.1
```

In the end, you should get the same best hyper-parameters in a shorter period of time.

`hyperopt` makes machine learning model tuning easy and efficient. For a data scientist who'd like to try it out, remember the four steps:

- load the data
- translate constraints on the hyper-parameters
- define objective function
- start tuning (if choosing distributed approach, one needs to launch workers)