Since my previous posts - Tuning sklearn Models with hyperopt and Tuning keras Models with hyperopt - I had been using the `hyperopt` library, until I recently read a well-written post and noticed that a similar package, `optuna`, has been gaining a lot of momentum. After trying it out, I quickly grew to love it for its user-friendliness, its intuitive design, and, importantly, the minimal changes required to parallelize computation. So today I decided to write a sibling post showing how to tune ML models with `optuna`.

Here I chose the same setting as in Tuning sklearn Models with hyperopt - building a `sklearn` classifier that classifies the `iris` data. The source code can be found here. The search space for the hyper-parameters is listed again as follows:

- number of neighbors, `n_neighbors`: range from 3 to 11
- specific `algorithm`: either `ball_tree` or `kd_tree`
- `leaf_size`: range from 1 to 50
- distance `metric`: one of `euclidean`, `manhattan`, `chebyshev`, `minkowski`

Unlike `hyperopt`, an objective function for `optuna` defines the whole evaluation procedure for a set of hyper-parameters, which includes:

- data loading
- hyper-parameter sampling
- constructing the model
- training the model, and
- evaluating the model

```python
from sklearn.svm import SVC
from sklearn import datasets
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split


def objective(trial):
    # Load data
    iris = datasets.load_iris()
    x = iris.data
    y = iris.target
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

    # Sample hyper-parameters
    classifier_name = trial.suggest_categorical("classifier",
                                                ["KNeighborsClassifier", "SVC"])
    if classifier_name == "KNeighborsClassifier":
        n_neighbors = trial.suggest_int('n_neighbors', 3, 11)
        algorithm = trial.suggest_categorical("algorithm", ["ball_tree", "kd_tree"])
        leaf_size = trial.suggest_int('leaf_size', 1, 50)
        metric = trial.suggest_categorical('metric',
                                           ["euclidean", "manhattan",
                                            "chebyshev", "minkowski"])
        # Construct the model
        clf = KNeighborsClassifier(n_neighbors=n_neighbors,
                                   algorithm=algorithm,
                                   leaf_size=leaf_size,
                                   metric=metric)
    elif classifier_name == "SVC":
        C = trial.suggest_loguniform('C', 1e-10, 1)
        kernel = trial.suggest_categorical('kernel',
                                           ['linear', 'poly', 'rbf', 'sigmoid'])
        degree = trial.suggest_int('degree', 1, 50)
        gamma = trial.suggest_loguniform('gamma', 0.001, 10000)
        # Construct the model
        clf = SVC(C=C, kernel=kernel, degree=degree, gamma=gamma)

    # Train the model
    clf.fit(x_train, y_train)

    # Evaluate the model
    y_pred_test = clf.predict(x_test)
    loss = mean_squared_error(y_test, y_pred_test)
    print("Test Score:", clf.score(x_test, y_test))
    print("Train Score:", clf.score(x_train, y_train))
    print("\n=================")
    return loss
```

The `objective` function takes a single argument: a `trial` object from `optuna`, which carries the key power of sampling hyper-parameters. In contrast, if you remember, `hyperopt` requires you to define the search space outside the `objective` function and pass it into `fmin`, which takes care of the sampling. This may not seem like a big deal at this point, but this atomic function design in the `optuna` framework really makes distributed optimization less prone to bugs.
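
For comparison, here's a minimal sketch (mine, not taken from the earlier post) of the `hyperopt` style for the same KNN search space - note how the space lives outside the objective and `fmin` handles the sampling:

```python
from hyperopt import fmin, tpe, hp

# The search space is defined OUTSIDE the objective;
# fmin samples from it and passes a plain dict to the objective
space = {
    'n_neighbors': hp.choice('n_neighbors', list(range(3, 12))),
    'algorithm': hp.choice('algorithm', ['ball_tree', 'kd_tree']),
    'leaf_size': hp.choice('leaf_size', list(range(1, 51))),
    'metric': hp.choice('metric',
                        ['euclidean', 'manhattan', 'chebyshev', 'minkowski']),
}

def objective(params):
    from sklearn import datasets
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    iris = datasets.load_iris()
    clf = KNeighborsClassifier(**params)
    # Minimize the negative cross-validated accuracy
    return -cross_val_score(clf, iris.data, iris.target, cv=3).mean()

best = fmin(objective, space, algo=tpe.suggest, max_evals=100)
```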

With the objective defined as above, we can easily start tuning the hyper-parameters as follows:

```python
import optuna

# Depending on the definition of objective, we can create the study
# object with either 'minimize' or 'maximize'
study = optuna.create_study(direction='minimize')

# Start tuning the hyper-parameters
study.optimize(objective, n_trials=100)
```
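
Once the optimization finishes, the results can be read straight off the `study` object, for example:

```python
# Inspect the outcome of the study
print("Best loss:", study.best_value)
print("Best hyper-parameters:", study.best_params)

# All trials as a pandas DataFrame, handy for further analysis
df = study.trials_dataframe()
```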

For a single local machine, this is what you need. If you have multiple machines designated to run the study together, the code only needs minimal changes (see the next section for details).
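
As a side note, even on a single machine `optuna` can run trials concurrently: `Study.optimize` accepts an `n_jobs` argument for thread-based parallelism.

```python
# Run up to 4 trials at a time in threads on one machine
study.optimize(objective, n_trials=100, n_jobs=4)
```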

To use a cluster of machines, we need a database to serve as a central place to store trial statuses and results. Whereas `hyperopt` uses MongoDB, `optuna` uses relational databases such as SQLite, MySQL, and PostgreSQL. In this example, I set up a local MySQL database `ml_expts`, with the connection string being `mysql+pymysql://root:root@localhost:8888/ml_expts`. The only difference from local tuning is to feed the database connection string into the `create_study` method as `storage`.
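
One caveat: `optuna` creates its tables automatically, but the MySQL database itself has to exist beforehand. Below is a minimal one-time setup sketch, assuming the same credentials and port as in the connection string above:

```python
import pymysql

# One-time setup: create the database that optuna's storage URL points to
conn = pymysql.connect(host='localhost', port=8888, user='root', password='root')
with conn.cursor() as cur:
    cur.execute("CREATE DATABASE IF NOT EXISTS ml_expts")
conn.close()
```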

```python
import optuna

# Depending on the definition of objective, we can create the study
# object with either 'minimize' or 'maximize'
study = optuna.create_study(direction='minimize',
                            study_name='distributed-tuning',
                            storage='mysql+pymysql://root:root@localhost:8888/ml_expts',
                            load_if_exists=True)

# Start tuning the hyper-parameters
study.optimize(objective, n_trials=100)
```

You might wonder: this code looks like it's just for the master machine, so what about the worker side? The good news is that `optuna` doesn't require you to write a separate script for the workers - the same script is used on the worker machines. This is enabled by turning on the `load_if_exists` argument in `create_study`. If a worker running the script finds that the study has already been created, it will pick up the current status of the study and continue to run and contribute to it.
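
A nice side effect is that any machine with access to the storage can attach to the study, for example to monitor progress while the workers are running. A small sketch:

```python
import optuna

# Attach to the shared study to check its progress
study = optuna.load_study(study_name='distributed-tuning',
                          storage='mysql+pymysql://root:root@localhost:8888/ml_expts')
print(len(study.trials), "trials so far")
print("Best loss so far:", study.best_value)
```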

If you'd like to try out this package, feel free to refer to the two scripts I created while writing this post: one for the sklearn model and the other for the keras model.

What you'll find great is that there's not much difference between the two scripts using `optuna`. Unlike `hyperopt`, which has difficulty handling distributed tuning for keras models (we resorted to another package, `hyperas`, in Tuning keras Models with hyperopt), `optuna`'s distributed tuning works in exactly the same way whether it's a simple sklearn model or a complex keras neural network model.