## Tuning ML Models with optuna

by Kehang Han on 2020-06-01 | tags: automl

Since my previous writings - Tuning sklearn Models with hyperopt and Tuning keras Models with hyperopt - I had been using the hyperopt library, until I recently read a well-written post and noticed that a similar package, optuna, has been gaining a lot of momentum. After first trying it out, I came to love it for its user-friendliness, its intuitiveness and, importantly, the minimal changes required to parallelize computation. So today I decided to write a sibling post showing you how to tune ML models via optuna.

Here I chose the same setting as that of Tuning sklearn Models with hyperopt - build a sklearn classifier that classifies iris data. Source code can be found here. The search space for the hyper-parameters is listed again as follows:

• number of neighbors, n_neighbors: range from 3 to 11
• specific algorithm: either ball_tree or kd_tree
• leaf_size: range from 1 to 50
• distance metric: one of euclidean, manhattan, chebyshev, minkowski

### 1. Define Objective

Unlike in hyperopt, an objective function for optuna defines the whole procedure for evaluating a set of hyper-parameters, which includes:

• hyper-parameter sampling
• constructing the model
• training the model, and
• evaluating the model

```python
from sklearn.svm import SVC
from sklearn import datasets
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

def objective(trial):

    x = iris.data
    y = iris.target
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

    # Sample hyper-parameters
    classifier_name = trial.suggest_categorical("classifier", ["KNeighborsClassifier",
                                                               "SVC"])
    if classifier_name == "KNeighborsClassifier":

        # Sample hyper-parameters
        n_neighbors = trial.suggest_int('n_neighbors', 3, 11)
        algorithm = trial.suggest_categorical("algorithm",
                                              ["ball_tree",
                                               "kd_tree"])
        leaf_size = trial.suggest_int('leaf_size', 1, 50)
        metric = trial.suggest_categorical('metric',
                                           ["euclidean", "manhattan",
                                            "chebyshev", "minkowski"])
        # Construct the model
        clf = KNeighborsClassifier(n_neighbors=n_neighbors,
                                   algorithm=algorithm,
                                   leaf_size=leaf_size,
                                   metric=metric)
    elif classifier_name == "SVC":

        # Sample hyper-parameters
        C = trial.suggest_loguniform('C', 1e-10, 1)
        kernel = trial.suggest_categorical('kernel', ['linear', 'poly', 'rbf', 'sigmoid'])
        degree = trial.suggest_int('degree', 1, 50)
        gamma = trial.suggest_loguniform('gamma', 0.001, 10000)

        # Construct the model
        clf = SVC(C=C, kernel=kernel, degree=degree, gamma=gamma)

    # Train the model
    clf.fit(x_train, y_train)

    # Evaluate the model
    y_pred_test = clf.predict(x_test)
    loss = mean_squared_error(y_test, y_pred_test)
    print("Test Score:", clf.score(x_test, y_test))
    print("Train Score:", clf.score(x_train, y_train))
    print("\n=================")
    return loss
```


The objective function takes one argument: a trial object from optuna, which carries the key capability of sampling hyper-parameters. In contrast, if you remember, hyperopt requires you to define the search space outside the objective function and pass it into fmin, which takes care of the sampling. It may not seem like a big deal at this point, but this atomic function design in the optuna framework really makes distributed optimization less prone to bugs.

### 2. Local Tuning

With objective defined as above, we can easily start tuning the hyper-parameters as follows:

```python
import optuna

# depending on the definition of objective
# we can create a study object with either minimize or maximize
study = optuna.create_study(direction='minimize')

# start tuning the hyper-parameters
study.optimize(objective, n_trials=100)
```


For a single local machine, this is all you need. If you have multiple machines designated to run the study together, the code needs only minimal changes (see the next section for details).

### 3. Distributed Tuning

To use a cluster of machines, we need a database to serve as a central place for storing trial status and results. Whereas hyperopt uses mongoDB, optuna uses relational databases such as sqlite, MySQL and postgreSQL. In this example, I set up a local MySQL database ml_expts with the connection string mysql+pymysql://root:root@localhost:8888/ml_expts. The only difference from local tuning is feeding the database connection string into the create_study method as storage.

```python
import optuna

# depending on the definition of objective
# we can create a study object with either minimize or maximize
study = optuna.create_study(direction='minimize',
                            study_name='distributed-tuning',
                            storage='mysql+pymysql://root:root@localhost:8888/ml_expts',
                            load_if_exists=True)

# start tuning the hyper-parameters
study.optimize(objective, n_trials=100)
```

You might wonder: this code is just for the master machine, what about the worker side? The good news is that optuna doesn't require you to write a separate script for workers - the same script is used on the worker machines. This is enabled by turning on the load_if_exists argument of create_study. If a worker running the script finds that the study has already been created, it will pick up the current status of the study and continue to run and contribute to it.
What you'll find great is that there's not much difference between the two scripts when using optuna. Unlike hyperopt, which has difficulty handling distributed tuning for keras models (we resorted to another package, hyperas, in Tuning keras Models with hyperopt), optuna's distributed tuning works exactly the same way whether it's a simple sklearn model or a complex keras neural network model.