Tuning ML Models with optuna

by Kehang Han on 2020-06-01 | tags: automl

Since my previous posts - Tuning sklearn Models with hyperopt and Tuning keras Models with hyperopt - I had been using the hyperopt library, until I recently read a well-written post and noticed that a similar package, optuna, has been gaining a lot of momentum. After trying it out, I quickly grew to love it for its user-friendliness, its intuitive API, and, importantly, the minimal changes required to parallelize the computation. So today I decided to write a sibling post showing how to tune ML models via optuna.

Here I chose the same setting as in Tuning sklearn Models with hyperopt - build a sklearn classifier for the iris data. Source code can be found here. The search space for the hyper-parameters is the same as before; this time it is sampled directly inside the objective function shown below.

1. Define Objective

Unlike hyperopt, an objective function for optuna defines the whole evaluation procedure for a set of hyper-parameters: loading the data, sampling the hyper-parameters, constructing the model, training it, and computing the evaluation loss that gets returned.

from sklearn.svm import SVC
from sklearn import datasets
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split


def objective(trial):

    # Load data
    iris = datasets.load_iris()
    x = iris.data
    y = iris.target
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    
    # Sample which classifier to use
    classifier_name = trial.suggest_categorical("classifier", ["KNeighborsClassifier",
                                                               "SVC"])
    if classifier_name=="KNeighborsClassifier":

        # Sample hyper parameters
        n_neighbors = trial.suggest_int('n_neighbors', 3, 11)
        algorithm = trial.suggest_categorical("algorithm", 
                                              ["ball_tree",
                                                "kd_tree"])
        leaf_size = trial.suggest_int('leaf_size', 1, 50)
        metric = trial.suggest_categorical('metric', 
                                           ["euclidean","manhattan", 
                                            "chebyshev","minkowski"])
        # Construct the model
        clf = KNeighborsClassifier(n_neighbors=n_neighbors,
                                   algorithm=algorithm,
                                   leaf_size=leaf_size,
                                   metric=metric)
    elif classifier_name=="SVC":

        # Sample hyper parameters
        C = trial.suggest_loguniform('C', 1e-10, 1)
        kernel = trial.suggest_categorical('kernel', ['rbf', 'poly', 'sigmoid'])
        degree = trial.suggest_int('degree', 1, 50)
        gamma = trial.suggest_loguniform('gamma', 0.001, 10000)

        # Construct the model
        clf = SVC(C=C, kernel=kernel, degree=degree, gamma=gamma)
    
    # Train the model
    clf.fit(x_train, y_train)

    # Evaluate the model
    y_pred_test = clf.predict(x_test)
    # mean squared error between true and predicted labels is the loss to minimize
    loss = mean_squared_error(y_test, y_pred_test)
    print("Test Score:", clf.score(x_test, y_test))
    print("Train Score:", clf.score(x_train, y_train))
    print("\n=================")
    return loss

The objective function takes one argument: a trial object from optuna, whose key power is sampling hyper-parameters. In contrast, if you remember, hyperopt requires you to define the search space outside the objective function and pass it into fmin, which takes care of the sampling. That may not seem like a big deal at this point, but this self-contained function design in optuna really makes distributed optimization less prone to bugs.
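For contrast, here is a minimal hyperopt-style sketch of the define-the-space-outside pattern (an illustration with a reduced KNN search space, not the exact code from the earlier posts):

from hyperopt import fmin, tpe, hp
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# The search space lives outside the objective and is handed to fmin
space = {
    'n_neighbors': hp.choice('n_neighbors', list(range(3, 12))),
    'leaf_size': hp.choice('leaf_size', list(range(1, 51))),
}

iris = datasets.load_iris()


def hyperopt_objective(params):
    # params arrives already sampled by fmin
    clf = KNeighborsClassifier(**params)
    score = cross_val_score(clf, iris.data, iris.target, cv=3).mean()
    return 1.0 - score  # fmin minimizes, so return a loss


best = fmin(fn=hyperopt_objective, space=space, algo=tpe.suggest, max_evals=50)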

2. Local Tuning

With the objective defined above, we can easily start tuning the hyper-parameters as follows:

import optuna

# depending on the definition of objective
# we can create study object with either minimize or maximize
study = optuna.create_study(direction='minimize')

# start tuning for the hyper-parameters
study.optimize(objective, n_trials=100)

For a single local machine, this is all you need. If you have multiple machines designated to run the study together, the code needs only minimal changes (see the next section for details).
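Once the trials finish, the study object holds the results you usually want to look at, for example:

# inspect the outcome of the study
print("Best loss:", study.best_value)
print("Best hyper-parameters:", study.best_params)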

3. Distributed Tuning

To use a cluster of machines, we need a database to serve as a central place to store trial statuses and results. Unlike hyperopt, which uses MongoDB, optuna uses relational databases such as SQLite, MySQL, or PostgreSQL. In this example, I set up a local MySQL database ml_expts with the connection string mysql+pymysql://root:root@localhost:8888/ml_expts. The only difference from local tuning is feeding the database connection string into create_study as the storage argument.

import optuna

# depending on the definition of objective
# we can create study object with either minimize or maximize
study = optuna.create_study(direction='minimize',
                            study_name='distributed-tuning',
                            storage='mysql+pymysql://root:root@localhost:8888/ml_expts', 
                            load_if_exists=True)

# start tuning for the hyper-parameters
study.optimize(objective, n_trials=100)

You might wonder: this code is just for the master machine, so what about the worker side? The good news is that optuna doesn't require a separate script for workers - the same script runs on the worker machines. This is enabled by the load_if_exists argument of create_study: if a worker running the script finds that the study has already been created, it picks up the current state of the study and keeps contributing trials to it.
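As a quick sanity check (a small sketch using the same connection string as above), any machine that can reach the database can re-load the shared study and inspect its progress:

import optuna

# connect to the shared study stored in MySQL
study = optuna.load_study(study_name='distributed-tuning',
                          storage='mysql+pymysql://root:root@localhost:8888/ml_expts')
print("Trials so far:", len(study.trials))
print("Best hyper-parameters so far:", study.best_params)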

4. Trying It Out

If you'd like to try out the package, feel free to refer to the two scripts I created while making this post: one for the sklearn model and the other for the keras model.

What's great is that there's not much difference between the two scripts when using optuna. Unlike hyperopt, which has difficulty handling distributed tuning for keras models (we resorted to another package, hyperas, in Tuning keras Models with hyperopt), optuna's distributed tuning works exactly the same way whether it's a simple sklearn model or a complex keras neural network.
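To make that concrete, here is a minimal sketch of what a keras objective can look like (illustrative only - the layer sizes and learning-rate range are placeholders, not the exact script linked above). The trial-based sampling is identical to the sklearn case, and the same create_study / optimize code drives it, locally or distributed:

from tensorflow import keras
from sklearn import datasets
from sklearn.model_selection import train_test_split


def keras_objective(trial):
    iris = datasets.load_iris()
    x_train, x_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.2)

    # Sample hyper-parameters with the same trial API
    units = trial.suggest_int('units', 8, 64)
    lr = trial.suggest_loguniform('lr', 1e-4, 1e-1)

    # Construct and train a small feed-forward network
    model = keras.Sequential([
        keras.layers.Dense(units, activation='relu', input_shape=(4,)),
        keras.layers.Dense(3, activation='softmax'),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=20, verbose=0)

    # Return the test loss so the study can minimize it
    loss, _ = model.evaluate(x_test, y_test, verbose=0)
    return loss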