Since my previous posts, Tuning sklearn Models with hyperopt and Tuning keras Models with hyperopt, I had been using the hyperopt library, until I recently read a well-written post and noticed that a similar package, optuna, has been gaining a lot of momentum. After first trying it out, I quickly came to love it for its user-friendliness, its intuitiveness, and, importantly, the minimal changes required to parallelize computation. So today I decided to make a sibling post showing you how to tune ML models via optuna.
Here I chose the same setting as in Tuning sklearn Models with hyperopt: build a sklearn classifier for the iris data. Source code can be found here. The search space for the hyper-parameters is listed again as follows:
n_neighbors: range from 3 to 11
algorithm: one of ball_tree, kd_tree
leaf_size: range from 1 to 50
metric: one of euclidean, manhattan, chebyshev, minkowski
As with hyperopt, an objective function for optuna defines the whole evaluation procedure for a set of hyper-parameters, which includes loading the data, sampling the hyper-parameters, constructing the model, and training and evaluating it:
```python
from sklearn.svm import SVC
from sklearn import datasets
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split


def objective(trial):
    # Load data
    iris = datasets.load_iris()
    x = iris.data
    y = iris.target
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

    # Sample the classifier type
    classifier_name = trial.suggest_categorical("classifier", ["KNeighborsClassifier", "SVC"])
    if classifier_name == "KNeighborsClassifier":
        # Sample hyper-parameters
        n_neighbors = trial.suggest_int('n_neighbors', 3, 11)
        algorithm = trial.suggest_categorical("algorithm", ["ball_tree", "kd_tree"])
        leaf_size = trial.suggest_int('leaf_size', 1, 50)
        metric = trial.suggest_categorical('metric', ["euclidean", "manhattan",
                                                      "chebyshev", "minkowski"])
        # Construct the model
        clf = KNeighborsClassifier(n_neighbors=n_neighbors,
                                   algorithm=algorithm,
                                   leaf_size=leaf_size,
                                   metric=metric)
    elif classifier_name == "SVC":
        # Sample hyper-parameters
        C = trial.suggest_loguniform('C', 1e-10, 1)
        kernel = trial.suggest_categorical('kernel', ['rbf', 'poly', 'sigmoid'])
        degree = trial.suggest_int('degree', 1, 50)
        gamma = trial.suggest_loguniform('gamma', 0.001, 10000)
        # Construct the model
        clf = SVC(C=C, kernel=kernel, degree=degree, gamma=gamma)

    # Train the model
    clf.fit(x_train, y_train)

    # Evaluate the model
    y_pred_test = clf.predict(x_test)
    loss = mean_squared_error(y_test, y_pred_test)
    print("Test Score:", clf.score(x_test, y_test))
    print("Train Score:", clf.score(x_train, y_train))
    print("\n=================")
    return loss
```
The objective function takes one argument: a trial object from optuna, which has the key power of sampling hyper-parameters. In contrast, if you remember, hyperopt requires you to define the search space outside the objective function and pass it into fmin, which takes care of the sampling. This may not seem like a big deal at this point, but this atomic function design under the optuna framework really makes distributed optimization less prone to bugs.
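To make the contrast concrete, here is a rough sketch of how the KNN part of the search space would be expressed in hyperopt. This is a simplified version for illustration, not the exact code from the earlier post:

```python
from hyperopt import fmin, tpe, hp
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# In hyperopt, the search space lives outside the objective function
space = {
    'n_neighbors': hp.choice('n_neighbors', list(range(3, 12))),
    'leaf_size': hp.choice('leaf_size', list(range(1, 51))),
    'metric': hp.choice('metric', ['euclidean', 'manhattan', 'chebyshev', 'minkowski']),
}

def objective(params):
    # fmin samples from `space` and hands concrete values to the objective;
    # the function itself never touches the sampler
    iris = datasets.load_iris()
    clf = KNeighborsClassifier(**params)
    return 1 - cross_val_score(clf, iris.data, iris.target, cv=3).mean()

# The space is passed into fmin, which takes care of the sampling
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50)
```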
With the optuna objective defined as above, we can easily start tuning the hyper-parameters as follows:
```python
import optuna

# Depending on the definition of objective, we can create
# the study object with either minimize or maximize
study = optuna.create_study(direction='minimize')

# Start tuning the hyper-parameters
study.optimize(objective, n_trials=100)
```
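Once the optimization finishes, the study object exposes the results directly, for example:

```python
# Inspect the results after optimization
print(study.best_params)  # the best hyper-parameter combination found
print(study.best_value)   # the corresponding objective value (the lowest loss here)
print(len(study.trials))  # total number of trials run
```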
For a single local machine, this is what you need. If you have multiple machines designated to run the study together, the code only needs minimal changes (see the next section for details).
To use a cluster of machines, we need a database to serve as a central place for storing trial statuses and results. Unlike hyperopt, which uses MongoDB, optuna uses relational databases such as SQLite, MySQL, and PostgreSQL. In this example, I set up a local MySQL database ml_expts with the connection string mysql+pymysql://root:root@localhost:8888/ml_expts. The only difference from local tuning is to feed the database connection string into create_study as the storage argument.
```python
import optuna

# Depending on the definition of objective, we can create
# the study object with either minimize or maximize
study = optuna.create_study(direction='minimize',
                            study_name='distributed-tuning',
                            storage='mysql+pymysql://root:root@localhost:8888/ml_expts',
                            load_if_exists=True)

# Start tuning the hyper-parameters
study.optimize(objective, n_trials=100)
```
You might wonder: this code is just for the master machine, so what about the worker side? The good news is that optuna doesn't require you to write a separate script for workers: the same script is used on the worker machines. This is enabled by the load_if_exists argument. If a worker running the script finds that the study has already been created, it picks up the current state of the study and continues to run and contribute to it.
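Because every trial is persisted in the database, you can also reconnect to the study afterwards from any machine that can reach it, for example:

```python
import optuna

# Reconnect to the shared study stored in MySQL
study = optuna.load_study(study_name='distributed-tuning',
                          storage='mysql+pymysql://root:root@localhost:8888/ml_expts')
print(study.best_params)  # best combination across all workers
print(len(study.trials))  # trials contributed by all workers so far
```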
If you'd like to try out this package, feel free to refer to the two scripts I created while making this post: one for a sklearn model and the other for a keras model.
What you'll find great is that there's not much difference between the two scripts using optuna. Unlike hyperopt, which has difficulty handling distributed tuning for keras models (we resorted to another package, hyperas, in Tuning keras Models with hyperopt), optuna's distributed tuning works exactly the same way regardless of whether it's a simple sklearn model or a complex keras neural network model.
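To give a flavor of the keras side, here is a minimal sketch of what such an objective could look like. The architecture and search ranges are my own illustrative choices, not the exact script linked above:

```python
import optuna
from tensorflow import keras
from sklearn import datasets
from sklearn.model_selection import train_test_split

def objective(trial):
    # Load data
    iris = datasets.load_iris()
    x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target,
                                                        test_size=0.2)

    # Sample hyper-parameters (illustrative ranges)
    units = trial.suggest_int('units', 8, 64)
    lr = trial.suggest_loguniform('lr', 1e-4, 1e-1)

    # Construct and train a small network
    model = keras.Sequential([
        keras.layers.Dense(units, activation='relu', input_shape=(4,)),
        keras.layers.Dense(3, activation='softmax'),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=20, verbose=0)

    # Evaluate the model; return a loss to minimize
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    return 1.0 - acc
```

Exactly the same create_study and optimize calls from before, with or without the MySQL storage, drive this objective unchanged.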