Hyperparameter tuning is one of the key concepts in machine learning. Grid search, random search, and gradient-based optimization are a few of the techniques you could use to tune hyperparameters automatically [1].
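To make the difference between grid search and random search concrete, here is a minimal sketch of how each one generates candidate settings. The function names, ranges, and the choice to sample the learning rate on a log scale are my own assumptions for illustration; they are not part of WSO2 ML.

```python
import itertools
import math
import random

def grid_candidates(learning_rates, iteration_counts):
    # Grid search: try every combination of the listed values.
    return list(itertools.product(learning_rates, iteration_counts))

def random_candidates(n, lr_range, iter_range, seed=0):
    # Random search: draw n settings at random from the given ranges.
    # The learning rate is sampled uniformly in log10 space (an assumption),
    # so that small and large rates are equally likely to be tried.
    rng = random.Random(seed)
    lo, hi = math.log10(lr_range[0]), math.log10(lr_range[1])
    return [(10 ** rng.uniform(lo, hi), rng.randint(iter_range[0], iter_range[1]))
            for _ in range(n)]

grid = grid_candidates([0.001, 0.01, 0.1], [1000, 10000])
sampled = random_candidates(5, (0.0001, 0.1), (100, 50000))
```

Each candidate would then be used to train a model and score it, which is the part this article does by hand.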
In this article, I am going to explain how you could tune hyperparameters manually by running a few tests. I am going to use WSO2 Machine Learner 1.0 for this purpose (refer to [2] to understand what WSO2 ML 1.0 is capable of doing). The dataset I used for this analysis is the well-known Pima Indians Diabetes dataset [3], and the algorithm I picked is logistic regression with mini-batch gradient descent. This algorithm has a few hyperparameters, namely:
 Iterations: the number of times the optimizer runs before completing the optimization process
 Learning rate: the step size of the optimization algorithm
 Regularization type: the type of regularization. WSO2 Machine Learner supports L2 and L1 regularization.
 Regularization parameter: controls the model complexity and hence helps to control model overfitting
 SGD Data Fraction: the fraction of the training dataset used in a single iteration of the optimization algorithm
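To show where each of these hyperparameters enters the algorithm, here is a minimal pure-Python sketch of logistic regression trained with mini-batch gradient descent. It is not the WSO2 ML implementation: the function and parameter names are hypothetical, and it assumes a constant step size, whereas a real optimizer may decay the step size over iterations.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, iterations=1000, learning_rate=0.1,
                   reg_type="L2", reg_param=0.001,
                   data_fraction=1.0, seed=0):
    """Illustrative mini-batch gradient descent for logistic regression."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    # SGD data fraction: how much of the training set one iteration sees.
    batch_size = max(1, int(round(data_fraction * n)))
    for _ in range(iterations):            # Iterations
        batch = rng.sample(range(n), batch_size)
        grad_w = [0.0] * d
        grad_b = 0.0
        for i in batch:
            z = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            err = sigmoid(z) - y[i]        # gradient of the log loss w.r.t. z
            for j in range(d):
                grad_w[j] += err * X[i][j]
            grad_b += err
        for j in range(d):
            g = grad_w[j] / batch_size
            if reg_type == "L2":           # Regularization type and parameter
                g += reg_param * w[j]
            elif reg_type == "L1":
                g += reg_param * ((w[j] > 0) - (w[j] < 0))
            w[j] -= learning_rate * g      # Learning rate: the step size
        b -= learning_rate * grad_b / batch_size
    return w, b
```

The bias term is left unregularized here, which is a common convention rather than something the source specifies.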
From the above set of hyperparameters, I wanted to find the optimal learning rate and number of iterations while keeping the other hyperparameters constant.
Goals
 Finding the learning rate and the number of iterations that maximize AUC (the area under the ROC curve [4])
 Finding the relationship between learning rate and AUC
 Finding the relationship between number of iterations and AUC
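Since AUC is the metric behind all three goals, here is a small sketch of how it can be computed from labels and predicted scores. It uses the pairwise-ranking definition (the probability that a randomly chosen positive example is scored above a randomly chosen negative one); WSO2 ML reports this number for you, so the function name and implementation below are purely illustrative.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # positive ranked above negative
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why values around 0.5 in the tests below indicate a model that has barely learned anything.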
Approach
First, the Pima Indians Diabetes dataset was uploaded to WSO2 ML 1.0. Then I needed a fair number of iterations to use while searching for the optimal learning rate. For that, I kept the learning rate fixed at 0.1, varied the number of iterations, and recorded the AUC for each iteration count.
LR = 0.1

Iterations   100     1000    5000    10000   20000   30000   50000
AUC          0.475   0.464   0.507   0.526   0.546   0.562   0.592


According to these results, it is quite evident that the AUC broadly increases with the number of iterations. Hence, I picked 10000 as a fair number of iterations for finding the optimal learning rate (of course, I could have picked any number above 5000, where the AUC started to climb over 0.5). Increasing the number of iterations excessively could lead to an overfitted model.
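The test just described follows a simple pattern: hold every hyperparameter fixed except one, and record the score for each value of that one. A minimal sketch of the pattern is below. The `evaluate` callback stands in for "train a model and return its AUC", and `toy_evaluate` is a made-up smooth function used only so the example runs; its numbers are not real results.

```python
def one_factor_sweep(evaluate, fixed_params, varied_name, values):
    """Vary one hyperparameter, keep the rest fixed, record the score per value."""
    results = []
    for value in values:
        params = dict(fixed_params)
        params[varied_name] = value
        results.append((value, evaluate(**params)))
    return results

def toy_evaluate(learning_rate, iterations):
    # Hypothetical stand-in for training plus AUC measurement.
    return 0.5 + 0.1 * (1.0 - 1.0 / (1.0 + iterations / 10000.0)) - abs(learning_rate - 0.01)

# Same shape as the first test: learning rate fixed, iterations varied.
iteration_sweep = one_factor_sweep(
    toy_evaluate, {"learning_rate": 0.1}, "iterations", [100, 1000, 5000, 10000])
```

Swapping the roles of the two parameters gives a sweep over the learning rate with the number of iterations held fixed.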
Having picked a ‘fair’ number of iterations, the next step is to find the optimal learning rate. For that, I kept the number of iterations fixed at 10000, varied the learning rate, and recorded the AUC for each learning rate.
Iterations = 10000

LR    0.0001   0.0005   0.001   0.005   0.01    0.1
AUC   0.529    0.558    0.562   0.59    0.599   0.526


According to the above observations, the AUC peaks at a learning rate of 0.01 among the tested values (to be precise, the true maximum lies somewhere between the neighbouring test points, 0.005 and 0.1). Hence, we can conclude that the AUC is maximized when the learning rate is close to 0.01, i.e. 0.01 is the best of the tested learning rates for this particular dataset and algorithm.
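Picking the best learning rate from the recorded values can be sketched as follows. The data is the learning-rate table from this article; the helper name is hypothetical. Besides the best tested value, it returns the neighbouring tested values, which bound the range where a finer search could look next.

```python
# (learning rate, AUC) pairs recorded at 10000 iterations.
lr_results = [(0.0001, 0.529), (0.0005, 0.558), (0.001, 0.562),
              (0.005, 0.59), (0.01, 0.599), (0.1, 0.526)]

def best_with_bracket(results):
    """Return the best tested value, its score, and its tested neighbours."""
    ordered = sorted(results)
    idx = max(range(len(ordered)), key=lambda i: ordered[i][1])
    lower = ordered[idx - 1][0] if idx > 0 else None
    upper = ordered[idx + 1][0] if idx + 1 < len(ordered) else None
    return ordered[idx][0], ordered[idx][1], (lower, upper)

best_lr, best_auc, bracket = best_with_bracket(lr_results)
```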
Now, we could change the learning rate to 0.01 and rerun the first test mentioned in the article.
LR = 0.01

Iterations   100     1000    5000    10000   20000   30000   50000   100000   150000
AUC          0.512   0.522   0.595   0.599   0.601   0.604   0.607   0.612    0.616


The results above show that the AUC increases only slightly as we increase the number of iterations. So how do we find the optimal number of iterations? Well, it depends on how much computing power you have and what level of AUC you expect. The AUC will probably not improve drastically even if you increase the number of iterations further.
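One way to make this trade-off explicit is to look at the recorded runs and ask two questions: what is the smallest iteration count that reaches a target AUC, and how much AUC does each extra 1000 iterations buy? The sketch below uses the LR = 0.01 table from this article; the helper names and the idea of a target AUC are my own assumptions.

```python
# (iterations, AUC) pairs recorded at learning rate 0.01.
runs = [(100, 0.512), (1000, 0.522), (5000, 0.595), (10000, 0.599),
        (20000, 0.601), (30000, 0.604), (50000, 0.607),
        (100000, 0.612), (150000, 0.616)]

def smallest_iterations_for(target_auc, runs):
    # Cheapest tested iteration count whose AUC meets the target, if any.
    for iterations, auc in sorted(runs):
        if auc >= target_auc:
            return iterations
    return None

def gain_per_1000_iterations(runs):
    # AUC gained per extra 1000 iterations between consecutive runs.
    ordered = sorted(runs)
    return [(b_it, (b_auc - a_auc) / ((b_it - a_it) / 1000.0))
            for (a_it, a_auc), (b_it, b_auc) in zip(ordered, ordered[1:])]

cheapest = smallest_iterations_for(0.6, runs)
gains = gain_per_1000_iterations(runs)
```

The gains shrink by orders of magnitude after 5000 iterations, which matches the observation that further iterations buy very little.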
How can I increase the AUC then? You could of course use another binary classification algorithm (for example, a Support Vector Machine), or you could do some feature engineering on the dataset to reduce the noise in the training data.
Summary
This article explains the process of tuning hyperparameters for a selected dataset and algorithm. The same approach could be used with different datasets and algorithms too.
References: