Automatic Model Tuning
Hyperparameter tuning
- A very exciting capability of SageMaker
- How do you know the best values of learning rate, batch size, depth, etc.?
- Often you have to experiment
- The problem blows up quickly when you have many different hyperparameters; you would have to try every combination of every possible value somehow, train a model, and evaluate it every time (see the illustration below)
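As a rough illustration of how quickly a naive grid search blows up, here is a minimal Python sketch; the hyperparameters and value lists are made up for the example.

```python
# Illustrative only: a naive grid search over four made-up hyperparameter lists.
learning_rates = [0.001, 0.003, 0.01, 0.03, 0.1]   # 5 values
batch_sizes    = [32, 64, 128, 256]                 # 4 values
depths         = [4, 6, 8, 10, 12, 14]              # 6 values
optimizers     = ["sgd", "adam", "rmsprop"]         # 3 values

total_runs = len(learning_rates) * len(batch_sizes) * len(depths) * len(optimizers)
print(total_runs)  # 360 full training-and-evaluation runs for just four hyperparameters
```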
Automatic Model Tuning
- Define the hyperparameters you care about, the ranges you want to try, and the metric you are optimizing for
- SageMaker spins up a "Hyperparameter Tuning Job" that trains as many combinations as you'll allow
- Training instances are spun up as needed, potentially a lot of them
- It can use SageMaker's parallelism, spinning up entirely separate training instances, to do this for you as quickly as possible
- The set of hyperparameters producing the best results can then be deployed as a model
- It learns as it goes, so it doesn't have to try every possible combination
- It knows which direction is having a better effect
- That can save a lot of time and money when doing hyperparameter tuning (see the sketch below)
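A minimal sketch of kicking off a tuning job with the SageMaker Python SDK (v2). The role ARN, S3 paths, XGBoost settings, ranges, and metric name below are placeholder assumptions, not values from the course.

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # hypothetical execution role

# Any built-in or custom estimator works; built-in XGBoost is just an example here.
xgb = Estimator(
    image_uri=sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1"),
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/output/",   # placeholder bucket
    sagemaker_session=session,
)
xgb.set_hyperparameters(objective="binary:logistic", num_round=100)

# The hyperparameters you care about, and the ranges you want to try.
ranges = {
    "eta": ContinuousParameter(0.01, 0.3),
    "max_depth": IntegerParameter(3, 10),
    "min_child_weight": IntegerParameter(1, 10),
}

tuner = HyperparameterTuner(
    estimator=xgb,
    objective_metric_name="validation:auc",  # the metric being optimized
    objective_type="Maximize",
    hyperparameter_ranges=ranges,
    max_jobs=20,            # total training jobs the tuning job may run
    max_parallel_jobs=2,    # how many run at the same time
)

# Kicks off the Hyperparameter Tuning Job; SageMaker spins up instances as needed.
tuner.fit({"train": "s3://my-bucket/train/", "validation": "s3://my-bucket/validation/"})
```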
Best Practices
- Don't optimize too many hyperparameters at once
- The search space explodes very quickly
- Focus on the hyperparameters you think will have the largest impact on the accuracy of the model
- Limit your ranges to as small a range as possible
- If you have some guidance as to what values might work, don't explore wild values outside that range; it just creates work that doesn't need to be done
- Use logarithmic scales when appropriate to explore your parameter space
- If you have a hyperparameter whose values tend to range from something like 0.001 to 0.01, you probably want to try a logarithmic scale for it instead of a linear one (see the sketch below)
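A tiny sketch of what that looks like in the SageMaker Python SDK; the bounds are just the example values from the note above.

```python
from sagemaker.tuner import ContinuousParameter

# A logarithmic scale samples 0.001-0.01 evenly by order of magnitude,
# instead of clustering most samples near the top of the range.
learning_rate_range = ContinuousParameter(0.001, 0.01, scaling_type="Logarithmic")
```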
- Don't run too many training jobs concurrently
- Too much parallelism limits how well the process can learn as it goes, because that learning relies on seeing results sequentially over time
- SageMaker Automatic Model Tuning learns as it goes (see the sketch below)
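One way to apply this, assuming the estimator and ranges from the earlier sketch: keep max_parallel_jobs small relative to max_jobs so the default Bayesian strategy has finished results to learn from.

```python
from sagemaker.tuner import HyperparameterTuner

tuner = HyperparameterTuner(
    estimator=xgb,                  # estimator from the earlier sketch (assumed)
    objective_metric_name="validation:auc",
    hyperparameter_ranges=ranges,   # ranges from the earlier sketch (assumed)
    strategy="Bayesian",            # the default; picks new combinations based on past results
    max_jobs=20,
    max_parallel_jobs=2,            # low concurrency preserves the sequential learning
)
```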
- Make sure training jobs running on multiple instances report the correct objective metric at the end
- If you're writing your own training code, this can be a little tricky; make sure it plays nicely with hyperparameter tuning by reporting the objective metric you're optimizing on when all of those instances come back together at the end (see the sketch below)
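A hedged sketch of how a custom training script can surface that metric: print it in a stable format and give the tuner a regex to scrape it from the logs. The metric name, regex, and estimator here are assumptions for illustration.

```python
from sagemaker.tuner import HyperparameterTuner

# In your own training script (e.g. train.py), after the final evaluation
# across all instances, print the objective in a predictable format:
#     print(f"validation-accuracy: {val_accuracy}")

tuner = HyperparameterTuner(
    estimator=my_estimator,            # your framework estimator (assumed defined)
    objective_metric_name="validation-accuracy",
    metric_definitions=[
        # Regex the tuner uses to pull the metric out of the training logs
        {"Name": "validation-accuracy", "Regex": "validation-accuracy: ([0-9\\.]+)"}
    ],
    hyperparameter_ranges=ranges,      # ranges as in the earlier sketches (assumed)
    objective_type="Maximize",
)
```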