Cloud & MLOps ☁️
Amazon SageMaker

SageMaker is built to handle the entire machine learning workflow:

SageMaker Training & Deployment

Architecturally, the idea is this: let's start at the bottom. When we're doing our training, the training data we've already prepared will be sitting in an S3 bucket somewhere, and SageMaker's job is to go out there and provision a fleet of training hosts to actually do that training.

Now, the code it uses, the actual model itself, comes from a Docker image registered in the Elastic Container Registry (ECR). So SageMaker takes that training code from a Docker image, deploys it out to a fleet of training hosts, and pulls the training data from S3. When it's done, it saves the trained model and any artifacts back to S3. At this point we're ready to deploy that model and put it out there in production, so we'll also have another Docker image in ECR: the inference code. That code is potentially a lot simpler; its only job is to take incoming requests and use the saved model to make inferences based on those requests. SageMaker pulls the inference code from ECR, spins up as many hosts as it needs to serve the incoming requests, and spins up endpoints that we can use to communicate with the outside world.
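As a concrete illustration, here is a minimal sketch of that training flow using the SageMaker Python SDK; the ECR image URI, S3 paths, IAM role, and instance types below are placeholders, not values from this course.

```python
import sagemaker
from sagemaker.estimator import Estimator

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder execution role

# Training code comes from a Docker image in ECR; data and outputs live in S3.
estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",  # placeholder
    role=role,
    instance_count=2,                                # fleet of training hosts
    instance_type="ml.p3.2xlarge",                   # GPU training instances, for example
    output_path="s3://my-bucket/model-artifacts/",   # trained model + artifacts saved here
    sagemaker_session=sagemaker.Session(),
)

# Kick off training against data already prepared in S3.
estimator.fit({"train": "s3://my-bucket/training-data/"})
```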

So now we might have some client application that sends requests to our model, and that endpoint will very quickly make predictions and send them back. For example, maybe you have a client that's taking pictures and you want to know what's in each picture. The client says, "Hey, endpoint, here's a picture, tell me what's in it." The endpoint refers to the inference code and the trained model artifacts, decides, "Okay, I think it's a picture of a cat," and sends that back to the client application. That's just one of many examples.
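For instance, a client application could call a deployed endpoint through the SageMaker runtime API. A rough sketch, where the endpoint name and payload format are placeholders that depend on how the model was deployed:

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

# Send an image to the deployed endpoint and read back the prediction.
with open("cat.jpg", "rb") as f:
    response = runtime.invoke_endpoint(
        EndpointName="my-image-classifier",   # placeholder endpoint name
        ContentType="application/x-image",    # whatever your inference code expects
        Body=f.read(),
    )

print(response["Body"].read().decode("utf-8"))  # e.g. predicted labels / probabilities
```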

Whether it's your own code, a built-in algorithm from SageMaker, or a model you've purchased in the marketplace, all training code deployed to SageMaker training instances comes from ECR.
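Even the built-in algorithms resolve to ECR images. As a small illustration (the region and version here are just example values), the SageMaker Python SDK can look up the image URI of a built-in algorithm for you:

```python
from sagemaker import image_uris

# Look up the ECR image URI for the built-in XGBoost algorithm.
xgboost_image = image_uris.retrieve(
    framework="xgboost",
    region="us-east-1",   # example region
    version="1.5-1",      # example algorithm version
)
print(xgboost_image)      # an ECR path pointing at the sagemaker-xgboost training image
```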

SageMaker Notebooks can direct the process

There are a couple of ways to work with SageMaker; probably the most common is through a SageMaker notebook. It's just a Jupyter notebook instance running on an EC2 instance that you specify, you spin these up from the console, and it's very easy to use, as we'll see. Your SageMaker notebook has access to S3, so it can access its training and validation data there, or whatever else you need. You can use scikit-learn, PySpark, or TensorFlow within it if you want to, and it has access to a wide variety of built-in models.

You can also spin up those training instances from within your notebook. So, right from your notebook, you can say: go spin up a whole fleet of dedicated, specialized machine learning hosts to execute that training. And when your training is done and saved to S3, you can also say from the notebook: okay, deploy that model to a whole fleet of endpoints and allow me to make predictions at large scale.

And you can even say, from the notebook: go ahead and run an automated hyperparameter tuning job that tries different parameters on my model, and find the ideal set of parameters to make that model work as well as possible.
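A rough sketch of what kicking off such a tuning job looks like with the SageMaker Python SDK, assuming the `estimator` object from the earlier sketch and using made-up hyperparameter names, ranges, and metric:

```python
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

# Search over a couple of example hyperparameters; the names depend on your algorithm.
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:accuracy",   # a metric your training code emits
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.01, 0.3),
        "max_depth": IntegerParameter(3, 10),
    },
    # For a custom image you would also pass metric_definitions with a regex
    # so SageMaker can scrape the metric out of your training logs.
    max_jobs=20,           # total training jobs to try
    max_parallel_jobs=2,   # how many run at once
)

tuner.fit({
    "train": "s3://my-bucket/training-data/",
    "validation": "s3://my-bucket/validation-data/",
})
```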

All of this can be done from within a notebook. You can also do a lot of it from the SageMaker console. The notebook obviously gives you more flexibility, because you can actually write code there, but sometimes you'll use them together. A pretty common pattern is to kick off a training job or a hyperparameter tuning job from within your notebook, then switch back to the console and keep an eye on it to see how well it's doing.
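You can also poll the same status the console shows programmatically. A small sketch using boto3, with a placeholder training job name:

```python
import boto3

sm = boto3.client("sagemaker")

# Check on a training job that was kicked off from the notebook.
job = sm.describe_training_job(TrainingJobName="my-training-job-001")  # placeholder name
print(job["TrainingJobStatus"])                    # InProgress | Completed | Failed | ...
print(job.get("SecondaryStatusTransitions", []))   # more detailed progress history
```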

Let's talk about the data preparation stage and how it interacts with SageMaker. Again, SageMaker expects your data to come from S3, so we assume you've already prepared it there by some other means if you needed to. The format it expects will vary with the algorithm, that is, with the actual training code you're deploying from ECR.

For the built-in algorithms, that's often the RecordIO/protobuf format, which is a data format that's very well suited as input to deep learning and other machine learning models. Usually these algorithms will also accept straight-up CSV data or whatever you might have, but RecordIO/protobuf will usually be a lot more efficient if you can get your data into that format, and you can do that pre-processing within your SageMaker notebook if you want to; that's fine.
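As an illustration, the SageMaker Python SDK ships a helper for writing NumPy arrays out as RecordIO-protobuf. A minimal sketch, with a toy dataset and placeholder bucket and key:

```python
import io

import boto3
import numpy as np
from sagemaker.amazon.common import write_numpy_to_dense_tensor

# Toy feature matrix and labels standing in for your real, prepared training data.
features = np.random.rand(1000, 10).astype("float32")
labels = np.random.randint(0, 2, size=1000).astype("float32")

# Serialize to RecordIO-protobuf in memory, then upload to S3 for training.
buf = io.BytesIO()
write_numpy_to_dense_tensor(buf, features, labels)
buf.seek(0)

boto3.resource("s3").Bucket("my-bucket").Object("train/data.protobuf").upload_fileobj(buf)
```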

You can also integrate Spark with SageMaker, which is pretty cool. If you want to use Apache Spark to pre-process your data at massive scale, you can actually use SageMaker from within Spark, and we'll see an example of that later on in the course.
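As a preview only (the sagemaker_pyspark usage below, including the estimator parameters and role ARN, is a hedged sketch rather than the course's example), training a built-in algorithm directly from a Spark DataFrame can look roughly like this:

```python
from pyspark.sql import SparkSession
from sagemaker_pyspark import IAMRole, classpath_jars
from sagemaker_pyspark.algorithms import KMeansSageMakerEstimator

# Spark session with the SageMaker-Spark jars on the classpath.
spark = (
    SparkSession.builder
    .config("spark.driver.extraClassPath", ":".join(classpath_jars()))
    .getOrCreate()
)

# A DataFrame with a Vector "features" column, produced by your Spark preprocessing.
df = spark.read.format("libsvm").load("s3://my-bucket/preprocessed/")  # placeholder path

estimator = KMeansSageMakerEstimator(
    sagemakerRole=IAMRole("arn:aws:iam::123456789012:role/SageMakerExecutionRole"),  # placeholder
    trainingInstanceType="ml.m5.xlarge",
    trainingInstanceCount=1,
    endpointInstanceType="ml.m5.xlarge",
    endpointInitialInstanceCount=1,
)
estimator.setK(10)
estimator.setFeatureDim(784)

model = estimator.fit(df)           # training runs on SageMaker hosts, not the Spark cluster
predictions = model.transform(df)   # inference goes through the deployed endpoint
```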

You also have the usual tools at your disposal within the Jupyter notebooks: scikit-learn, NumPy, pandas, etc. If you want to use those to slice and dice and manipulate your data before you feed it into your training job, that's totally fine.
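For example, a minimal sketch of that kind of notebook preprocessing (the column names, bucket, and split ratio are placeholders) might look like this:

```python
import boto3
import pandas as pd
from sklearn.model_selection import train_test_split

# Load raw data and do some simple cleanup with pandas (reading s3:// paths needs s3fs).
df = pd.read_csv("s3://my-bucket/raw/data.csv")
df = df.dropna(subset=["label"])

train, validation = train_test_split(df, test_size=0.2, random_state=42)

# Many built-in algorithms expect CSV with the label in the first column and no header.
cols = ["label"] + [c for c in df.columns if c != "label"]
train[cols].to_csv("train.csv", index=False, header=False)
validation[cols].to_csv("validation.csv", index=False, header=False)

s3 = boto3.client("s3")
s3.upload_file("train.csv", "my-bucket", "prepared/train.csv")
s3.upload_file("validation.csv", "my-bucket", "prepared/validation.csv")
```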

  • Once you're ready to train, you'll just create a training job, either from the console or from your notebook. All it needs is (see the sketch after this list):
    • URL of the S3 bucket with training data
    • ML compute resources
      • these could be GPU nodes like P2s or P3s
    • URL of the S3 bucket for output
    • ECR path to the training code
  • Training options
    • Built-in training algorithms
    • Spark MLlib
    • Custom Python TensorFlow / MXNet code
    • Your own Docker image
    • Algorithm purchased from the AWS Marketplace
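Here's a rough sketch of those four ingredients as a low-level boto3 create_training_job call; every name, ARN, and path is a placeholder, and the higher-level Estimator shown earlier wraps this same API:

```python
import boto3

sm = boto3.client("sagemaker")

sm.create_training_job(
    TrainingJobName="my-training-job-001",   # placeholder
    RoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    # ECR path to the training code
    AlgorithmSpecification={
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
        "TrainingInputMode": "File",
    },
    # URL of the S3 bucket with training data
    InputDataConfig=[{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/training-data/",
            "S3DataDistributionType": "FullyReplicated",
        }},
    }],
    # URL of the S3 bucket for output
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/model-artifacts/"},
    # ML compute resources (GPU nodes like P2s or P3s, for example)
    ResourceConfig={
        "InstanceType": "ml.p3.2xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)
```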

There are also algorithms you can purchase from the AWS Marketplace, where you buy access to a Docker image that contains a SageMaker training algorithm. Once your model is trained, you need to deploy it, and again, this can be done from a notebook. You'll save the trained model to S3 somewhere, and at that point there are two things you can do with that trained model.
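The first option, a persistent endpoint, is roughly a one-liner from the notebook if you still have the SDK estimator object around. A sketch with example instance settings and a placeholder endpoint name:

```python
# Deploy the trained model behind a real-time HTTPS endpoint.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",            # example inference instance type
    endpoint_name="my-image-classifier",    # placeholder; must be unique in the region
)

# Make an on-demand prediction; the payload format depends on your inference code.
with open("cat.jpg", "rb") as f:
    result = predictor.predict(f.read())

# Tear the endpoint down when you no longer need it, so you stop paying for it.
predictor.delete_endpoint()
```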

Batch Transform

Use batch transform when you need to do the following (a sketch follows the list):

  1. Preprocess datasets to remove noise or bias that interferes with training or inference from your dataset.
  2. Get inferences from large datasets.
  3. Run inference when you don't need a persistent endpoint.
  4. Associate input records with inferences to assist the interpretation of results.
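A minimal sketch of running one with the SDK, assuming the estimator object from earlier, CSV-formatted input, and placeholder S3 paths:

```python
# Create a batch transform job instead of a persistent endpoint.
transformer = estimator.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",                     # example instance type
    output_path="s3://my-bucket/batch-predictions/",  # where results are written
)

transformer.transform(
    data="s3://my-bucket/prepared/inference-input/",  # the whole dataset to score
    content_type="text/csv",
    split_type="Line",    # send one CSV line per request to the model
)
transformer.wait()        # instances are torn down automatically when the job finishes
```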

SageMaker supports two types of predictions:

  1. Real-time (synchronous) predictions, which require a persistent endpoint.
  2. Batch predictions, which do not need a persistent endpoint.

Deploying Trained Models

  • Save your trained model to S3
  • Can deploy two ways:
    • Persistent endpoint for making individual predictions on demand
      • spin up a fleet of persistent endpoints to make individual inferences and predictions on demand from some sort of external application
    • SageMaker Batch Transform to get predictions for an entire dataset
      • use Batch Transform if you just have an existing set of observations that you want to make predictions for en masse
  • Lots of cool options
  • Inference Pipelines for more complex processing
    • If you need to chain different steps together as you're doing your inferences
  • SageMaker Neo for deploying to edge devices
  • Elastic Inference for accelerating deep learning models
      • accelerate how quickly the deployed model returns predictions by attaching dedicated accelerator types that are made just for that
  • Automatic scaling (increase the number of instances behind your endpoint as needed)
    • can automatically scale the number of instances serving your endpoint up and down as needed (see the sketch after this list)
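As an illustration of that last point, endpoint auto scaling is configured through Application Auto Scaling. A sketch, assuming a placeholder endpoint name and the default "AllTraffic" variant:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# The scalable resource is the instance count behind one endpoint variant.
resource_id = "endpoint/my-image-classifier/variant/AllTraffic"  # placeholder endpoint name

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Target tracking: add instances when invocations per instance climb too high.
autoscaling.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 1000.0,   # example target invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)
```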