Modern SageMaker

SageMaker Studio

  • Visual IDE for machine learning

  • Integrates many of the features we're about to cover.

SageMaker Notebooks

  • Create and share Jupyter notebooks with SageMaker Studio

    • Collaborate with other people
  • Switch between hardware configurations (no infrastructure to manage)

    • AWS managed hardware

SageMaker Experiments

Organize, capture, compare, and search your historical ML jobs in one place. A one-stop place to visualize and interpret the models you create.

SageMaker Debugger

  • Saves internal model state at periodic intervals

    • Can go back and see the trends as training progresses

    • Gradients / tensors over time as a model is trained

    • Define rules for detecting unwanted conditions while training

      • Want to watch something going out of bounds? Can set a rule for that
    • A debug job is run for each rule you configure

    • Logs & fires a CloudWatch event when the rule is hit

      • Can handle this however we want, e.g., fire a notification through SNS to your phone
  • SageMaker Studio Debugger dashboards

    • Integrated with SageMaker Studio
  • Auto-generated training reports

  • Built-in rules:

    • Monitor system bottlenecks

    • Profile model framework operations

    • Debug model parameters

Supported Frameworks & Algorithms:

  • TensorFlow

  • PyTorch

  • MXNet

  • XGBoost

  • SageMaker generic estimator (for use with custom training containers)

Debugger APIs available on GitHub:

  • Construct hooks & rules for the CreateTrainingJob and DescribeTrainingJob APIs to do what you want

  • SMDebug client library lets you register hooks for accessing training data

    • SMDebug is the client library for integrating SageMaker Debugger with your own training code.
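
To make the hook-and-rule idea concrete, here is a minimal pure-Python sketch. It is not the SMDebug API: `RecordingHook`, `loss_not_decreasing`, and the simulated losses are hypothetical names and data invented for illustration. The hook saves state every few steps, and the rule scans what was saved for an unwanted condition, which is roughly the division of labor described above.

```python
class RecordingHook:
    """Hypothetical stand-in for a hook: saves values every `save_interval` steps."""

    def __init__(self, save_interval):
        self.save_interval = save_interval
        self.saved = {}  # step -> {name: value}

    def maybe_save(self, step, values):
        # Only keep state at the configured interval, not at every step.
        if step % self.save_interval == 0:
            self.saved[step] = dict(values)


def loss_not_decreasing(saved, patience=2):
    """Hypothetical rule: return the first saved step at which the loss has
    failed to improve for `patience` consecutive saves, or None."""
    best = float("inf")
    stale = 0
    for step in sorted(saved):
        loss = saved[step]["loss"]
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return step
    return None


# Simulated training run (made-up losses): improves, then plateaus and worsens.
simulated_losses = [1.0, 0.9, 0.8, 0.7, 0.6, 0.6, 0.61, 0.62, 0.63, 0.64]

hook = RecordingHook(save_interval=2)
for step, loss in enumerate(simulated_losses):
    hook.maybe_save(step, {"loss": loss})

triggered_step = loss_not_decreasing(hook.saved)
if triggered_step is not None:
    # In the real service this is where a CloudWatch event would be logged.
    print(f"rule triggered at step {triggered_step}")
```

In the managed service, the rule evaluation runs as its own job alongside training rather than inline like this.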

Even Newer Features in SageMaker Debugger

 An example of a SageMaker Studio Debugger dashboard

  • SageMaker Debugger Insights Dashboard

    • Lets you see everything graphically, including what is going on over time during the training process.
  • Debugger ProfilerRule

    • ProfilerReport

    • Hardware system metrics (CPUBottleneck, GPUMemoryIncrease, etc)

    • Framework Metrics (MaxInitializationTime, OverallFrameworkMetrics, StepOutlier)

      • Hyperparameters for the framework used during training
  • Built-in actions to receive notifications or stop training in response to a debugger event

    • StopTraining(), Email(), or SMS()

    • In response to Debugger Rules

    • Sends notifications via SNS automatically

  • Profiling system resource usage and training over time

SageMaker Autopilot

Amazon SageMaker Autopilot eliminates the heavy lifting of building ML models. You simply provide a tabular dataset and select the target column to predict, and SageMaker Autopilot will automatically explore different solutions to find the best model. You then can directly deploy the model to production with just one click or iterate on the recommended solutions to further improve the model quality.

How Amazon SageMaker Autopilot works

  • Automates:

    • Algorithm selection

      • A wizard walks you through it: you point it at your training data and name the column you want to predict, and Autopilot works out which model to use, tunes it, and shows you the results.
    • Data preprocessing

    • Model tuning

    • All infrastructure

  • It does the trial and error for you under the hood: all of the parameter tuning and all of the experimenting with different model types, ending with the best model it found.

  • More broadly this is called AutoML

    • SageMaker Autopilot is AWS's implementation of AutoML
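
The trial-and-error loop can be sketched in a few lines of plain Python. This is a toy illustration of the AutoML idea, not how Autopilot is implemented: the three candidate "algorithms", the data, and every function name are made up for the example. Each candidate is fitted on training data, scored on held-out data, and ranked by error.

```python
from statistics import mean


def fit_constant(xs, ys):
    # Baseline candidate: always predict the training mean.
    avg = mean(ys)
    return lambda x: avg


def fit_linear(xs, ys):
    # One-feature least-squares line.
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def fit_nearest_neighbor(xs, ys):
    # Predict the target of the closest training point.
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mean_squared_error(model, xs, ys):
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def build_leaderboard(candidates, train, validation):
    """Fit every candidate, score it on validation data, rank by error."""
    results = []
    for name, fit in candidates.items():
        model = fit(*train)
        results.append((mean_squared_error(model, *validation), name))
    return sorted(results)  # lowest error first


candidates = {
    "constant": fit_constant,
    "linear": fit_linear,
    "nearest_neighbor": fit_nearest_neighbor,
}
train = ([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])  # made-up data following y = 2x + 1
validation = ([1.5, 2.5, 6], [4, 6, 13])

board = build_leaderboard(candidates, train, validation)
for error, name in board:
    print(f"{name}: validation MSE = {error:.2f}")
```

A real AutoML system also searches over preprocessing steps and hyperparameters for each algorithm, but the fit, score, and rank loop is the same shape.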

SageMaker Autopilot Workflow

  1. Load data from S3 for training

  2. Select your target column for prediction

  3. Automatic model creation

  4. Offers you a Model notebook

    • available for visibility & control and tweaking
  5. Exposes a Model leaderboard

    • Ranked list of recommended models

    • You can pick one

  6. Deploy & monitor the model, refine via notebook if needed

  • Can add in human guidance if you want

  • With or without code in SageMaker Studio or AWS SDKs

    • Does expose the notebook to you if you want to refine it
  • Problem types:

    • Binary classification

    • Multiclass classification

    • Regression

  • Algorithm Types:

    • Linear Learner

    • XGBoost

    • Deep Learning (MLPs)

  • Data must be tabular CSV
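
Steps 1 and 2 of the workflow (data in S3, target column) map onto the request for the `CreateAutoMLJob` API. The sketch below only builds the request dictionary so it can be inspected without an AWS account. The field names follow that API as I understand it and should be checked against the current reference; the helper `build_autopilot_request`, the bucket, the column name, and the role ARN are placeholders invented for illustration.

```python
# The three problem types listed in these notes, spelled the way the API expects.
SUPPORTED_PROBLEM_TYPES = {
    "BinaryClassification",
    "MulticlassClassification",
    "Regression",
}


def build_autopilot_request(
    job_name,
    train_s3_uri,
    target_column,
    output_s3_uri,
    role_arn,
    problem_type="BinaryClassification",
    objective_metric="F1",
):
    """Hypothetical helper: assemble keyword arguments for create_auto_ml_job."""
    if problem_type not in SUPPORTED_PROBLEM_TYPES:
        raise ValueError(f"unsupported problem type: {problem_type}")
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                # Step 1: where the tabular CSV training data lives.
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                    }
                },
                # Step 2: the column Autopilot should learn to predict.
                "TargetAttributeName": target_column,
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ProblemType": problem_type,
        "AutoMLJobObjective": {"MetricName": objective_metric},
        "RoleArn": role_arn,
    }


# All values below are placeholders, not real resources.
request = build_autopilot_request(
    job_name="churn-autopilot-demo",
    train_s3_uri="s3://example-bucket/churn/train/",
    target_column="churned",
    output_s3_uri="s3://example-bucket/churn/output/",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)

# With credentials configured, the request would be submitted roughly as:
#   import boto3
#   boto3.client("sagemaker").create_auto_ml_job(**request)
```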

Autopilot Explainability & Feature Importance

Amazon SageMaker Autopilot provides an explainability report, generated by Amazon SageMaker Clarify, that makes it easier for you to understand and explain how models created with SageMaker Autopilot make predictions. You can also identify how each attribute in your training data contributes to the predicted result as a percentage. The higher the percentage, the more strongly that feature impacts your model's predictions.

  • Integrates with SageMaker Clarify

  • Transparency on how models arrive at predictions

    • And on biases in those features
  • Feature attribution

    • Uses SHAP Baselines / Shapley Values

    • Research from cooperative game theory

    • Assigns each feature an importance value for a given prediction

A useful feature for bias analysis.
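
To show what "assigns each feature an importance value for a given prediction" means, here is an exact Shapley computation for a tiny model. It is a sketch of the game-theory definition, not of Clarify's implementation (which has to approximate, since exact computation is exponential in the number of features). The `loan_score` model, the feature values, and the baseline are made up for illustration.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley attributions for one prediction.

    For every ordering of the features, switch features from their baseline
    value to the instance value one at a time and credit each feature with
    the change in model output it causes. Average over all orderings.
    """
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = predict(current)
        for i in order:
            current[i] = instance[i]
            output = predict(current)
            totals[i] += output - previous_output
            previous_output = output
    return [total / len(orderings) for total in totals]


def loan_score(features):
    # Hypothetical model with an interaction between income and tenure.
    income, debt, years_employed = features
    return 0.5 * income - 0.8 * debt + 0.1 * income * years_employed


feature_names = ["income", "debt", "years_employed"]
instance = [4.0, 1.0, 2.0]   # the prediction being explained
baseline = [2.0, 2.0, 1.0]   # reference point, e.g. an "average" applicant

attributions = shapley_values(loan_score, instance, baseline)
for name, value in zip(feature_names, attributions):
    print(f"{name}: {value:+.2f}")
```

The attributions always sum to the difference between the prediction for the instance and the prediction for the baseline, which is what makes them readable as each feature's share of the result.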

SageMaker Model Monitor

  • Get alerts on quality deviations on your deployed models (via CloudWatch)

    • Things can drift over time

    • Biases can drift over time

    • Quality can degrade as the properties of your training data or your incoming real-world data change over time.

  • Visualize data drift

    • Example: loan model starts giving people more credit due to drifting or missing input features

    • Maybe incomes are rising due to inflation

    • It lets you visualize that over time and can alert you if things start to change too much.

  • Detect anomalies & outliers

    • If new anomalies and outliers start showing up in your data, Model Monitor can be set up to watch for them over time and alert you when it sees them
  • Detect new features

    • Model Monitor will tell you automatically if new features are coming in that you need to think about.
  • No code needed
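
Model Monitor does this without any code on your part, but the underlying comparison is simple enough to sketch. The example below is a rough illustration of drift detection against a baseline, not Model Monitor's actual checks: the function names, the one-standard-deviation threshold, and the income figures are all assumptions made for the example.

```python
from statistics import mean, stdev


def summarize(values):
    """Baseline-style statistics for one numeric feature."""
    return {
        "mean": mean(values),
        "std": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def drift_violations(baseline_values, live_values, max_mean_shift_in_std=1.0):
    """Return human-readable violations of the live data against the baseline."""
    base = summarize(baseline_values)
    live = summarize(live_values)
    violations = []

    # How far has the mean moved, measured in baseline standard deviations?
    if base["std"] > 0:
        shift = abs(live["mean"] - base["mean"]) / base["std"]
        if shift > max_mean_shift_in_std:
            violations.append(f"mean shifted by {shift:.2f} baseline std devs")
    elif live["mean"] != base["mean"]:
        violations.append("mean changed for a previously constant feature")

    # Values outside the range seen at baseline time.
    if live["min"] < base["min"]:
        violations.append(f"min {live['min']} below baseline min {base['min']}")
    if live["max"] > base["max"]:
        violations.append(f"max {live['max']} above baseline max {base['max']}")
    return violations


# Made-up incomes in thousands: training-time data vs. inflated live data.
baseline_income = [40, 45, 50, 55, 60]
live_income = [60, 66, 72, 78, 84]

violations = drift_violations(baseline_income, live_income)
for violation in violations:
    print("ALERT:", violation)
```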

SageMaker Model Monitor + Clarify

  • Integrates with SageMaker Clarify

  • SageMaker Clarify detects potential bias

  • i.e., imbalances across different groups / ages / income brackets

  • With Model Monitor, you can monitor for bias and be alerted to new potential bias via CloudWatch

  • SageMaker Clarify also helps explain model behavior

    • Understand which features contribute the most to your predictions

SageMaker Model Monitor Details

  • Data is stored in S3 and secured

    • Use S3 security measures
  • Monitoring jobs are scheduled via a Monitoring Schedule

    • A monitoring schedule runs the monitoring jobs repeatedly over time. Setting one up is just one more step in deploying them.
  • Metrics are emitted to CloudWatch

    • CloudWatch notifications can be used to trigger alarms

    • You'd then take corrective action (retrain the model, audit the data)

  • Integrates with TensorBoard, QuickSight, Tableau

  • Or just visualize within SageMaker Studio

Monitoring Types

  • Drift in data quality

    • The statistical properties of the incoming features: the mean, standard deviation, min, max, and so on

    • Relative to a baseline you create when you create your model monitoring job

    • "Quality" is just statistical properties of the features

  • Drift in model quality (accuracy, etc) (as opposed to data quality)

    • Works the same way with a model quality baseline

    • Can integrate with Ground Truth labels

      • Can see what humans are saying vs. the model for classifications, and whether the two are diverging
    • Recall, precision, RMSE

  • Bias drift

  • Feature attribution drift

    • Similar to bias drift: detects when the features your predictions are attributed to start to shift

    • Based on Normalized Discounted Cumulative Gain (NDCG) score

    • This compares feature ranking of training vs. live data
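
The NDCG comparison of feature rankings can be sketched as follows. This follows my understanding of the score: features are ranked by attribution in training and in live data, and the training attributions are then summed in live-ranking order with a logarithmic position discount, normalized by the same sum in training-ranking order. The attribution numbers and the 0.90 alert threshold are assumptions for illustration, and the sketch assumes both sets contain the same features.

```python
from math import log2


def rank_features(attributions):
    """Feature names ordered from most to least important."""
    return sorted(attributions, key=attributions.get, reverse=True)


def attribution_ndcg(training_attributions, live_attributions):
    """NDCG of the live feature ranking, scored with training attributions.

    1.0 means the live ranking matches the training ranking. Lower values
    mean important training features have slid down the live ranking.
    """
    training_order = rank_features(training_attributions)
    live_order = rank_features(live_attributions)

    # Position 1 has discount log2(2) = 1, position 2 has log2(3), and so on.
    dcg = sum(
        training_attributions[feature] / log2(position + 1)
        for position, feature in enumerate(live_order, start=1)
    )
    ideal_dcg = sum(
        training_attributions[feature] / log2(position + 1)
        for position, feature in enumerate(training_order, start=1)
    )
    return dcg / ideal_dcg


# Made-up attribution scores for a loan model.
training = {"income": 0.6, "debt": 0.3, "age": 0.1}
live = {"income": 0.2, "debt": 0.5, "age": 0.3}

score = attribution_ndcg(training, live)
ALERT_THRESHOLD = 0.90  # assumed threshold for this example
print(f"NDCG = {score:.2f}, drift alert: {score < ALERT_THRESHOLD}")
```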

Putting them together

Other New Features

SageMaker JumpStart

  • One-click models and algorithms from model zoos

  • Over 150 open source models collected from GitHub, covering natural language processing (NLP), object detection, image classification, and more

  • With one click, you can choose an existing model that is known to work in a specific situation and just get started

SageMaker Data Wrangler

  • Pre-process your data in SageMaker

  • Import / transform / analyze / export data within SageMaker Studio

  • Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes. With SageMaker Data Wrangler, you can simplify the process of data preparation and feature engineering, and complete each step of the data preparation workflow, including data selection, cleansing, exploration, and visualization from a single visual interface. Using SageMaker Data Wrangler's data selection tool, you can choose the data you want from various data sources and import it with a single click. Once data is imported, you can use the data quality and insights report to automatically verify data quality and detect abnormalities, such as duplicate rows and target leakage. SageMaker Data Wrangler contains over 300 built-in data transformations so you can quickly normalize, transform, and combine features without having to write any code. With SageMaker Data Wrangler's visualization templates, you can quickly preview and inspect that these transformations are completed as you intended by viewing them in Amazon SageMaker Studio, the first fully integrated development environment (IDE) for ML. Once your data is prepared, you can build fully automated ML workflows with Amazon SageMaker Pipelines and save them for reuse in the Amazon SageMaker Feature Store.

How Amazon SageMaker Data Wrangler works

SageMaker Feature Store

  • Find, discover, and share features in Studio

  • Online (low latency for real-time predictions) or offline (for training or batch inference) modes

  • Features organized into Feature Groups

  • Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference. For example, in an application that recommends a music playlist, features could include song ratings, listening duration, and listener demographics. Features are used repeatedly by multiple teams and feature quality is critical to ensure a highly accurate model. Also, when features used to train models offline in batch are made available for real-time inference, it's hard to keep the two feature stores synchronized. SageMaker Feature Store provides a secured and unified store for feature use across the ML lifecycle.
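
The online/offline split can be illustrated with a small in-memory sketch. It is not the Feature Store SDK: `FeatureGroupSketch` and its methods are hypothetical, and the sketch assumes the usual behavior that the online store serves only the latest record per identifier while the offline store keeps the full history for training and batch use.

```python
class FeatureGroupSketch:
    """Toy feature group with an online (latest) and offline (history) view."""

    def __init__(self, record_id_name, event_time_name):
        self.record_id_name = record_id_name
        self.event_time_name = event_time_name
        self.online = {}   # record id -> most recent record (low-latency lookups)
        self.offline = []  # append-only history (training / batch inference)

    def put_record(self, record):
        # Every write is kept in the offline history.
        self.offline.append(dict(record))
        # The online view only keeps the newest record per identifier.
        record_id = record[self.record_id_name]
        current = self.online.get(record_id)
        if current is None or record[self.event_time_name] >= current[self.event_time_name]:
            self.online[record_id] = dict(record)

    def get_record(self, record_id):
        """Real-time lookup: latest features for one entity."""
        return self.online.get(record_id)

    def history(self, record_id):
        """Offline lookup: every version ever written for one entity."""
        return [r for r in self.offline if r[self.record_id_name] == record_id]


# Made-up listener features for the playlist example above.
listeners = FeatureGroupSketch(record_id_name="listener_id", event_time_name="event_time")
listeners.put_record({"listener_id": "u1", "event_time": 1, "avg_song_rating": 3.5})
listeners.put_record({"listener_id": "u1", "event_time": 2, "avg_song_rating": 4.0})
listeners.put_record({"listener_id": "u2", "event_time": 1, "avg_song_rating": 2.0})

print(listeners.get_record("u1"))    # latest values, for real-time inference
print(len(listeners.history("u1")))  # all versions, for building training sets
```

Because both views are fed by the same write path, training and real-time inference read features that came from one source, which is the synchronization problem the service is meant to remove.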

SageMaker Edge Manager

  • Amazon SageMaker Edge enables machine learning on edge devices by optimizing, securing, and deploying models to the edge, and then monitoring these models on your fleet of devices, such as smart cameras, robots, and other smart-electronics, to reduce ongoing operational costs. Customers who train models in TensorFlow, MXNet, PyTorch, XGBoost, and TensorFlow Lite can use SageMaker Edge to improve their performance, deploy them on edge devices, and monitor their health throughout their lifecycle.

  • SageMaker Edge Compiler optimizes the trained model to be executable on an edge device. SageMaker Edge includes an over-the-air (OTA) deployment mechanism that helps you deploy models on the fleet independent of the application or device firmware. SageMaker Edge Agent allows you to run multiple models on the same device. The Agent collects prediction data based on the logic that you control, such as intervals, and uploads it to the cloud so that you can periodically retrain your models over time. SageMaker Edge cryptographically signs your models so you can verify that they were not tampered with as they move from the cloud to edge devices.

  • Software agent for edge devices

    • E.g., Car making its own predictions
  • Model optimized with SageMaker Neo

  • Collects and samples data for monitoring, labeling, retraining

SageMaker Canvas

  • No-code machine learning for business analysts

    • Not data scientists
  • Upload csv data (csv only for now), select a column to predict, build it, and make predictions

  • Can also join datasets

  • Classification or regression

    • Only
  • Automatic data cleaning

    • Missing values

    • Outliers

    • Duplicates

  • Share models & datasets with SageMaker Studio

The Finer Points

  • Local file uploading must be configured "by your IT administrator."

    • Set up an S3 bucket with appropriate CORS permissions

    • The bucket gives the data that end users upload to SageMaker Canvas a place to land in S3 under the hood.

  • Can integrate with Okta SSO

    • Sign in
  • Canvas lives within a SageMaker Domain that must be manually updated

    • Updates must be applied to the domain by hand. An existing domain does not pick up SageMaker Canvas updates automatically unless you go back and apply them explicitly

    • IT admin Task

  • Import from Redshift can be set up

  • Time series forecasting must be enabled via IAM

    • It must be explicitly enabled
  • Can run within a VPC

  • Pricing is $1.90/hr plus a charge based on number of training cells in a model
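
For the CORS bullet above, the bucket configuration an administrator applies might look like the sketch below. The `CORSRules` structure is the standard shape accepted by S3's `PutBucketCors`, but the specific rule (POST from any origin) is my assumption about what Canvas local upload needs and should be checked against the Canvas setup documentation. The helper name and bucket name are placeholders.

```python
import json


def canvas_upload_cors_configuration(allowed_origins=("*",)):
    """Hypothetical helper: CORS configuration for a Canvas upload bucket."""
    return {
        "CORSRules": [
            {
                "AllowedHeaders": ["*"],
                # Assumption: browser uploads arrive as POST requests.
                "AllowedMethods": ["POST"],
                "AllowedOrigins": list(allowed_origins),
                "ExposeHeaders": [],
            }
        ]
    }


cors_configuration = canvas_upload_cors_configuration()
print(json.dumps(cors_configuration, indent=2))

# With credentials configured, an administrator could apply it roughly as:
#   import boto3
#   boto3.client("s3").put_bucket_cors(
#       Bucket="example-canvas-upload-bucket",  # placeholder name
#       CORSConfiguration=cors_configuration,
#   )
```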

SageMaker Training Compiler

State-of-the-art deep learning (DL) models consist of complex multi-layered neural networks with billions of parameters that can take thousands of GPU hours to train. Optimizing such models on training infrastructure requires extensive knowledge of DL and systems engineering; this is challenging even for narrow use cases. Although there are open-source implementations of compilers that optimize the DL training process, they can lack the flexibility to integrate DL frameworks with some hardware such as GPU instances.

SageMaker Training Compiler is a capability of SageMaker that makes these hard-to-implement optimizations to reduce training time on GPU instances. The compiler optimizes DL models to accelerate training by more efficiently using SageMaker machine learning (ML) GPU instances. SageMaker Training Compiler is available at no additional charge within SageMaker and can help reduce total billable time as it accelerates training.

SageMaker Training Compiler is integrated into the AWS Deep Learning Containers (DLCs). Using the SageMaker Training Compiler-enabled AWS DLCs, you can compile and optimize training jobs on GPU instances with minimal changes to your code. Bring your deep learning models to SageMaker and enable SageMaker Training Compiler to accelerate the speed of your training job on SageMaker ML instances for accelerated computing.

  • Integrated into AWS Deep Learning Containers (DLCs)

    • Can't bring your own container
  • Compile & optimize training jobs specifically on GPU instances

  • Can accelerate training up to 50%

    • Dependent on how parameters are set
  • Converts models into hardware-optimized instructions under the hood.

  • Tested with Hugging Face transformers library, or bring your own model

    • Can't guarantee your results in that case
  • Incompatible with SageMaker distributed training libraries

  • Best practices:

    • Ensure GPU instances are used (ml.p3, ml.p4)

    • PyTorch models must use PyTorch/XLA's model save function

    • Set the debug flag in the compiler_config parameter to enable debugging