Data Engineering
Here's a quick summary of all the services mentioned in this section:
- Amazon S3: Object Storage for your data
- VPC Endpoint Gateway: Privately access your S3 bucket without going through the public internet
- Kinesis Data Streams: real-time data streams, need capacity planning, real-time applications
- Kinesis Data Firehose: near real-time data ingestion to S3, Redshift, ElasticSearch, Splunk
- Kinesis Data Analytics: SQL transformations on streaming data
- Kinesis Video Streams: real-time video feeds
- Glue Data Catalog & Crawlers: Metadata repositories for schemas and datasets in your account
- Glue ETL: ETL Jobs as Spark programs, run on a serverless Spark Cluster
- DynamoDB: NoSQL store
- Redshift: Data Warehousing for OLAP, SQL language
- Redshift Spectrum: Redshift on data in S3 (without the need to load it first in Redshift)
- RDS / Aurora: Relational Data Store for OLTP, SQL language
- ElasticSearch: index for your data, search capability, clickstream analytics
- ElastiCache: data cache technology
- Data Pipelines: Orchestration of ETL jobs between RDS, DynamoDB, S3. Runs on EC2 instances
- Batch: batch jobs run as Docker containers - not just for data, manages EC2 instances for you
- DMS: Database Migration Service, 1-to-1 CDC replication, no ETL
- Step Functions: Orchestration of workflows, audit, retry mechanisms
Briefly mentioned, covered later:
- EMR: Managed Hadoop Clusters
- Quicksight: Visualization Tool
- Rekognition: ML Service
- SageMaker: ML Service
- DeepLens: camera by Amazon
- Athena: Serverless Query of your data