AWS Glue
- serverless ETL (vs. server-based data pipelines, e.g. EMR)
- crawls data sources and generates the AWS Glue Data Catalog, which helps data visibility across the whole organization
- cost effective (pay only for the time crawlers and jobs run)
- source - store: S3, RDS, JDBC databases, DynamoDB
- source - stream: Kinesis Data Streams, Apache Kafka
- target: S3, RDS, JDBC databases
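A minimal sketch of the crawl step above, assuming boto3 and an S3 source. The crawler name, role ARN, database name, and bucket path are hypothetical placeholders, not values from these notes. `crawler_request` only builds the parameters; `create_and_start_crawler` is the part that actually calls Glue and needs AWS credentials.

```python
def crawler_request(name, role_arn, database, s3_path):
    """Build the keyword arguments for Glue's create_crawler call.

    All four values are placeholders supplied by the caller.
    """
    return {
        "Name": name,
        "Role": role_arn,                               # IAM role the crawler assumes
        "DatabaseName": database,                       # catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},  # data store to crawl
    }


def create_and_start_crawler(request):
    """Create the crawler and start one run (requires AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3 installed

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


if __name__ == "__main__":
    # Hypothetical example values; replace before running against a real account.
    req = crawler_request(
        name="raw-sales-crawler",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
        database="sales_raw",
        s3_path="s3://example-bucket/raw/sales/",
    )
    print(req)
    # create_and_start_crawler(req)  # uncomment to call AWS
```

Once the crawler run finishes, the tables it inferred should appear in the named catalog database.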
Data Catalog
- persistent metadata about the data sources in a region
- 1 catalog per region per account, avoids data silos
- used by Amazon Athena, Redshift, EMR, Lake Formation
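A rough sketch of reading that catalog metadata, assuming boto3. The database name and region are hypothetical. `summarize_tables` works on the shape of a `get_tables` response (a `TableList` whose entries carry a `StorageDescriptor`), so it can be tried on a hand-written dict without touching AWS. Only the first page of results is read here; a real listing would follow `NextToken` or use a paginator.

```python
def summarize_tables(response):
    """Map table name -> storage location and (column, type) pairs."""
    summary = {}
    for table in response.get("TableList", []):
        sd = table.get("StorageDescriptor", {})
        summary[table["Name"]] = {
            "location": sd.get("Location"),
            "columns": [(c["Name"], c["Type"]) for c in sd.get("Columns", [])],
        }
    return summary


def list_catalog_tables(database, region):
    """Fetch the first page of tables for one database in one region's catalog."""
    import boto3  # imported here so summarize_tables works without boto3 installed

    glue = boto3.client("glue", region_name=region)  # the catalog is per region
    return summarize_tables(glue.get_tables(DatabaseName=database))


if __name__ == "__main__":
    # Hand-written sample in the shape of a get_tables response (not real data).
    sample = {
        "TableList": [
            {
                "Name": "orders",
                "StorageDescriptor": {
                    "Location": "s3://example-bucket/raw/orders/",
                    "Columns": [
                        {"Name": "order_id", "Type": "bigint"},
                        {"Name": "amount", "Type": "double"},
                    ],
                },
            }
        ]
    }
    print(summarize_tables(sample))
```

Services such as Athena resolve table names against this same metadata, which is why one shared catalog per region avoids each tool keeping its own schema copy.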