AWS Glue


  • serverless ETL (in contrast to a data pipeline on servers you manage, e.g. EMR clusters)
  • crawls data sources and generates the AWS Glue Data Catalog, which makes data discoverable across the whole organization
  • cost effective: you pay only for the resources consumed while jobs and crawlers run
  • source (data stores): S3, RDS, JDBC databases, DynamoDB
  • source (streams): Kinesis Data Streams, Apache Kafka
  • target: S3, RDS, JDBC databases
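
The crawling step above can be sketched with boto3, the AWS SDK for Python. This is a minimal sketch, not a complete setup: the crawler name, role ARN, database name, and S3 path are hypothetical placeholders, and the helper functions are illustrative names rather than part of any AWS API. The `glue` argument is assumed to be a boto3 Glue client (or any object exposing the same `create_crawler` and `start_crawler` methods).

```python
from typing import Any, Dict


def build_crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> Dict[str, Any]:
    """Build keyword arguments for the Glue CreateCrawler API call."""
    return {
        "Name": name,
        "Role": role_arn,          # IAM role Glue assumes to read the source
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(glue: Any, request: Dict[str, Any]) -> str:
    """Register the crawler, start one run, and return the crawler name."""
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])
    return request["Name"]
```

With AWS credentials configured, this could be called as `create_and_start_crawler(boto3.client("glue"), build_crawler_request(...))`. Passing the client in as an argument keeps the request-building logic checkable without touching a real account.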

Data Catalog

  • persistent metadata store describing the data sources in a region
  • one catalog per region per account, which avoids data silos
  • used by Amazon Athena, Amazon Redshift, Amazon EMR, and AWS Lake Formation
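
To illustrate what "persistent metadata" means in practice, the sketch below reads table names and column types back out of a catalog database. It assumes `glue` is a boto3 Glue client (or an object with a compatible `get_tables` method); the function name `list_table_schemas` and the database name in the usage note are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def list_table_schemas(glue: Any, database: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each table in a Data Catalog database to its (column, type) pairs."""
    schemas: Dict[str, List[Tuple[str, str]]] = {}
    kwargs: Dict[str, Any] = {"DatabaseName": database}
    while True:
        page = glue.get_tables(**kwargs)
        for table in page.get("TableList", []):
            # Column definitions live under the table's StorageDescriptor.
            columns = table.get("StorageDescriptor", {}).get("Columns", [])
            schemas[table["Name"]] = [(c["Name"], c.get("Type", "")) for c in columns]
        token = page.get("NextToken")
        if not token:
            return schemas
        kwargs["NextToken"] = token  # fetch the next page of results
```

For example, `list_table_schemas(boto3.client("glue"), "sales_db")` would return the schemas that a crawler registered for that database. The same metadata is what query services consult when they resolve a table name to a storage location and schema.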