Stanley Chan's Note🧠

Aug 16, 2025

AWS Glue

serverless ETL service (in contrast to server-based data pipelines such as EMR)
crawls data sources and generates the AWS Glue Data Catalog, which improves data visibility across the whole organization
cost-effective: billed for the resources jobs actually use, with no cluster to keep running
sources (data stores): S3, RDS, JDBC databases, DynamoDB
sources (streams): Kinesis Data Streams, Apache Kafka
targets: S3, RDS, JDBC databases

💡 JDBC stands for Java Database Connectivity. It is a Java API for connecting to a database and executing queries against it.
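
The crawling step in the notes above can be sketched with boto3. This is a minimal sketch, not a full setup: `crawler_request` and `create_and_start_crawler` are hypothetical helper names, and the crawler name, IAM role, database, and S3 path passed in are placeholders you would supply. `create_crawler` and `start_crawler` are the boto3 Glue client calls; the `glue` argument is assumed to be a client made with `boto3.client("glue")`.

```python
def crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for the Glue CreateCrawler call (S3 source only)."""
    return {
        "Name": name,
        "Role": role_arn,          # IAM role the crawler assumes (placeholder value)
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(glue, request):
    """Register the crawler, then start a run that populates the Data Catalog."""
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])
```

With AWS credentials and a region configured, a call might look like `create_and_start_crawler(boto3.client("glue"), crawler_request("example-crawler", role_arn, "example_db", "s3://example-bucket/raw/"))`, where every argument value is made up for illustration. A JDBC or DynamoDB source would use `JdbcTargets` or `DynamoDBTargets` inside `Targets` instead of `S3Targets`.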

Data Catalog

persistent metadata store describing the data sources in a region
one catalog per region per account, which helps avoid data silos
used by Amazon Athena, Redshift, EMR, and Lake Formation

💡 A data silo is **a collection of data held by one group that is not easily or fully accessible by other groups in the same organization**
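
To make the "persistent metadata" idea concrete, here is a small sketch that reads table entries back out of the Data Catalog with boto3. `list_catalog_tables` is a hypothetical helper name and the database name is whatever your crawler wrote to; `get_paginator("get_tables")` and the `TableList` response key come from the boto3 Glue client. The `glue` argument is assumed to be `boto3.client("glue")` for the region whose catalog you want.

```python
def list_catalog_tables(glue, database):
    """Return (table name, storage location) pairs for one Data Catalog database."""
    tables = []
    paginator = glue.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database):
        for table in page["TableList"]:
            # Location is absent for some table types, so fall back to None.
            location = table.get("StorageDescriptor", {}).get("Location")
            tables.append((table["Name"], location))
    return tables
```

Because the catalog is shared per region and account, the same table definitions returned here are the ones the query services listed above would see.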
