Redshift

petabyte-scale data warehouse
OLAP (columnar storage), not OLTP (row based, like RDS for transactions) ⇒ good for aggregation and analytics
pay as you go
query S3 directly using Redshift Spectrum, without loading the data
query other DBs directly using federated query, without loading the data
integrates with AWS tooling such as QuickSight
SQL interface over JDBC/ODBC connections
Redshift does not support partitioning of native tables (distribution styles and sort keys are used instead; Spectrum external tables over S3 can be partitioned)
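
Why columnar storage suits aggregation can be seen in a toy sketch. This is only an illustration with made-up data (the `rows` list and `to_columns` helper are hypothetical), not how Redshift stores blocks internally:

```python
# Toy illustration only: made-up order data, not Redshift's real storage format.
rows = [
    {"order_id": 1, "region": "eu", "amount": 10.0},
    {"order_id": 2, "region": "us", "amount": 25.5},
    {"order_id": 3, "region": "eu", "amount": 4.5},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every whole row is visited even though only "amount" is needed.
total_from_rows = sum(record["amount"] for record in rows)

# Column layout: the aggregate reads a single list and ignores the other columns.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```

Both totals are the same; the difference is how much unrelated data has to be read to get there.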

Architecture


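provisioned clusters are server based (not serverless like Athena; Redshift Serverless is a separate offering)
runs in 1 AZ in a VPC by default, so not HA by design (Multi-AZ is available for RA3 clusters)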
run multiple nodes
- leader node - receives queries, plans them, and aggregates the results
- compute nodes - execute the queries against the data they hold
can configure VPC security, IAM permissions, KMS at-rest encryption, CloudWatch monitoring
can enable Amazon Redshift enhanced VPC routing (forces traffic through your VPC instead of the public network; allows customized networking)
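
The leader/compute split above can be pictured as a scatter-gather pattern. A minimal sketch, assuming hypothetical `compute_node` and `leader_node` functions and made-up slices of `(region, amount)` rows; it models the idea only, not Redshift's actual planner:

```python
from collections import Counter


def compute_node(slice_rows):
    """Hypothetical compute node: aggregate only its local slice of data."""
    partial = Counter()
    for region, amount in slice_rows:
        partial[region] += amount
    return partial


def leader_node(slices):
    """Hypothetical leader node: hand out work, then merge the partial results."""
    final = Counter()
    for partial in map(compute_node, slices):
        final.update(partial)  # Counter.update adds values per key
    return dict(final)


# Each inner list stands for the rows held by one compute node.
slices = [
    [("eu", 10), ("us", 5)],
    [("eu", 1), ("ap", 7)],
]
result = leader_node(slices)
print(result)
```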

DR and Resilience

Redshift Spectrum (S3 data shown in redshift as table)

Distribution styles

VACUUM command

VACUUM DELETE ONLY: Reclaims space from deleted rows only, no sorting (Redshift only does a soft delete, so deleted rows stay on disk until vacuumed)
VACUUM SORT ONLY: Sorts rows without removing deleted rows (keeps the sort order sharp)
VACUUM FULL: Does both delete cleanup and sorting (default)
VACUUM REINDEX: Re-analyzes interleaved sort keys (keeps them effective against skew), followed by a VACUUM FULL
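
A small sketch of how the four modes map to statements. `build_vacuum` is a hypothetical helper that only builds the SQL text; running it against a cluster (for example over a JDBC/ODBC connection) is left out:

```python
# The four modes listed in these notes.
VACUUM_MODES = {"FULL", "DELETE ONLY", "SORT ONLY", "REINDEX"}


def build_vacuum(table, mode="FULL"):
    """Return a VACUUM statement for `table` (hypothetical helper, text only)."""
    mode = mode.upper()
    if mode not in VACUUM_MODES:
        raise ValueError(f"unsupported VACUUM mode: {mode}")
    # Allow only simple identifiers such as sales or public.sales.
    if not table.replace("_", "").replace(".", "").isalnum():
        raise ValueError(f"unexpected table name: {table}")
    return f"VACUUM {mode} {table};"


print(build_vacuum("sales"))
print(build_vacuum("sales", "delete only"))
```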

System tables:

STL_ALERT_EVENT_LOG records an alert when the query optimizer identifies conditions that might indicate a performance issue, together with a suggested solution. This is the table to check for optimizer alerts.
STL_PLAN_INFO provides detailed info on execution plans (the EXPLAIN output for a query, stored as rows), which helps when digging into a problematic query plan.
STL_USAGE_CONTROL logs when a usage limit is reached; it does not log query anomalies.
STL_QUERY_METRICS has execution stats (rows processed, CPU, I/O) but no plan diagnostics.
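
To look up the optimizer alerts for one query, something like the following could be used. `build_alert_query` is a hypothetical helper that only produces the SQL text; the selected columns (`query`, `event`, `solution`, `event_time`) are columns of STL_ALERT_EVENT_LOG:

```python
def build_alert_query(query_id):
    """Return SQL listing the optimizer alerts for one query ID (text only)."""
    query_id = int(query_id)  # reject anything that is not a plain integer
    return (
        "SELECT query, event, solution, event_time "
        "FROM stl_alert_event_log "
        f"WHERE query = {query_id} "
        "ORDER BY event_time;"
    )


print(build_alert_query(42))
```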

Importing / Exporting data

COPY command
- Parallelized; efficient
- From S3, EMR, DynamoDB, remote hosts
- S3 requires authorization (typically an IAM role); a manifest file is optional and lists exactly which files to load
UNLOAD command
- Unload from a table into files in S3
Enhanced VPC routing
- forces all COPY and UNLOAD traffic through your Amazon VPC
Auto-copy from Amazon S3
Amazon Aurora zero-ETL integration
- Auto replication from Aurora → Redshift
Redshift Streaming Ingestion
- From Kinesis Data Streams or MSK
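
The COPY and UNLOAD commands above look roughly like this. A minimal sketch: `build_copy` and `build_unload` are hypothetical helpers that only build the SQL text, and the bucket name and role ARN are placeholders:

```python
def build_copy(table, s3_path, iam_role, manifest=False, file_format="CSV"):
    """Return a COPY statement loading `table` from S3 (text only)."""
    parts = [f"COPY {table}", f"FROM '{s3_path}'", f"IAM_ROLE '{iam_role}'"]
    if manifest:
        # With MANIFEST, s3_path must point at the manifest file itself.
        parts.append("MANIFEST")
    parts.append(f"FORMAT AS {file_format}")
    return " ".join(parts) + ";"


def build_unload(select_sql, s3_prefix, iam_role):
    """Return an UNLOAD statement writing a query result to S3 (text only)."""
    # Single quotes inside the quoted SELECT are doubled.
    escaped = select_sql.replace("'", "''")
    return f"UNLOAD ('{escaped}') TO '{s3_prefix}' IAM_ROLE '{iam_role}';"


# Placeholder values, not real resources.
ROLE = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"

print(build_copy("sales", "s3://example-bucket/sales/", ROLE))
print(build_unload("SELECT * FROM sales WHERE region = 'eu'",
                   "s3://example-bucket/out/sales_", ROLE))
```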

Security

Stanley Chan's Note🧠