DynamoDB

Overview:

  • NoSQL public database-as-a-service (DBaaS): key/value & document store (not suited to relational workloads)
  • no self-managed servers or infrastructure
  • provisioned performance (scaled in/out manually or automatically) or on-demand
  • highly resilient across AZs, and optionally global
  • data is replicated to multiple nodes by default
  • really fast - single-digit milliseconds (SSD based)
  • backup, point-in-time recovery, encryption at rest
  • supports event-driven integration: trigger actions when data changes
  • doesn’t support true SQL
  • access via console, CLI, API
  • billed based on RCUs, WCUs, storage & features
    • i.e. capacity, storage, and operations

Use case:

  • when data is hot and small, with high I/O
    • e.g. sensor data, logs, web sessions, gaming
  • for big but infrequently accessed data, use S3 as a data lake instead
  • for a traditional transactional/structured database, use RDS instead

Primary Key:

  • Option 1: Partition Key (HASH)
    • Partition key must be unique for each item
    • Partition key must be “diverse” so that the data is distributed
    • Example: “User_ID” for a users table
  • Option 2: Partition Key + Sort Key (HASH + RANGE)
    • The combination must be unique for each item
    • Data is grouped by partition key
    • Example: users-games table, “User_ID” for Partition Key and “Game_ID” for Sort Key
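
A minimal sketch of how the two key options could be written as CreateTable parameters for the AWS SDK for Python (boto3). The table names "users" and "users-games" and the string attribute type "S" are assumptions for illustration; the dictionaries are only built here, not sent to AWS.

```python
def key_schema(partition_key, sort_key=None):
    """Build the KeySchema / AttributeDefinitions parts of a CreateTable request."""
    schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    if sort_key is not None:
        schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
    # Assumption: every key attribute is a string ("S").
    definitions = [
        {"AttributeName": key["AttributeName"], "AttributeType": "S"}
        for key in schema
    ]
    return {"KeySchema": schema, "AttributeDefinitions": definitions}


# Option 1: partition key only.
users = {"TableName": "users", **key_schema("User_ID")}
# Option 2: partition key + sort key.
users_games = {"TableName": "users-games", **key_schema("User_ID", "Game_ID")}
```

Either dictionary could then be passed to a boto3 client, for example `client.create_table(**users, BillingMode="PAY_PER_REQUEST")`.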

Read/Write Capacity:

  • Provisioned mode
    • specify the number of reads/writes per second
    • need to plan capacity beforehand
    • Pay for provisioned read & write capacity units
  • On-Demand mode
    • automatically scales up/down with your workloads
    • Pay for what you use; quite expensive (2x-3x more expensive)

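Write Capacity calculation:

  • 1 WCU = 1 write per second for an item up to 1 KB
  • WCU = items per second * item size in KB (size must be rounded up to the next KB)
  • e.g. 120 items per minute with item size 2 KB = 120/60 * 2 = 4 WCU (be careful to convert the rate to per second)
  • e.g. 6 items per second with item size 4.5 KB = 6 * 5 = 30 WCU (4.5 KB is rounded up to 5 KB)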
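
The write capacity arithmetic can be sketched as a small helper. The function name `wcu` is hypothetical, and the second call uses made-up numbers; the rule it encodes is 1 WCU per write per second for each 1 KB of item size, rounded up.

```python
import math


def wcu(items_per_second, item_size_kb):
    """WCUs needed: one WCU covers one write per second of an item up to 1 KB."""
    size_units = math.ceil(item_size_kb)  # item size rounds up to the next 1 KB
    return math.ceil(items_per_second * size_units)


print(wcu(120 / 60, 2))  # 120 items per minute at 2 KB -> 4
print(wcu(10, 2.5))      # 10 items per second at 2.5 KB (rounded to 3 KB) -> 30
```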

Read mode:

  • Eventually Consistent Read (default)
    • If we read just after a write, it’s possible we’ll get some stale data because of replication
  • Strongly Consistent Read
    • If we read just after a write, we will get the correct data
    • Set “ConsistentRead” parameter to True in API calls
    • Consumes twice the RCU

Read Capacity calculation:

  • 1 RCU represents 1 strongly consistent read, or 2 eventually consistent reads, per second for an item up to 4 KB
  • If the items are larger than 4 KB, more RCUs are consumed; item size must be rounded up to the next multiple of 4 KB
  • 10 items per second with item size 4 KB (strongly consistent) = 10 * 4/4 = 10 RCU
  • 16 items per second with item size 12 KB (eventually consistent) = 16/2 * 12/4 = 24 RCU
  • 10 items per second with item size 6 KB (strongly consistent) = 10 * 8/4 = 20 RCU (6 KB is rounded up to 8 KB)
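
The read capacity arithmetic can be sketched the same way. The function name `rcu` is hypothetical, and the third call uses made-up numbers; the rule it encodes is 4 KB units rounded up, with eventually consistent reads costing half.

```python
import math


def rcu(items_per_second, item_size_kb, strongly_consistent=True):
    """RCUs needed: one RCU covers one strongly consistent read/s of up to 4 KB."""
    size_units = math.ceil(item_size_kb / 4)  # round up to whole 4 KB units
    units = items_per_second * size_units
    if not strongly_consistent:
        units = units / 2  # eventually consistent reads cost half
    return math.ceil(units)


print(rcu(10, 4))                              # -> 10
print(rcu(16, 12, strongly_consistent=False))  # -> 24
print(rcu(10, 5))                              # 5 KB rounds up to 8 KB -> 20
```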

Partitions:

  • DynamoDB stores data in partitions
  • WCUs/RCUs are spread evenly across the partitions
  • If a hot key/hot partition exceeds the provisioned RCUs/WCUs, requests are throttled with the error “ProvisionedThroughputExceededException”
  • Solutions:
    • Exponential backoff (provided by the SDK) = “error handling strategy for network applications in which a client periodically retries a failed request with increasing delays between requests”
    • Distribute partition keys as much as possible
    • Use DynamoDB Accelerator (DAX) if the problem is RCU
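
The AWS SDKs already retry throttled requests, so the following is only a sketch of the idea behind exponential backoff, not the SDK's implementation. `ThrottledError` is a hypothetical stand-in for “ProvisionedThroughputExceededException”, and the random jitter is an addition commonly paired with backoff.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Retry `operation` with exponentially growing, randomly jittered delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles each attempt: base, 2*base, 4*base, ...
            ceiling = base_delay * (2 ** attempt)
            sleep(random.uniform(0, ceiling))  # jitter spreads retries out
```

The `sleep` parameter exists only so the delay can be replaced in tests.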

API - write

  • PutItem:
    • Creates a new item or fully replaces an old item (same Primary Key)
    • Consumes WCUs
  • UpdateItem:
    • Edits an existing item’s attributes or adds a new item if it doesn’t exist
    • Can be used to implement Atomic Counters – a numeric attribute that’s unconditionally incremented
  • Conditional Writes
    • Accept a write/update/delete only if conditions are met, otherwise returns an error
    • Helps with concurrent access to items
    • No performance impact
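
A sketch of what an atomic counter and a conditional write could look like as request parameters for the low-level boto3 client. The table, key and attribute names used in the usage note are hypothetical; the functions only build the request dictionaries.

```python
def atomic_increment(table, key, attribute, amount=1):
    """UpdateItem request that adds `amount` to a numeric attribute."""
    return {
        "TableName": table,
        "Key": key,
        "UpdateExpression": "ADD #attr :amount",
        "ExpressionAttributeNames": {"#attr": attribute},
        "ExpressionAttributeValues": {":amount": {"N": str(amount)}},
    }


def put_if_absent(table, item, partition_key):
    """PutItem request that is rejected if the partition key already exists."""
    return {
        "TableName": table,
        "Item": item,
        "ConditionExpression": "attribute_not_exists(#pk)",
        "ExpressionAttributeNames": {"#pk": partition_key},
    }
```

For example, `client.update_item(**atomic_increment("users", {"User_ID": {"S": "u1"}}, "login_count"))` would increment the counter, and `client.put_item(**put_if_absent(...))` fails with ConditionalCheckFailedException when the condition is not met.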

API - read (get 1 item)

  • GetItem
    • Read based on Primary key
    • Primary Key can be HASH or HASH+RANGE
    • Eventually Consistent Read (default)
    • Option to use Strongly Consistent Reads (more RCU - might take longer)
    • ProjectionExpression can be specified to retrieve only certain attributes

API - read (Query; for a specific partition key)

Query returns items based on:

  • KeyConditionExpression
    • Must specify Partition Key (required; must use = operator)
    • May include Sort Key (optional; supports =, <, <=, >, >=, BETWEEN, BEGINS_WITH)
  • FilterExpression (optional)
    • Applies additional filtering after the query operation (before results are returned)
    • Can only use non-key attributes (i.e., not HASH or RANGE keys)

Returns:

  • Number of items specified in Limit, or up to 1 MB of data
  • Pagination is supported to fetch the remaining results
  • Can query a table, a Local Secondary Index, or a Global Secondary Index
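
A sketch of a Query with pagination, assuming a boto3-style low-level client that exposes `query(**kwargs)`. The helper name `query_all`, the table "users-games" and the values "u1" and "chess" are hypothetical.

```python
def query_all(client, **query_kwargs):
    """Yield every item of a Query, following LastEvaluatedKey across pages."""
    while True:
        page = client.query(**query_kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return  # no more pages
        # Resume the next request where the previous page stopped.
        query_kwargs["ExclusiveStartKey"] = last_key


request = {
    "TableName": "users-games",
    # Partition key uses "="; the sort key condition is optional.
    "KeyConditionExpression": "User_ID = :u AND begins_with(Game_ID, :prefix)",
    "ExpressionAttributeValues": {":u": {"S": "u1"}, ":prefix": {"S": "chess"}},
}
```

With a real client this would be consumed as `for item in query_all(client, **request): ...`.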

API - read (Scan)

  • Scan the entire table and then filter out data (inefficient)
  • Returns up to 1 MB of data – use pagination to keep on reading
  • Consumes a lot of RCU
  • Limit the impact by using Limit to reduce the size of each result, and by pausing between requests
  • For faster performance, use Parallel Scan
    • Multiple workers scan multiple data segments at the same time
    • Increases the throughput and RCU consumed
    • Limit the impact of parallel scans just like you would for Scans
  • Can use ProjectionExpression & FilterExpression (these do not change the RCUs consumed)
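
A sketch of a Parallel Scan using the Scan parameters Segment and TotalSegments, with one worker thread per segment. It assumes a boto3-style client whose `scan` can be called from several threads; the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(client, table, segment, total_segments):
    """Read one segment of a parallel Scan, following pagination."""
    kwargs = {"TableName": table, "Segment": segment, "TotalSegments": total_segments}
    items = []
    while True:
        page = client.scan(**kwargs)
        items.extend(page.get("Items", []))
        if "LastEvaluatedKey" not in page:
            return items
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]


def parallel_scan(client, table, total_segments=4):
    """Scan all segments concurrently and return the combined items."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, client, table, segment, total_segments)
            for segment in range(total_segments)
        ]
        return [item for future in futures for item in future.result()]
```

More segments mean more throughput but also more RCUs consumed at the same time, so `total_segments` is the knob for limiting the impact.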

API - Delete

  • DeleteItem - Delete an individual item (optionally conditional)
  • DeleteTable - Delete a whole table and all its items

API - Batch Operations

  • Reduces latency by reducing the number of API calls
  • Operations are done in parallel for better efficiency
  • Part of a batch can fail, in which case we need to retry the failed items
  • BatchWriteItem
    • Up to 25 PutItem and/or DeleteItem in one call
    • Up to 16 MB of data written, up to 400 KB of data per item
    • Can’t update items (use UpdateItem)
    • UnprocessedItems for failed write operations (exponential backoff or add WCU)
  • BatchGetItem
    • Return items from one or more tables
    • Up to 100 items, up to 16 MB of data
    • Items are retrieved in parallel to minimize latency
    • UnprocessedKeys for failed read operations (exponential backoff or add RCU)
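
A sketch of a BatchWriteItem loop that respects the 25-request limit and retries whatever comes back in UnprocessedItems. It assumes a boto3-style client exposing `batch_write_item(RequestItems=...)`; the function name and the fixed backoff delays are hypothetical.

```python
import time


def batch_write(client, table, put_items, max_retries=5, sleep=time.sleep):
    """Put items in chunks of 25, retrying any UnprocessedItems with backoff."""
    for start in range(0, len(put_items), 25):
        chunk = put_items[start:start + 25]
        pending = {table: [{"PutRequest": {"Item": item}} for item in chunk]}
        for attempt in range(max_retries + 1):
            response = client.batch_write_item(RequestItems=pending)
            # UnprocessedItems has the same shape as RequestItems.
            pending = response.get("UnprocessedItems") or {}
            if not pending:
                break
            sleep(0.05 * (2 ** attempt))  # back off before retrying leftovers
        else:
            raise RuntimeError("items still unprocessed after retries")
```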

PartiQL

  • SQL-compatible query language for DynamoDB
  • Allows you to select, insert, update, and delete data in DynamoDB using SQL (no joins or aggregations)
  • Run queries across multiple DynamoDB tables
  • Run PartiQL queries from
    • AWS Management Console
    • NoSQL Workbench for DynamoDB
    • DynamoDB APIs
    • AWS CLI
    • AWS SDK
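
A sketch of a PartiQL SELECT expressed as ExecuteStatement parameters for the low-level boto3 client. The table name "users-games" and the function name are hypothetical; the table name is double-quoted because it contains a hyphen.

```python
def select_games(user_id):
    """ExecuteStatement request: a PartiQL SELECT with one positional parameter."""
    return {
        "Statement": 'SELECT * FROM "users-games" WHERE User_ID = ?',
        # Each "?" is filled, in order, from this list of attribute values.
        "Parameters": [{"S": user_id}],
    }
```

With a real client this could be run as `client.execute_statement(**select_games("u1"))`.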

Local Secondary Index (LSI)

  • Alternative Sort Key for your table (same Partition Key as that of base table)
  • Up to 5 Local Secondary Indexes per table
  • Must be defined at table creation time
  • Attribute Projections – can contain some or all the attributes of the base table (KEYS_ONLY, INCLUDE, ALL)

Global Secondary Index (GSI)

  • Alternative Primary Key (HASH or HASH+RANGE) from the base table
  • Speed up queries on non-key attributes
  • Attribute Projections – some or all the attributes of the base table (KEYS_ONLY, INCLUDE, ALL)
  • Must provision RCUs & WCUs for the index
  • Can be added/modified after table creation
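
A sketch of adding a GSI to an existing table in provisioned mode, written as UpdateTable parameters for the low-level boto3 client. The function name, the default capacity of 5 units and the string key type are assumptions for illustration.

```python
def add_gsi_request(table, index_name, partition_key, rcu=5, wcu=5):
    """UpdateTable request that creates a GSI with its own provisioned capacity."""
    return {
        "TableName": table,
        # Assumption: the new index key is a string attribute ("S").
        "AttributeDefinitions": [
            {"AttributeName": partition_key, "AttributeType": "S"}
        ],
        "GlobalSecondaryIndexUpdates": [{
            "Create": {
                "IndexName": index_name,
                "KeySchema": [{"AttributeName": partition_key, "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
                "ProvisionedThroughput": {
                    "ReadCapacityUnits": rcu,
                    "WriteCapacityUnits": wcu,
                },
            }
        }],
    }
```

For example, `client.update_table(**add_gsi_request("users-games", "Game_ID-index", "Game_ID"))` would index the hypothetical users-games table by game.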

Indexes and Throttling

Global Secondary Index (GSI):

  • If the writes are throttled on the GSI, then the main table will be throttled, even if the WCUs on the main table are fine!
  • Choose your GSI partition key and assign your WCU capacity carefully

Local Secondary Index (LSI):

  • Uses the WCUs and RCUs of the main table
  • No special throttling considerations

DynamoDB Streams

  • Ordered stream of item-level modifications (create/update/delete) in a table
  • Stream records can be:
    • Sent to Kinesis Data Streams
    • Read by AWS Lambda
    • Read by Kinesis Client Library applications
  • Data Retention for up to 24 hours
  • Use cases:
    • React to changes in real time (e.g. welcome email to new users)
    • Analytics
    • Insert into derivative tables
    • Insert into OpenSearch Service
    • Implement cross-region replication

Time To Live (TTL)

  • Automatically delete items after an expiry timestamp
  • Doesn’t consume any WCUs (i.e., no extra cost)
  • The TTL attribute must be a “Number” data type with “Unix Epoch timestamp” value
  • A delete operation for each expired item enters the DynamoDB Streams
  • Use cases: reduce stored data by keeping only current items, adhere to regulatory obligation…
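
A sketch of computing a TTL value and attaching it to an item. The attribute names "Session_ID" and "expires_at" and both function names are hypothetical; the TTL attribute is stored as a Number holding a Unix epoch timestamp in seconds.

```python
import time


def expires_at(days_from_now, now=None):
    """Unix epoch timestamp (seconds) `days_from_now` days after `now`."""
    now = time.time() if now is None else now
    return int(now) + days_from_now * 24 * 60 * 60


def ttl_item(session_id, days_to_keep, now=None):
    """Item in low-level attribute-value format with a TTL attribute."""
    return {
        "Session_ID": {"S": session_id},
        # Numbers are sent as strings in the low-level format.
        "expires_at": {"N": str(expires_at(days_to_keep, now))},
    }
```

TTL must also be enabled on the table with "expires_at" named as the TTL attribute before items start expiring.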