AWS S3
Overall
- object store, not file or block storage: you cannot use it as a file share or mount it as a drive (e.g. C:)
- great for large-scale data storage, distribution and upload
- good input/output for many AWS services
- private by default; access can be granted via bucket resource policies
  - for multiple accounts / anonymous principals
  - specify the “principal” in the policy
  - can block IPs with a condition (see the sketch below)
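A minimal boto3 sketch of such a resource policy (bucket name and CIDR are illustrative; Block Public Access would also have to permit public policies for this to take effect):

    import json
    import boto3

    s3 = boto3.client("s3")
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadExceptBlockedRange",
            "Effect": "Allow",
            "Principal": "*",  # anonymous / any principal
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
            # condition: exclude one IP range from the grant
            "Condition": {"NotIpAddress": {"aws:SourceIp": "203.0.113.0/24"}},
        }],
    }
    s3.put_bucket_policy(Bucket="example-bucket", Policy=json.dumps(policy))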
- Block Public Access
  - a further level of security that blocks all public access by default, overriding any policy grants
- normal access is via the AWS APIs
Composition
Buckets:
- names have to be globally unique
- 3-63 characters, must start with a lowercase letter or number, cannot be IP-formatted (e.g. 1.1.1.1)
- number of buckets: 100 soft limit and 1,000 hard limit per account
- unlimited objects within a bucket
- flat structure (no real folders, even though the UI shows a folder-like structure)
Objects:
- constructed from a key (name) and a value (data)
- size ranges from 0 bytes to 5 TB
Storage Classes (differ in access frequency & number of copies)
- objects in Glacier classes are cold: they need a retrieval process, with first-byte latency ranging from minutes to hours
Lifecycle Configuration
- a set of rules consisting of transition/expiration actions performed when conditions are met
- applies to a bucket or groups of objects, with or without versioning
- objects can be moved between storage classes or deleted automatically on a schedule (see the sketch below)
- transitions are a waterfall (only going down):
  - minimum of 30 days in S3 Standard before moving to other classes, if the object was originally stored in Standard
  - minimum of a further 30 days in the infrequent-access classes before moving to Glacier classes
  - smaller objects can cost more (per-object minimums apply)
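A minimal lifecycle sketch with boto3 (bucket name, prefix and day counts are illustrative):

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="example-bucket",
        LifecycleConfiguration={
            "Rules": [{
                "ID": "archive-then-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                # waterfall: Standard -> Standard-IA -> Glacier
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},  # delete after a year
            }]
        },
    )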
Static Website Hosting
- lets clients access S3 via HTTP (e.g. blogs)
- a website endpoint is created from the bucket name and region; the bucket name itself doesn't matter
- a custom domain can be pointed at it via Route 53, but then the bucket name has to match the domain
- Out-of-band pages
  - for example, when a compute server is down, DNS can be changed to point at a maintenance page on the static website hosted in S3
- Offloading
  - much cheaper for a website's static content than serving it from a compute server (see the sketch below)
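Enabling it is one call; a boto3 sketch (names illustrative, endpoint format varies by region):

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_website(
        Bucket="example-bucket",
        WebsiteConfiguration={
            "IndexDocument": {"Suffix": "index.html"},
            "ErrorDocument": {"Key": "error.html"},
        },
    )
    # resulting website endpoint looks like:
    # http://example-bucket.s3-website-us-east-1.amazonaws.com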
Object Versioning
- any modification to an object results in a new version (same key but a new version ID)
- newest version = latest version = current version
- deleting an object without ‘show versions’ simply adds a delete marker on top of the current version to hide the object; you can undelete it by deleting the delete marker
- to actually delete a version of an object, use a versioned delete that specifies the version ID (see the sketch below)
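A boto3 sketch of both delete flavours (bucket, key and version ID are placeholders):

    import boto3

    s3 = boto3.client("s3")

    # no VersionId: only adds a delete marker (object is hidden, not gone)
    s3.delete_object(Bucket="example-bucket", Key="photo.jpg")

    # with VersionId: permanently deletes that specific version;
    # pointing it at the delete marker's version ID "undeletes" the object
    s3.delete_object(Bucket="example-bucket", Key="photo.jpg",
                     VersionId="example-version-id")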
MFA Delete
- enabled in the versioning configuration
- once on, MFA is required to change the bucket's versioning state or to delete versions
- ensures users cannot delete objects from a bucket unless they provide their MFA code
- can only be turned on via the AWS CLI/API by providing the MFA serial number + code, and with versioning turned on
- only the bucket owner logged in as the root user can then permanently DELETE objects from the bucket (see the sketch below)
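A boto3 sketch of enabling it (run as root; bucket, device serial and code are placeholders):

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_versioning(
        Bucket="example-bucket",
        # format: "<MFA device serial/ARN> <current 6-digit code>"
        MFA="arn:aws:iam::111122223333:mfa/root-account-mfa-device 123456",
        VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
    )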
Optimization
- by default S3 relies on a single PUT upload (one data stream, up to 5 GB)
  - if the stream fails, the whole upload requires a full restart
- so multipart upload helps (recommended for objects over 100 MB)
  - data is broken into parts of 5 MB-5 GB (max 10,000 parts; only the last part can be smaller)
  - individual parts can fail and be restarted without redoing the whole upload
  - transfer rate = the combined speed of all parts (they run in parallel); see the sketch below
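boto3's managed transfer does this automatically; a sketch (thresholds and names illustrative):

    import boto3
    from boto3.s3.transfer import TransferConfig

    config = TransferConfig(
        multipart_threshold=100 * 1024 * 1024,  # switch to multipart above 100 MB
        multipart_chunksize=100 * 1024 * 1024,  # 100 MB parts
        max_concurrency=10,                     # parts upload in parallel
    )
    boto3.client("s3").upload_file(
        "backup.tar", "example-bucket", "backup.tar", Config=config
    )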
- transfer acceleration (off by default)
  - otherwise traffic takes an indirect route over the public internet from the user's location to wherever the bucket is
  - to solve this it utilises CloudFront's distributed edge locations: uploaded data enters the closest edge location, then travels to S3 over the AWS backbone network (like a highway under AWS's control)
  - instead of uploading to your bucket, users use a distinct URL for an edge location
  - there are restrictions to enable it: the bucket name cannot contain periods and needs to be DNS-compatible
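A boto3 sketch of enabling it and uploading through the accelerate endpoint (bucket/file names illustrative):

    import boto3
    from botocore.config import Config

    s3 = boto3.client("s3")
    s3.put_bucket_accelerate_configuration(
        Bucket="example-bucket",
        AccelerateConfiguration={"Status": "Enabled"},
    )

    # clients then target the accelerate (edge) endpoint instead of the regular one
    accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
    accel.upload_file("video.mp4", "example-bucket", "video.mp4")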
Encryption On Objects
- buckets aren't encrypted, objects are
- traffic between the user/app and the S3 endpoint is encrypted in transit
- two main types: client-side and server-side encryption, and you can use both (you encrypt it first, S3 encrypts it again)
Client-side encryption:
- the act of encrypting your data locally before sending it to S3, to help ensure its security in transit and at rest
- data is encrypted the whole time (S3 never sees plaintext)
- use AWS KMS managed keys or a client-side master key
Server-side encryption:
- you choose how much of the process between the user and the S3 endpoint you control
- objects are encrypted by S3 before being saved on disks in AWS data centers, then decrypted when you download them
Types of server-side encryption (SSE) - how does the process happen?
1. Server-side encryption with customer-provided keys (SSE-C)
  - plaintext is still protected by HTTPS, so it isn't visible to an external observer
  - you manage your own keys while retaining control of the process
  - S3 performs the encryption/decryption (see the sketch below)
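A boto3 sketch: the same raw key must be supplied on every request (key handling here is purely illustrative):

    import os
    import boto3

    key = os.urandom(32)  # your own AES-256 key; you must store it safely
    s3 = boto3.client("s3")
    s3.put_object(Bucket="example-bucket", Key="secret.txt", Body=b"hello",
                  SSECustomerAlgorithm="AES256", SSECustomerKey=key)
    # downloading requires presenting the same key again
    obj = s3.get_object(Bucket="example-bucket", Key="secret.txt",
                        SSECustomerAlgorithm="AES256", SSECustomerKey=key)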
2. Server-side encryption with Amazon S3-managed keys (SSE-S3) = the default
  - S3 handles encryption, key generation and key management
  - no management overhead, AWS does the work
  - uses AES-256
  - problems:
    - no control over the key
    - no customised rotation control
    - no role separation: an S3 admin can view the data (GDPR!)
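Sketch (this header is also what the default gives you):

    import boto3

    boto3.client("s3").put_object(
        Bucket="example-bucket", Key="notes.txt", Body=b"hello",
        ServerSideEncryption="AES256",  # SSE-S3
    )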
3. Server-side encryption with KMS keys stored in AWS Key Management Service (SSE-KMS) = enhances (2)
  - logging & auditing of KMS key usage
  - enables role separation
    - an admin can control the object but, without access to the KMS key, cannot read it
Bucket keys
- each object PUT to S3 normally needs one AWS KMS API call, which becomes a problem with numerous objects
- a bucket key is a time-limited key that lets one KMS key cover all objects in a bucket, offloading work from KMS
- but CloudTrail will then record KMS usage for the whole bucket rather than per individual object (see the sketch below)
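An SSE-KMS upload with the bucket key enabled, as a boto3 sketch (key alias illustrative):

    import boto3

    boto3.client("s3").put_object(
        Bucket="example-bucket",
        Key="report.csv",
        Body=b"sample data",
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId="alias/example-key",  # illustrative KMS key alias
        BucketKeyEnabled=True,            # reuse a bucket key -> far fewer KMS calls
    )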
Replication
- an IAM role allows S3 to read objects and replicate them to the destination via SSL
- across different accounts, a bucket policy on the destination bucket is also needed, trusting the source account's role
- options (a configuration sketch follows this list)
  - which objects: replicate all objects, or a subset chosen by filter, with the exceptions described under Considerations below
  - storage class: defaults to the same as the source
  - ownership: defaults to the source bucket owner; needs configuring if the destination account should own the replicas
- considerations (defaults, some of which can be changed in the configuration):
  - replication only applies to objects modified or created after the replication configuration is set up (not retroactive)
  - one-way replication only
  - no delete markers are replicated
  - both buckets' versioning needs to be on
  - batch replication can be used to replicate existing objects, but needs extra configuration
  - the source bucket owner needs permissions on the objects
  - no system events or Glacier-class objects are replicated
  - no queue: you have to wait for a file to finish replicating to the target bucket before acting on it there
- Replication Time Control (RTC):
  - makes sure destination and source are in sync as soon as possible
  - S3 RTC replicates most objects that you upload to Amazon S3 in seconds, and 99.99 percent of those objects within 15 minutes
- Cross-Region Replication (CRR)
  - for greater durability
  - must have versioning turned on in both the source and destination buckets
  - can have CRR to another AWS account
  - why: global resilience improvements, latency reduction
- Same-Region Replication (SRR)
  - why: log aggregation, PROD & test sync, resilience with strict sovereignty requirements
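A boto3 sketch of a minimal replication configuration (role ARN and bucket names illustrative; both buckets already versioned):

    import boto3

    boto3.client("s3").put_bucket_replication(
        Bucket="source-bucket",
        ReplicationConfiguration={
            "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
            "Rules": [{
                "ID": "crr-everything",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter = all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},  # the default
                "Destination": {
                    "Bucket": "arn:aws:s3:::destination-bucket",
                    "StorageClass": "STANDARD",  # optional override
                },
            }],
        },
    )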
Pre-signed URLs
- if unauthenticated users want to access the bucket, it is risky to give them AWS credentials or to make the bucket public
- a pre-signed URL gives temporary access to a private object, to either upload or download the object data using your credentials, in a safe/secure way
- the auth token is embedded in the URL
- use the AWS CLI, SDKs or console to generate pre-signed URLs (see the sketch below)
- note that:
  - you can create a URL for an object you have no access to
  - when the URL is used, permissions match those of the identity which generated it, as they are right now
  - access denied could mean the generating identity never had access, or doesn't have it now
  - don't generate with a role: a pre-signed URL can have a much longer validity period than the role's temporary credentials
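Generating one with boto3 (names illustrative):

    import boto3

    s3 = boto3.client("s3")
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "example-bucket", "Key": "private/report.pdf"},
        ExpiresIn=3600,  # validity in seconds
    )
    print(url)  # hand this URL to the unauthenticated user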
S3 Select and Glacier Select
- often you don't need the entire object, and it takes time to retrieve the whole file
- S3 Select / Glacier Select let you use SQL-like statements to retrieve only the data you need = pre-filtered = faster + cheaper
- supports CSV, JSON, Parquet, and BZIP2 compression (see the sketch below)
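A boto3 sketch of S3 Select over a CSV (bucket, key and columns illustrative):

    import boto3

    s3 = boto3.client("s3")
    resp = s3.select_object_content(
        Bucket="example-bucket",
        Key="sales.csv",
        ExpressionType="SQL",
        Expression="SELECT s.product, s.amount FROM s3object s WHERE s.amount > '100'",
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    for event in resp["Payload"]:  # results stream back as an event stream
        if "Records" in event:
            print(event["Records"]["Payload"].decode())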
Event Notifications
- generated when events occur in a bucket that has an event notification configuration
- triggers: object creation, deletion, restore, replication
- notifications can be delivered to SNS, SQS or Lambda functions (see the sketch below)
- EventBridge is an alternative that supports more types of events/services
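A boto3 sketch wiring object-created events to a Lambda function (ARN illustrative; the function must also grant S3 permission to invoke it):

    import boto3

    boto3.client("s3").put_bucket_notification_configuration(
        Bucket="example-bucket",
        NotificationConfiguration={
            "LambdaFunctionConfigurations": [{
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:on-upload",
                "Events": ["s3:ObjectCreated:*"],
            }]
        },
    )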
Access Logs
- access logging can be enabled via the console or the PUT Bucket logging API
- an ACL allows the S3 log delivery group to write to the target bucket
- log files consisting of log records are put into the target bucket (see the sketch below)
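Sketch (bucket names illustrative):

    import boto3

    boto3.client("s3").put_bucket_logging(
        Bucket="example-bucket",  # source bucket being logged
        BucketLoggingStatus={"LoggingEnabled": {
            "TargetBucket": "example-log-bucket",
            "TargetPrefix": "access-logs/",
        }},
    )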
Object Lock
- prevents objects from being deleted or overwritten for a fixed amount of time, or indefinitely
- enabled on new buckets (a support request is needed for existing ones) & requires versioning
- write-once-read-many (WORM): no delete, no overwrite
- a bucket can have default object lock settings
- two types (see the sketch below):
  - Retention period
    - specify days & years
    - two modes!
      - compliance: can't be adjusted, deleted or overwritten during the period, even by the account root user
      - governance: special permissions can be granted allowing the lock settings to be adjusted before the period expires
  - Legal hold
    - a manual lock that prevents object deletion regardless of any retention period; no expiry
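A boto3 sketch of both lock types (names and date illustrative; the bucket must have Object Lock enabled):

    import datetime
    import boto3

    s3 = boto3.client("s3")

    # retention period: governance mode, adjustable with special permissions
    s3.put_object_retention(
        Bucket="example-bucket", Key="invoice.pdf",
        Retention={"Mode": "GOVERNANCE",
                   "RetainUntilDate": datetime.datetime(2026, 1, 1)},
    )

    # legal hold: manual on/off switch, no expiry
    s3.put_object_legal_hold(
        Bucket="example-bucket", Key="invoice.pdf",
        LegalHold={"Status": "ON"},
    )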
Access Points
- simplify managing access to S3 buckets/objects via dedicated endpoints
- each works like a bucket policy
- rather than one bucket policy covering everything:
  - create many access points
  - each with different policies
  - each with different network access controls
  - each with its own endpoint address
- created via the console or “aws s3control create-access-point” (see the sketch below)
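The boto3 equivalent (account ID, names and VPC are illustrative):

    import boto3

    boto3.client("s3control").create_access_point(
        AccountId="111122223333",
        Name="analytics-ap",
        Bucket="example-bucket",
        VpcConfiguration={"VpcId": "vpc-0abc1234"},  # optional: restrict to one VPC
    )
    # objects are then addressed through the access point's own endpoint/ARN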
S3 Transfer Acceleration
- enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket
- example use case: users around the world expecting real-time access to a stream of images from other users