AWS S3
Overall
- object store, not file or block storage: you cannot use it as a file share or mount it as a drive (e.g. C:)
- great for large-scale data storage, distribution and upload
- good input/output for many AWS services
- private by default; access can be granted via bucket resource policies
  - for multiple accounts / anonymous principals
  - specify the “principal” in the policy
  - can block IPs with a condition (see the sketch below)
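A minimal boto3 sketch of such a resource policy (bucket name and CIDR are illustrative; Block Public Access would also have to permit public policies for this to take effect):

    import json
    import boto3

    s3 = boto3.client("s3")
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadExceptBlockedRange",
            "Effect": "Allow",
            "Principal": "*",  # anonymous / any principal
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
            # condition: exclude one IP range from the grant
            "Condition": {"NotIpAddress": {"aws:SourceIp": "203.0.113.0/24"}},
        }],
    }
    s3.put_bucket_policy(Bucket="example-bucket", Policy=json.dumps(policy))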
- Block Public Access
  - a further level of security that blocks all public access by default, overriding any policy grants
- normal access is via the AWS APIs
Composition
Buckets:
- names have to be globally unique
- 3-63 characters, must start with a lowercase letter or number, cannot be IP-formatted (e.g. 1.1.1.1)
- number of buckets: 100 soft limit and 1,000 hard limit per account
- unlimited objects within a bucket
- flat structure (no real folders, even though the UI shows a folder-like structure)
Objects:
- constructed from a key (name) and a value (data)
- size ranges from 0 bytes to 5 TB
Storage Classes (differ in access frequency & number of copies)
- objects in Glacier classes are cold: they need a retrieval process, with first-byte latency ranging from minutes to hours
Lifecycle Configuration
- a set of rules consisting of transition/expiration actions performed when conditions are met
- applies to a bucket or groups of objects, with or without versioning
- objects can be moved between storage classes or deleted automatically on a schedule (see the sketch below)
- transitions are a waterfall (only going down):
  - minimum of 30 days in S3 Standard before moving to other classes, if the object was originally stored in Standard
  - minimum of a further 30 days in the infrequent-access classes before moving to Glacier classes
  - smaller objects can cost more (per-object minimums apply)
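A minimal lifecycle sketch with boto3 (bucket name, prefix and day counts are illustrative):

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="example-bucket",
        LifecycleConfiguration={
            "Rules": [{
                "ID": "archive-then-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                # waterfall: Standard -> Standard-IA -> Glacier
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},  # delete after a year
            }]
        },
    )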
Static Website Hosting
- lets clients access S3 via HTTP (e.g. blogs)
- a website endpoint is created from the bucket name and region; the bucket name itself doesn't matter
- a custom domain can be pointed at it via Route 53, but then the bucket name has to match the domain
- Out-of-band pages
  - for example, when a compute server is down, DNS can be changed to point at a maintenance page on the static website hosted in S3
- Offloading
  - much cheaper for a website's static content than serving it from a compute server (see the sketch below)
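Enabling it is one call; a boto3 sketch (names illustrative, endpoint format varies by region):

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_website(
        Bucket="example-bucket",
        WebsiteConfiguration={
            "IndexDocument": {"Suffix": "index.html"},
            "ErrorDocument": {"Key": "error.html"},
        },
    )
    # resulting website endpoint looks like:
    # http://example-bucket.s3-website-us-east-1.amazonaws.com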
Object Versioning
- any modification to an object results in a new version (same key but a new version ID)
- newest version = latest version = current version
- deleting an object without ‘show versions’ simply adds a delete marker on top of the current version to hide the object; you can undelete it by deleting the delete marker
- to actually delete a version of an object, use a versioned delete that specifies the version ID (see the sketch below)
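A boto3 sketch of both delete flavours (bucket, key and version ID are placeholders):

    import boto3

    s3 = boto3.client("s3")

    # no VersionId: only adds a delete marker (object is hidden, not gone)
    s3.delete_object(Bucket="example-bucket", Key="photo.jpg")

    # with VersionId: permanently deletes that specific version;
    # pointing it at the delete marker's version ID "undeletes" the object
    s3.delete_object(Bucket="example-bucket", Key="photo.jpg",
                     VersionId="example-version-id")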
MFA Delete
- enabled in the versioning configuration
- once on, MFA is required to change the bucket's versioning state or to delete versions
- ensures users cannot delete objects from a bucket unless they provide their MFA code
- can only be turned on via the AWS CLI/API by providing the MFA serial number + code, and with versioning turned on
- only the bucket owner logged in as the root user can then permanently DELETE objects from the bucket (see the sketch below)
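A boto3 sketch of enabling it (run as root; bucket, device serial and code are placeholders):

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_versioning(
        Bucket="example-bucket",
        # format: "<MFA device serial/ARN> <current 6-digit code>"
        MFA="arn:aws:iam::111122223333:mfa/root-account-mfa-device 123456",
        VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
    )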
Optimization
- by default S3 relies on a single PUT upload (one data stream, up to 5 GB)
  - if the stream fails, the whole upload requires a full restart
- so multipart upload helps (recommended for objects over 100 MB)
  - data is broken into parts of 5 MB-5 GB (max 10,000 parts; only the last part can be smaller)
  - individual parts can fail and be restarted without redoing the whole upload
  - transfer rate = the combined speed of all parts (they run in parallel); see the sketch below
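boto3's managed transfer does this automatically; a sketch (thresholds and names illustrative):

    import boto3
    from boto3.s3.transfer import TransferConfig

    config = TransferConfig(
        multipart_threshold=100 * 1024 * 1024,  # switch to multipart above 100 MB
        multipart_chunksize=100 * 1024 * 1024,  # 100 MB parts
        max_concurrency=10,                     # parts upload in parallel
    )
    boto3.client("s3").upload_file(
        "backup.tar", "example-bucket", "backup.tar", Config=config
    )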
- transfer acceleration (off by default)
  - otherwise traffic takes an indirect route over the public internet from the user's location to wherever the bucket is
  - to solve this it utilises CloudFront's distributed edge locations: uploaded data enters the closest edge location, then travels to S3 over the AWS backbone network (like a highway under AWS's control)
  - instead of uploading to your bucket, users use a distinct URL for an edge location
  - there are restrictions to enable it: the bucket name cannot contain periods and needs to be DNS-compatible
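A boto3 sketch of enabling it and uploading through the accelerate endpoint (bucket/file names illustrative):

    import boto3
    from botocore.config import Config

    s3 = boto3.client("s3")
    s3.put_bucket_accelerate_configuration(
        Bucket="example-bucket",
        AccelerateConfiguration={"Status": "Enabled"},
    )

    # clients then target the accelerate (edge) endpoint instead of the regular one
    accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
    accel.upload_file("video.mp4", "example-bucket", "video.mp4")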
Encryption On Objects
- buckets aren't encrypted, objects are
- traffic between the user/app and the S3 endpoint is encrypted in transit
- two main types: client-side and server-side encryption, and you can use both (you encrypt it first, S3 encrypts it again)
Client-side encryption:
- the act of encrypting your data locally before sending it to S3, to help ensure its security in transit and at rest
- data is encrypted the whole time (S3 never sees plaintext)
- use AWS KMS managed keys or a client-side master key
Server-side encryption:
- you choose how much of the process between the user and the S3 endpoint you control
- objects are encrypted by S3 before being saved on disks in AWS data centers, then decrypted when you download them
Types of server-side encryption (SSE) - how does the process happen?
1. Server-side encryption with customer-provided keys (SSE-C)
  - plaintext is still protected by HTTPS, so it isn't visible to an external observer
  - you manage your own keys while retaining control of the process
  - S3 performs the encryption/decryption (see the sketch below)
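A boto3 sketch: the same raw key must be supplied on every request (key handling here is purely illustrative):

    import os
    import boto3

    key = os.urandom(32)  # your own AES-256 key; you must store it safely
    s3 = boto3.client("s3")
    s3.put_object(Bucket="example-bucket", Key="secret.txt", Body=b"hello",
                  SSECustomerAlgorithm="AES256", SSECustomerKey=key)
    # downloading requires presenting the same key again
    obj = s3.get_object(Bucket="example-bucket", Key="secret.txt",
                        SSECustomerAlgorithm="AES256", SSECustomerKey=key)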
2. Server-side encryption with Amazon S3-managed keys (SSE-S3) = the default
  - S3 handles encryption, key generation and key management
  - no management overhead, AWS does the work
  - uses AES-256
  - problems:
    - no control over the key
    - no customised rotation control
    - no role separation: an S3 admin can view the data (GDPR!)
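Sketch (this header is also what the default gives you):

    import boto3

    boto3.client("s3").put_object(
        Bucket="example-bucket", Key="notes.txt", Body=b"hello",
        ServerSideEncryption="AES256",  # SSE-S3
    )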
3. Server-side encryption with KMS keys stored in AWS Key Management Service (SSE-KMS) = enhances (2)
  - logging & auditing of KMS key usage
  - enables role separation
    - an admin can control the object but, without access to the KMS key, cannot read it
Bucket keys
- each object PUT to S3 normally needs one AWS KMS API call, which becomes a problem with numerous objects
- a bucket key is a time-limited key that lets one KMS key cover all objects in a bucket, offloading work from KMS
- but CloudTrail will then record KMS usage for the whole bucket rather than per individual object (see the sketch below)
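An SSE-KMS upload with the bucket key enabled, as a boto3 sketch (key alias illustrative):

    import boto3

    boto3.client("s3").put_object(
        Bucket="example-bucket",
        Key="report.csv",
        Body=b"sample data",
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId="alias/example-key",  # illustrative KMS key alias
        BucketKeyEnabled=True,            # reuse a bucket key -> far fewer KMS calls
    )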
Replication
- an IAM role allows S3 to read objects and replicate them to the destination via SSL
- across different accounts, a bucket policy on the destination bucket is also needed, trusting the source account's role
- options (a configuration sketch follows this list)
  - which objects: replicate all objects, or a subset chosen by filter, with the exceptions described under Considerations below
  - storage class: defaults to the same as the source
  - ownership: defaults to the source bucket owner; needs configuring if the destination account should own the replicas
- considerations (defaults, some of which can be changed in the configuration):
  - replication only applies to objects modified or created after the replication configuration is set up (not retroactive)
  - one-way replication only
  - no delete markers are replicated
  - both buckets' versioning needs to be on
  - batch replication can be used to replicate existing objects, but needs extra configuration
  - the source bucket owner needs permissions on the objects
  - no system events or Glacier-class objects are replicated
  - no queue: you have to wait for a file to finish replicating to the target bucket before acting on it there
- Replication Time Control (RTC):
  - makes sure destination and source are in sync as soon as possible
  - S3 RTC replicates most objects that you upload to Amazon S3 in seconds, and 99.99 percent of those objects within 15 minutes
- Cross-Region Replication (CRR)
  - for greater durability
  - must have versioning turned on in both the source and destination buckets
  - can have CRR to another AWS account
  - why: global resilience improvements, latency reduction
- Same-Region Replication (SRR)
  - why: log aggregation, PROD & test sync, resilience with strict sovereignty requirements
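A boto3 sketch of a minimal replication configuration (role ARN and bucket names illustrative; both buckets already versioned):

    import boto3

    boto3.client("s3").put_bucket_replication(
        Bucket="source-bucket",
        ReplicationConfiguration={
            "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
            "Rules": [{
                "ID": "crr-everything",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter = all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},  # the default
                "Destination": {
                    "Bucket": "arn:aws:s3:::destination-bucket",
                    "StorageClass": "STANDARD",  # optional override
                },
            }],
        },
    )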
Pre-signed URLs
- if unauthenticated users want to access the bucket, it is risky to give them AWS credentials or to make the bucket public
- a pre-signed URL gives temporary access to a private object, to either upload or download the object data using your credentials, in a safe/secure way
- the auth token is embedded in the URL
- use the AWS CLI, SDKs or console to generate pre-signed URLs (see the sketch below)
- note that:
  - you can create a URL for an object you have no access to
  - when the URL is used, permissions match those of the identity which generated it, as they are right now
  - access denied could mean the generating identity never had access, or doesn't have it now
  - don't generate with a role: a pre-signed URL can have a much longer validity period than the role's temporary credentials
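Generating one with boto3 (names illustrative):

    import boto3

    s3 = boto3.client("s3")
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "example-bucket", "Key": "private/report.pdf"},
        ExpiresIn=3600,  # validity in seconds
    )
    print(url)  # hand this URL to the unauthenticated user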
S3 Select and Glacier Select
- often you don't need the entire object, and it takes time to retrieve the whole file
- S3 Select / Glacier Select let you use SQL-like statements to retrieve only the data you need = pre-filtered = faster + cheaper
- supports CSV, JSON, Parquet, and BZIP2 compression (see the sketch below)
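A boto3 sketch of S3 Select over a CSV (bucket, key and columns illustrative):

    import boto3

    s3 = boto3.client("s3")
    resp = s3.select_object_content(
        Bucket="example-bucket",
        Key="sales.csv",
        ExpressionType="SQL",
        Expression="SELECT s.product, s.amount FROM s3object s WHERE s.amount > '100'",
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    for event in resp["Payload"]:  # results stream back as an event stream
        if "Records" in event:
            print(event["Records"]["Payload"].decode())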
Event Notifications
- generated when events occur in a bucket that has an event notification configuration
- triggers: object creation, deletion, restore, replication
- notifications can be delivered to SNS, SQS or Lambda functions (see the sketch below)
- EventBridge is an alternative that supports more types of events/services
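A boto3 sketch wiring object-created events to a Lambda function (ARN illustrative; the function must also grant S3 permission to invoke it):

    import boto3

    boto3.client("s3").put_bucket_notification_configuration(
        Bucket="example-bucket",
        NotificationConfiguration={
            "LambdaFunctionConfigurations": [{
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:on-upload",
                "Events": ["s3:ObjectCreated:*"],
            }]
        },
    )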
Access Logs
- access logging can be enabled via the console or the PUT Bucket logging API
- an ACL allows the S3 log delivery group to write to the target bucket
- log files consisting of log records are put into the target bucket (see the sketch below)
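Sketch (bucket names illustrative):

    import boto3

    boto3.client("s3").put_bucket_logging(
        Bucket="example-bucket",  # source bucket being logged
        BucketLoggingStatus={"LoggingEnabled": {
            "TargetBucket": "example-log-bucket",
            "TargetPrefix": "access-logs/",
        }},
    )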
Object Lock
- prevents objects from being deleted or overwritten for a fixed amount of time, or indefinitely
- enabled on new buckets (a support request is needed for existing ones) & requires versioning
- write-once-read-many (WORM): no delete, no overwrite
- a bucket can have default object lock settings
- two types (see the sketch below):
  - Retention period
    - specify days & years
    - two modes!
      - compliance: can't be adjusted, deleted or overwritten during the period, even by the account root user
      - governance: special permissions can be granted allowing the lock settings to be adjusted before the period expires
  - Legal hold
    - a manual lock that prevents object deletion regardless of any retention period; no expiry
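A boto3 sketch of both lock types (names and date illustrative; the bucket must have Object Lock enabled):

    import datetime
    import boto3

    s3 = boto3.client("s3")

    # retention period: governance mode, adjustable with special permissions
    s3.put_object_retention(
        Bucket="example-bucket", Key="invoice.pdf",
        Retention={"Mode": "GOVERNANCE",
                   "RetainUntilDate": datetime.datetime(2026, 1, 1)},
    )

    # legal hold: manual on/off switch, no expiry
    s3.put_object_legal_hold(
        Bucket="example-bucket", Key="invoice.pdf",
        LegalHold={"Status": "ON"},
    )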
Access Points
- simplify managing access to S3 buckets/objects via dedicated endpoints
- each works like a bucket policy
- rather than one bucket policy covering everything:
  - create many access points
  - each with different policies
  - each with different network access controls
  - each with its own endpoint address
- created via the console or “aws s3control create-access-point” (see the sketch below)
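The boto3 equivalent (account ID, names and VPC are illustrative):

    import boto3

    boto3.client("s3control").create_access_point(
        AccountId="111122223333",
        Name="analytics-ap",
        Bucket="example-bucket",
        VpcConfiguration={"VpcId": "vpc-0abc1234"},  # optional: restrict to one VPC
    )
    # objects are then addressed through the access point's own endpoint/ARN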
S3 Transfer Acceleration
- enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket
- example use case: users around the world expecting real-time access to a stream of images from other users