AWS S3

Overall

  • object store but not file or block, cannot use file share, cannot mount it as a drive (C:)

  • great for large scale data storage, distribution, upload

  • good input/ output to many AWS services

  • private by default, which access can be granted by bucket resources policies

    • for multiple account/ anonymous principals
    • specific the “principal” in the policy
    • can block IP with condition
  • Block public access

    • further level of security to block all public access by default
  • Normal access is via AWS APIs

  • Composition

    Buckets:

    • name have to be totally globally unique
    • 3-63 characters, start with a lowercase letter/ number, cannot be IP-like formatted eg 1.1.1.1
    • number buckets: 100 soft limit and 1000 hard limit per account
    • unlimited objects within buckets
    • flat structure (no folders like structure, even though UI show you the structure)

    Object:

    • constructed from a key and value
    • size ranged from 0 bytes to 5 TB
  • Storage Classes (different in frequency & number of copy)

    • object in Glacier is cold, it needs a retrival process where first byte latency from min to hrs:

    Untitled.png

    Untitled 1.png

    Untitled 2.png

    Untitled 3.png

    Untitled 4.png

    Untitled 5.png

    Untitled 6.png

    Untitled 7.png

  • Lifecycle Configuration

    • is a set of rule consisting of transition/expiration actions performed on conditions

    • on bucket or groups of objects with/ without versioning

    • can be moved between storage classes or objects can be deleted automatically based on a schedule

    • a waterfall transition (only going down):

      Untitled 8.png

      • Minimum of 30 days on standard class before move to other classes, if it is originally store in S3 standard
      • Minimum of 30 days on upper classes before move to glacier classes
      • smaller objects can cost more
  • Static website Hosting

    • can access S3 via HTTP (eg. Blogs)
    • website endpoint is created from bucket name/ region, bucket_name doesn’t matters
    • custom domain can be created via route53, but then bucket_name matters

    Out-of-band pages

    • for example, when compute server is down, DSN can be changed and pointed to the static website hosted on S3

    offloading

    • much cheap for static content for website compared to compute server
  • Object versioning

    Untitled 9.png

    • any modification on a object will result in a new version (same name but new ID)

    • new version = latest version = current version

    • deleting a object without ‘show version’ just simply add a delete marker on top of the current version to hide the object, you can undelete it by deleting delete marker.

      Untitled 10.png

    • Use version delete and specify an object ID to actually delete a version of an object

  • MFA Delete

    • enabled in versioning configuration
    • required to change bucket versioning state or delete versions
    • ensures users cannot delete objects from a bucket unless they provide their MFA code
    • can only be turned on by providing serial number + AWS CLI and with versioning turned on
    • only the bucket owner logged in as Root User can DELETE objects from buckets
  • Optimization

    • S3 rely on single put upload (single data stream up to 5GB)
    • stream fails = requires full restart
    • so Multipart upload helps
      • data is broken up to multiple >100MB parts (max 10,000 parts)
      • parts can fail and can be restarted
      • transfer rate = speeds of all parts (because they runs in parallel)
    • transfer acceleration (default to be off)
      • the traffic can be indirect from the user location to where bucket is using the internet
      • To solve this we utilizes CloudFront’s distributed edge location, data being uploaded enter the closest edge location then it send to S3 via the backbone network (like a highway under control of AWS)
      • instead of uploading to your bucket, users use a distinct URL for an Edge location
      • there are restriction to enable it + cannot contain period + needs to be DNS compatible in its naming
  • Encryption On Objects

    • Buckets aren’t encrypted, object do
    • user/app between S3 endpoint are encrypted in-transit
    • Two main type: You can use both Client- and server-side encryption to encrypt your data (you encrypted it from start, S3 encrypted it again)

    Untitled 11.png

    • Client-side encryption:

      • act of encrypting your data locally before you send them to S3 to help ensure its security in transit and at rest
      • encrypted all the time
      • use AWS KMS managed keys or client-side master key
    • Server-side encryption:

      • you control how it process from user to S3 endpoint
      • Encrypted at S3 before saving them on disks in AWS data centers and then decrypts the objects when you download them
    • Type of Server-side encryption (SSE) - how the process happen?

      • 1. server-side encryption with customer-provided keys (SSE-C)

        Untitled 12.png

        • plaintext still encrypted by HTTPS not being visible by external observer
        • manage your key, while retaining the control of the process
        • S3 perform the encryption/ decryption process
      • 2. server-side encryption with Amazon S3-managed keys (SSE-S3) = Default

        Untitled 13.png

        • handle both encryption, key generation and management
        • no management overhead, AWS do the work
        • use AES-256
        • Problem:
          • no control to key
          • no customized rotation control
          • no role separation, admin can view data (GDPR!)
      • 3. Server-side encryption with KMS keys stored in AWS key management service (SSE-KMS) = enhance (1)

        Untitled 14.png

        • logging & auditing the KMS key
          • enable role separation
          • admin can control the object, but without the access to KMS key, he cannot read the object

      Untitled 15.png

  • bucket key

    • each object is put to S3 we need to call AWS KMS API once, it causes problem then we have numerous objects
    • bucket key is a time-limited key that allows KMS key to apply to all object in a bucket, offloading work from KMS
    • but cloudtrail will only records the whole bucket rather than individual object
  • Replication

    • IAM role allow S3 to read object and replicate to destination via SSL
    • In different accounts, bucket policy is also needed to trust destination account
    • Option
      • what object: replication all objects or a subset by filter after you add a replication configuration, with exceptions described in the next section.
      • what storage class: default as same as source
      • ownership: default as source bucket, needed to configure if we want destination account own its
    • Considerations
      • by default & changed be configured
        • Replication only applies to objects that are modified or created after the replication configuration is set up (not retroactive)
        • One-way replication
        • No delete markers
      • both buckets’s versioning needs to be on
      • batch replication can be used to replicate existing objects, need extract configuration
      • source bucket owner needs permission to objects
      • No system events or glacier classes
    • have to wait the file to be replicated to the target bucket before we do any action on the coming new file of the target (no queue)
    • Replication Time control (RTC):
      • make sure the destination and source are in sync as soon as position
      • S3 RTC replicates most objects that you upload to Amazon S3 in seconds, and 99.99 percent of those objects within 15 minutes
    • Cross-region Replication (CRR)
      • for greater durability
      • must have versioning turned on in the source and destination bucket
      • can have CRR to another AWS account
      • why
        • global resilience improvements
        • latency reduction
    • Same-region Replication (SRR)
      • Why
        • Log aggregation
        • PROD & test sync
        • Resilience with strict sovereignty
  • Pre-signed URL

    • if unauthenticated want to access the bucket, it is risky to give them AWS credentials/ make the bucket make public

    Untitled 16.png

    • this gives temporary access to an private object to either upload or download the object data using your credential in a save/ secure way
    • the token is embedded to the URL
    • use AWS CLI or UI to generate Presigned URLs

    Untitled 17.png

    • Noted that
      • Can create a URL for an object you have no access to
      • when using the url, permissions match the identity which generated it right now
      • access denied could mean the generating ID never had access or doesn’t now
      • don’t generate with a role - Presigned URL can have much longer validity period
  • S3 Select and Glacier Select

    • you often want to retrieve the entire object, it takes time and to retrieve the whole file
    • S3/ Glacier select let you use SQL-like statements to retrieve data = pre-filtered = faster + cheaper
    • CSV, JSON, parquet, BZIP2 compression
  • Event Notifications

    • generated when events occur in a bucket with event notification config
    • trigger:
      • object creation, deletion, restore, replication
    • then can be delivered to SNS, SQS, lambda functions
    • EventBridge as alternative service to supports more types of events/ services
  • Access Logs

    • access logging can be able via the console. put bucket logging
    • ACL allows S3 log delivery group do doing on a source bucket
    • log file consisting of log records will put to target bucket
  • Object Lock

    • prevent objects from being deleted or overwritten for a fixed amount of time or indefinitely
    • enabled on new buckets (support req. for existing) & requires versioning
    • write-once-read-many (WORM) - no delete, no overwrite
    • a bucket can have default object lock settings
    • two type
      1. Retention period
        • specify days & years
        • two mode!
          • compliance - can’t be adjusted, deleted, overwritten during the period, even by account root user
          • governance - special permissions can be granted allowing lock settings to be adjusted before the period expires
      2. Legal hold
        • A manual lock that prevents object deletion, regardless of retention period. No expiry.
  • Access points

    • simplify managing access to S3 buckets/Objects via endpoint

    • like a bucket policy

    • rather than bucket policy 1 by 1

      • create many access points
      • each with different policies
      • each with different network access controls
      • each has its own endpoint address
    • created via console or aws s3control create-access-point

      Untitled 18.png

  • S3 transfer acceleration

    • enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket
    • users expect real time access to a stream of images from other users around the worldusers expect real time access to a stream of images from other users around the world