AWS S3 Storage
S3 is the AWS object storage service. It stores objects in flat containers called buckets rather than in a hierarchical file system. There are no file directories as such in S3!
Buckets must have a globally unique name – across ALL AWS accounts, not just your own!
buckets are defined at the region level – important!
bucket naming convention – must know this for the exam!
no uppercase, no underscores, 3-63 characters long,
must start with a lowercase letter or number
must not be formatted like an IP address
objects:
are files, must have a key – the full path to the file
eg
s3://my-bucket/my_file.txt -> my_file.txt is the key
you can add “folder” names but they are just key prefixes, i.e. part of the key, not a real file directory system
so if you have
s3://my-bucket/my_folder/my_file.txt
then my_folder/my_file.txt
is the object key for this file
object max size is 5 TB, but you can only upload 5 GB in one go – to upload larger objects you have to use multi-part upload (see the sketch after this list)
metadata can be added, also tags
and a version ID, if versioning is enabled
you can block public access if you want for a bucket
you receive an ARN (Amazon Resource Name) for the bucket
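For illustration, here is a minimal multi-part upload sketch using the Python SDK (boto3) – the bucket name and file names are placeholders, and boto3's upload_file switches to multi-part automatically once the file exceeds the configured threshold:

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# switch to multi-part above ~100 MB and upload in 64 MB parts (values are illustrative)
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=64 * 1024 * 1024,
)

# upload_file handles splitting the file, uploading the parts and the final "complete" call for you
s3.upload_file("big-backup.tar", "my-bucket", "backups/big-backup.tar", Config=config)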
2 ways to open an S3 object:
in the console, click Object actions -> Open
or via the object's public URL
but for the public URL you must set the permissions for access to the bucket:
the bucket must allow public access
pre-signed URL – you give the client a URL with temporary credentials to open the file
you can version your files, but you have to enable versioning at the bucket level. Best practice is to use versioning: it protects against unintended deletes and lets you roll back to an earlier version.
Any files existing before versioning is enabled will not have previous versions available.
if you click on objects -> list versions, you will see the available versions listed.
you can easily roll back in this way. So versioning should be enabled!
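A minimal sketch of enabling versioning and listing versions with boto3 – the bucket name and prefix are placeholders:

import boto3

s3 = boto3.client("s3")

# enable versioning at the bucket level
s3.put_bucket_versioning(
    Bucket="my-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# list all versions of the objects under a prefix (what "list versions" shows in the console)
resp = s3.list_object_versions(Bucket="my-bucket", Prefix="my_folder/")
for v in resp.get("Versions", []):
    print(v["Key"], v["VersionId"], v["IsLatest"])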
S3 Encryption
Exam question!
Know the 4 methods of encryption for S3:
SSE-S3 – encrypts S3 objects using keys managed by AWS
SSE-KMS – uses AWS key management service to manage the keys
SSE-C – you manage your own encryption keys yourself, outside AWS
Client-Side Encryption
important to know which is best suited for each situation!
SSE-S3
managed by AWS S3
object encrypted server side
uses AES-256 algorithm
must set header “x-amz-server-side-encryption”: “AES256”
SSE-KMS
managed by AWS KMS
gives you control over users and an audit trail
object is encrypted server side
set header “x-amz-server-side-encryption”: “aws:kms”
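A minimal boto3 sketch of uploading with SSE-S3 and SSE-KMS – the SDK sets the x-amz-server-side-encryption header for you; the bucket name, object keys and KMS alias are placeholders:

import boto3

s3 = boto3.client("s3")

# SSE-S3: keys managed by S3, AES-256
s3.put_object(
    Bucket="my-bucket",
    Key="reports/sse-s3-example.txt",
    Body=b"hello",
    ServerSideEncryption="AES256",
)

# SSE-KMS: keys managed by AWS KMS (optionally name a specific key)
s3.put_object(
    Bucket="my-bucket",
    Key="reports/sse-kms-example.txt",
    Body=b"hello",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-kms-key",  # placeholder alias; omit it to use the default aws/s3 key
)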
SSE-C
server side with your own keys outside of aws
so s3 does NOT store the key
HTTPS has to be used for this, as you will be sending your key –
the actual client-side data encryption key – in the HTTP headers of every request
S3 then uses that key to encrypt the data server side in the bucket (without ever storing the key).
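A minimal SSE-C sketch with boto3 – bucket and key names are placeholders; note that the same key must be supplied again to read the object back, and the calls must go over HTTPS:

import boto3
import os

s3 = boto3.client("s3")

# a 256-bit key that you manage yourself; S3 never stores it
key = os.urandom(32)

# upload: boto3 sends the key in the SSE-C request headers (over HTTPS)
s3.put_object(
    Bucket="my-bucket",
    Key="secret/data.bin",
    Body=b"sensitive bytes",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=key,
)

# download: you must supply the same key again
obj = s3.get_object(
    Bucket="my-bucket",
    Key="secret/data.bin",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=key,
)
print(obj["Body"].read())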
Client Side Encryption
encryption happens on the client side, before transmitting data to S3,
and decryption also happens on the client side, after retrieving the data.
customer fully manages the keys and encryption/decryption.
there is a client library called the Amazon S3 Encryption Client which you can use in your client applications.
encryption in transit (SSL or TLS)
HTTPS encrypts your data “in flight” (in transit) – always use HTTPS; for SSE-C it is mandatory
uses ssl or tls certificates
S3 Security
User-based security
uses IAM policies – these define which API calls are allowed for a specific IAM user
Resource-based security
bucket policies – bucket-wide rules set from the S3 console, allow cross-account access (most common)
object ACL (object access control list) – finer-grained, per-object control
bucket ACL (bucket access control list) – much less common (NOT in exam)
an IAM principal can access an S3 object if
the user's IAM permissions allow it OR the resource policy allows it
AND no explicit deny exists
S3 Bucket Policies
they are json based policies
Actions: the set of API calls to allow or deny
Principal: the account or user the policy applies to
use the s3 bucket policy to
grant public access to the bucket
force encryption at upload to the bucket
grant access to another account – cross account access
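As an illustration, a minimal sketch of a bucket policy granting public read access, applied with boto3 – the bucket name (and therefore the ARN) is a placeholder:

import boto3
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicRead",
        "Effect": "Allow",                       # Effect: Allow or Deny
        "Principal": "*",                        # Principal: who the policy applies to (here: everyone)
        "Action": ["s3:GetObject"],              # Actions: the API calls being allowed
        "Resource": "arn:aws:s3:::my-bucket/*",  # Resource: every object in the bucket
    }],
}

s3 = boto3.client("s3")
s3.put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))

For this to actually take effect, the Block Public Access settings described next must also permit public policies.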
Bucket settings for block public access
used to block public access
4 settings:
block public access granted through new ACLs
block public access granted through any ACLs
block public access granted through new public bucket or access point policies
block public and cross-account access to buckets or objects through ANY public bucket or access point policies
but exam will not test you on these.
created to prevent company data leaks
can also be set at account level
networking: supports vpc endpoints
S3 Logging and Audit
s3 access logs can be stored in other s3 buckets
api calls can be logged in CloudTrail
user security:
MFA Delete can be required to delete objects for versioned buckets
pre-signed urls valid for a limited time only
use case
eg to download a premium product or service, e.g. a video, only if the user is logged in as a paid-up user or has purchased the video or service
S3 Websites
s3 can host static websites for public access
url will be
bucket-name.s3-website-<aws-region>.amazonaws.com (or bucket-name.s3-website.<aws-region>.amazonaws.com, depending on the region)
if you get a 403 Forbidden error, make sure the bucket policy allows public reads – the bucket must be publicly accessible for this.
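A minimal sketch of turning on static website hosting with boto3 – the bucket name and document names are placeholders, and the bucket still needs a public-read bucket policy as noted above:

import boto3

s3 = boto3.client("s3")

# enable static website hosting with an index page and an error page
s3.put_bucket_website(
    Bucket="my-bucket",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)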
CORS Cross-Origin Resource Sharing
web browser mechanism to allow requests to other origins while visiting the main origin
an origin = scheme + host + port; e.g. http://example.com/app1 and http://example.com/app2 share the same origin, whereas http://www.example.com and http://other.example.com are different origins
a CORS header is needed for cross-origin requests – and the other origin must allow the request via its CORS headers (Access-Control-Allow-Origin).
the web browser first does a “pre-flight request” (OPTIONS), asking the cross-origin site if the request is permitted – if yes, it then sends the actual request, e.g. GET, PUT, DELETE
the permitted methods are returned in the Access-Control-Allow-Methods CORS header
S3 CORS
exam question!
if a client does a cross-origin request to an S3 bucket, then you must enable the correct CORS headers on the bucket.
you can allow for a specific origin, or for * ie all origins
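A minimal sketch of setting a CORS rule on a bucket with boto3 – the allowed origin and bucket name are placeholders; use ["*"] to allow all origins:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_cors(
    Bucket="my-bucket",
    CORSConfiguration={
        "CORSRules": [{
            "AllowedOrigins": ["https://www.example.com"],  # the specific origin allowed to call this bucket
            "AllowedMethods": ["GET", "PUT"],                # returned in Access-Control-Allow-Methods
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,                           # how long the browser may cache the pre-flight response
        }]
    },
)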
S3 MFA Delete
forces a user to generate a code on a device (e.g. a mobile phone MFA app) before doing certain operations on S3
to be able to use MFA Delete, versioning must be enabled on the S3 bucket
MFA is then required to permanently delete an object version
and to suspend versioning on the bucket
it is not needed to enable versioning or to list deleted versions
only bucket owner ie root account can enable/disable MFA-Delete
only possible via CLI at present.
first create an access key for the root account in the IAM web console
download the key file, then configure the AWS CLI to use this access key ID and secret access key
command:
aws configure --profile root-mfa-delete-demo
you are then prompted to enter the access key id and secret access key
then you run
aws s3 ls --profile root-mfa-delete-demo   (to check the profile works by listing your buckets)
then do:
aws s3api put-bucket-versioning --bucket demo-mfa-2020 --versioning-configuration Status=Enabled,MFADelete=Enabled --mfa "<arn-of-mfa-device> <mfa-code-from-the-device>" --profile root-mfa-delete-demo
you can then test by uploading an object
and try deleting a version – you should get a message saying you cannot delete it because MFA delete is enabled for this bucket…
so to delete it you must call aws s3api delete-object with the --mfa option and a code from your chosen MFA device (e.g. a mobile phone app) – or alternatively, for this demo, just disable MFA Delete again; then you can delete as per usual.
To force encryption you can set a bucket policy that refuses any "PUT" API call for an object that does not carry encryption headers
alternatively, you can use the default encryption option of S3
important for exam: bucket policies are evaluated *before* default encryption settings!
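For illustration, a sketch of such a deny policy applied with boto3 – the bucket name is a placeholder, and this variant assumes you want to require SSE-S3 (the AES256 header):

import boto3
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyUnencryptedPuts",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::my-bucket/*",
        # deny any PUT whose x-amz-server-side-encryption header is missing or not AES256
        "Condition": {
            "StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}
        },
    }],
}

s3 = boto3.client("s3")
s3.put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))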
S3 Access Logs
– access logs are written to another S3 bucket, and you can analyse the logs with Amazon Athena
first, very important – and potential exam question!
NEVER set your logging bucket to be the monitored bucket or one of the monitored buckets! because this will create a big infinite loop! which means a huge AWS bill!
always keep logging bucket and the monitored bucket/s separate! – ie set a separate different target bucket that is NOT being logged!
tip: make sure you define a bucket with the word “access-logs” or similar in the name, so that you can easily identify your logging bucket and avoid logging it by mistake.
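A minimal sketch of enabling server access logging with boto3 – both bucket names are placeholders, and the separate logging bucket must already grant S3 permission to deliver logs:

import boto3

s3 = boto3.client("s3")

# the target bucket is deliberately NOT the monitored bucket, to avoid the logging loop
s3.put_bucket_logging(
    Bucket="my-app-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-app-access-logs",
            "TargetPrefix": "logs/my-app-bucket/",
        }
    },
)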
S3 Replication – Cross-Region (CRR) and Same-Region (SRR)
– versioning must be enabled in both the source and destination buckets
the copying is asynchronous
buckets can belong to different accounts
must grant proper iam permissions to S3 for this
CRR: you replicate (synchronize) buckets across different regions
used for compliance, lower latency access, replicating across different accounts
SRR: used for log aggregation, live replication between eg production and test and development accounts
note: after activating replication, only new objects get replicated. To replicate existing objects, you need to use the…
S3 Batch Replication feature
for DELETEs: you can replicate the delete markers from source to target
but deletions with a version id are not replicated – this is to avoid malicious deletes
and there is no “chaining” of replication…
this means eg if bucket 1 replicates to bucket 2 and 2 replicates to bucket 3, then bucket 1 is not automatically replicated to 3 – you have to explicitly set each replication for each pair of buckets.
first you have to enable versioning for the bucket
then create your target bucket for the replication if not already in existence – can be same region for SRR or a different region for CRR
then, in the source bucket, select:
Management -> Replication rules, create a replication rule and set the source and destination.
then you need to create an iam role:
and specify if you want to replicate existing objects or not
for existing objects you must use a one-time batch operation for this
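A minimal replication-rule sketch with boto3 – the bucket names, account id and IAM role ARN are placeholders, and versioning must already be enabled on both buckets:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # placeholder role
        "Rules": [{
            "ID": "replicate-everything",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},                                    # empty filter = replicate all new objects
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
        }],
    },
)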
S3 Pre-Signed URLs
can generate using sdk or cli
uploads: must use sdk
downloads can use cli, easy
valid for 3600 seconds (1 hour) by default, can be changed
users are given a pre-signed url for get or put
use cases:
eg to only allow logged-in or premium users to download a product, e.g. a video, or a service
allow a user temporary right to upload a file to your bucket
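A minimal boto3 sketch for both directions – bucket and key names are placeholders:

import boto3

s3 = boto3.client("s3")

# download link for a premium video, valid for 1 hour (3600 seconds is also the default)
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "videos/premium.mp4"},
    ExpiresIn=3600,
)

# upload link: lets a user PUT one object into your bucket for 10 minutes
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "my-bucket", "Key": "uploads/user-file.txt"},
    ExpiresIn=600,
)

print(download_url)
print(upload_url)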
S3 Storage Classes
need to know for exam!
S3 offers
Standard General Purpose, and
Standard-Infrequent-Access (IA)
One Zone-IA
Glacier Instant Retrieval
Glacier Flexible Retrieval
Glacier Deep Archive
Intelligent Tiering
you can move objects between classes manually or automate it with S3 Lifecycle rules
Standard S3:
Durability and Availability
the difference:
Durability: S3 has very high durability – eleven nines, i.e. 99.999999999%!
Availability:
how readily available the service is.
S3 Standard offers 99.99% availability, i.e. roughly 53 minutes of unavailability per year
use cases: big data analytics, mobile and gaming applications, content distribution
Standard-IA:
for data that is less frequently accessed but requires rapid access when needed
lower cost than Standard
99.9% availability
good for disaster recovery (DR) and backups
1-Zone-IA
very high durability within a single AZ, but only 99.5% availability – not so high – and data is lost if the AZ is destroyed
thus best used for secondary backup copies or recreatable data
Glacier storage classes:
low-cost object storage for archiving or backup
you pay for storage plus a retrieval charge
Glacier Instant Retrieval IR
millisecond retrieval, min storage 90 days
Glacier Flexible Retrieval
expedited retrieval: 1-5 minutes
standard: 3-5 hours
bulk: 5-12 hours – and is free
min storage duration 90 days
Glacier Deep Archive
best for long-term archive storage only
standard retrieval 12 hours, bulk retrieval 48 hours
lowest cost
min 180 days storage time
Intelligent Tiering
moves objects automatically between access tiers based on usage monitoring
no retrieval charges (just a small monthly monitoring and auto-tiering fee)
enables you to leave the moving between tiers to the system
S3 Lifecycle Rules
These automate the moving of storage objects from one storage class to another.
Lifecycle rules are in two parts: Transition Actions and Expiration Actions
Transition Actions:
configures objects for transitioning from one storage class to another.
eg
This can be used e.g. to move objects to another storage class 60 days after creation, and to Glacier for archiving after 6 months
Expiration Actions:
configures objects to be deleted (“expired”) after a specified period of time
eg
Access logs set to delete after 365 days
can also be used to delete old versions of files where you have file versioning enabled.
or to delete incomplete multi-part uploads after a specific time period
rules can be created for certain prefixes in the file object names or specific “folder” names (remember there are no real directory folders in S3!)
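For illustration, a sketch of a lifecycle configuration with one transition action and one expiration action, applied with boto3 – the bucket name, prefixes and day counts are placeholders:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {   # Transition Action: archive objects under reports/ to Glacier after 180 days
                "ID": "archive-reports",
                "Status": "Enabled",
                "Filter": {"Prefix": "reports/"},
                "Transitions": [{"Days": 180, "StorageClass": "GLACIER"}],
            },
            {   # Expiration Action: delete access logs after 365 days and tidy up failed uploads
                "ID": "expire-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Expiration": {"Days": 365},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            },
        ]
    },
)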
Exam Q:
often about image thumbnails…
eg
Your application on EC2 generates thumbnail images from profile photos after they are uploaded to S3.
These thumbnails can easily be regenerated when needed and only need to be kept for 60 days. The source images must be immediately retrievable during those 60 days. After this time period, users are ok with waiting up to 6 hours.
How would you design a lifecycle policy to allow for this?
Solution:
store the source images on Standard S3, with a lifecycle config to transition them to Glacier after 60 days.
The S3 thumbnails that are generated by the application can be stored on One-Zone-IA, with a lifecycle config to expire ie delete them after 60 days.
Another Exam Q scenario:
A company rule states you should be able to recover your deleted S3 objects immediately for 30 days, though this happens rarely in practice.
After this time, and for up to 365 days, deleted objects should be recoverable within 48 hours.
Solution:
First enable S3 Versioning so you will have multiple object versions, and so that “deleted” objects are hidden behind a “delete marker” and can be recovered if necessary.
Then, create a lifecycle rule to move non-current versions of the objects to Standard-IA
and then transition these non-current versions from there to Glacier Deep Archive later on.
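A sketch of what such a rule could look like in boto3 – the bucket name and the exact day counts are illustrative assumptions, not part of the scenario:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-noncurrent-versions",
            "Status": "Enabled",
            "Filter": {},   # apply to the whole bucket
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": 30, "StorageClass": "STANDARD_IA"},
                {"NoncurrentDays": 60, "StorageClass": "DEEP_ARCHIVE"},
            ],
            # optionally remove non-current versions for good after a year
            "NoncurrentVersionExpiration": {"NoncurrentDays": 365},
        }]
    },
)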
How To Calculate The Optimum Number of Days To Transition Objects From One Storage Class To Another:
You can use S3 Analytics (Storage Class Analysis), which will give you recommendations for Standard and Standard-IA
But note – exam Q: this does NOT work for One Zone-IA or the Glacier classes
The report is updated daily; after 24-48 hours you will start to see the data analysis results
This is a useful tool for working out your optimum storage class lifecycle rule according to your actual real storage patterns.