
AWS S3 Storage

S3 is the AWS object storage service. S3 stores objects in flat containers called buckets, rather than in a hierarchical file system. There are no file directories as such in S3!

 

Buckets must have a globally unique name – across ALL AWS accounts, not just your own!

 

buckets are defined at the region level – important!

 

 

bucket naming convention – must know this for the exam!

 

no uppercase letters, no underscores, 3-63 characters long,

 

must start with a lowercase letter or a number

 

must not be formatted as an IP address

 

objects:

 

are files, and must have a key – the key is the full path to the file

 

eg

 

s3://my-bucket/my_file.txt -> my_file.txt is the key

 

you can add “folder” names, but they are just part of the key prefix, not a real file directory system

 

 

so if you have

 

s3://my-bucket/my_folder/my_file.txt

 

then my_folder/my_file.txt

is the object key for this file

 

 

the maximum object size is 5TB, but you can only upload 5GB in one go – to upload larger objects you have to use multi-part upload

 

metadata can be added, also tags

 

and each object gets a version id, if versioning is enabled

 

you can block public access if you want for a bucket

 

you receive an ARN (Amazon Resource Name) for the bucket

 

 

2 ways to open an S3 object

 

in the console, click Object actions -> Open

 

or via the object’s public URL

 

but for this you must set the permission for access to the bucket

 

the bucket must allow public access

 

pre-signed url – you give the client temporary credentials to open the file

 

 

you can version your files, but you have to enable versioning at the bucket level. Best practice is to use versioning: it protects against unintended deletes and lets you roll back to earlier versions.

 

Any files existing before versioning is enabled will not have previous versions available.

 

if you click on objects -> list versions, you will see the available versions listed.

 

you can easily roll back in this way. So versioning should be enabled!
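
As a quick sketch (assuming a hypothetical bucket called my-bucket and key my_file.txt), versioning can be enabled and inspected from the CLI like this:

aws s3api put-bucket-versioning --bucket my-bucket --versioning-configuration Status=Enabled
aws s3api list-object-versions --bucket my-bucket --prefix my_file.txt

The second command lists all stored versions (and any delete markers) for the given key prefix.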

 

 

S3 Encryption

 

Exam question!

Know the 4 methods of encryption for S3: 

 

SSE-S3 – encrypts S3 objects using keys managed by AWS

 

SSE-KMS – uses AWS key management service to manage the keys

 

SSE-C – you manage the encryption keys yourself

 

Client-Side Encryption

 

important to know which is best suited for each situation!

 

SSE-S3

 

managed by AWS S3

 

object encrypted server side

 

uses AES-256 algorithm

 

must set the header “x-amz-server-side-encryption”: “AES256”
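
For example, a sketch of an SSE-S3 upload via the CLI (my-bucket and my_file.txt are placeholder names) – the --server-side-encryption flag sets that header for you:

aws s3api put-object --bucket my-bucket --key my_file.txt --body my_file.txt --server-side-encryption AES256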

 

 

SSE-KMS

 

managed by AWS KMS

 

gives you control over users and an audit trail

 

object is encrypted server side

 

set the header “x-amz-server-side-encryption”: “aws:kms”
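
A similar sketch for SSE-KMS (my-bucket, my_file.txt and the KMS key id are placeholders):

aws s3api put-object --bucket my-bucket --key my_file.txt --body my_file.txt --server-side-encryption aws:kms --ssekms-key-id <your-kms-key-id>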

 

 

 

SSE-C

 

server-side encryption using your own keys, managed outside of AWS

 

so s3 does NOT store the key

 

https has to be used for this, because you send your key – the actual client-side data encryption key – to S3 in the HTTP headers of every request

 

s3 then uses the supplied key to encrypt the data server side in the bucket.
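
A minimal SSE-C sketch using the higher-level CLI (assuming a 32-byte key stored in a local file sse-c.key and a placeholder bucket my-bucket) – note the same key must be supplied again to download the object:

openssl rand -out sse-c.key 32
aws s3 cp my_file.txt s3://my-bucket/my_file.txt --sse-c AES256 --sse-c-key fileb://sse-c.key
aws s3 cp s3://my-bucket/my_file.txt my_file.txt --sse-c AES256 --sse-c-key fileb://sse-c.key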

 

 

Client Side Encryption

 

encryption happens on the client side, before transmitting data to s3

and decryption also happens on the client side, after retrieval.

 

customer fully manages the keys and encryption/decryption.

 

there is a client library called the Amazon S3 Encryption Client which you can use in your client applications.

 

 

encryption in transit (SSL or TLS)

 

https is encryption “in flight” – you should always use https in flight; for SSE-C it is mandatory

 

uses ssl or tls certificates

 

 

 

S3 Security

 

User-based security

first there is user-based

 

uses IAM policies, which define which API calls are allowed for a specific user

 

Resource-based security

 

sets bucket-wide rules from the S3 console; allows cross-account access

 

 

not very common (NOT in the exam):

Object ACL (access control list) – this is finer grained, at the object level

Bucket ACL (access control list) – this is even less common

 

 

an IAM principal can access an S3 object if

the user’s IAM permissions allow it OR the resource policy allows it

AND no explicit deny exists

 

 

S3 Bucket Policies

 

they are JSON-based policies

 

Actions: the set of API calls to allow or deny

 

Principal: the account or user the policy applies to

 

use the s3 bucket policy to

 

grant public access to the bucket

force encryption at upload to the bucket

grant access to another account – cross account access
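
As an illustration of the first use (assuming a placeholder bucket called my-bucket), a public-read bucket policy could be applied like this:

aws s3api put-bucket-policy --bucket my-bucket --policy '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPublicRead",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}'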

 

 

Bucket settings for block public access

 

used to block public access

 

4 settings:

block public access to buckets and objects granted through:
new ACLs
any ACLs
new public bucket or access point policies

and: block public or cross-account access to buckets or objects through ANY public bucket or access point policies.

 

but exam will not test you on these.

 

created to prevent company data leaks

 

can also be set at account level
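
A sketch of setting all four block-public-access settings on a placeholder bucket via the CLI:

aws s3api put-public-access-block --bucket my-bucket --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true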

 

networking: supports vpc endpoints

 

S3 Logging and Audit

 

s3 access logs can be stored in other s3 buckets
api calls can be logged in CloudTrail

 

user security:

 

MFA Delete can be required to delete objects for versioned buckets

 

pre-signed urls valid for a limited time only

 

use case

 

eg allowing a user to download a premium product or service, eg a video, only if they are logged in as a paid-up user or have purchased that video or service

 

 

 

S3 Websites

 

s3 can host static websites for public access

 

url will be

 

<bucket-name>.s3-website-<aws-region>.amazonaws.com

 

 

 

if you get a 403 Forbidden error, make sure the bucket policy allows public reads – the bucket must be publicly accessible for this.
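
A sketch of enabling static website hosting from the CLI (my-bucket, index.html and error.html are placeholders; the bucket policy must still allow public reads):

aws s3 website s3://my-bucket/ --index-document index.html --error-document error.html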

 

 

CORS Cross-Origin Resource Sharing

 

web browser mechanism to allow requests to other origins while visiting the main origin

 

eg http://example.com/app1 and http://example.com/app2 are the same origin, whereas http://example.com and http://other.example.com are different origins

 

a CORS header is needed for this – and the other origin must also allow the request.

 

 

the web browser does a “pre-flight request” first – asking the cross-origin site whether the request is permitted – and if yes, it then makes the actual request, eg GET, PUT or DELETE

 

the permitted methods are returned in the Access-Control-Allow-Methods CORS header

 

 

S3 CORS

 

exam question!
if a client does a cross-origin request to an S3 bucket, then you must enable the correct CORS headers.

 

you can allow for a specific origin, or for * ie all origins
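
A sketch of a CORS configuration applied via the CLI, assuming a placeholder bucket my-bucket and an allowed origin of https://www.example.com:

aws s3api put-bucket-cors --bucket my-bucket --cors-configuration '{
  "CORSRules": [
    {
      "AllowedOrigins": ["https://www.example.com"],
      "AllowedMethods": ["GET", "HEAD"],
      "AllowedHeaders": ["*"],
      "MaxAgeSeconds": 3000
    }
  ]
}'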

 

 

S3 MFA Delete

 

forces a user to generate a code from a device eg mobile phone before doing some operations on S3

 

to activate MFA-Delete, first enable versioning on the S3 bucket

 

it is required to permanently delete an object version

and to suspend versioning on the bucket

 

not needed to enable versioning or list deleted versions

 

 

 

only the bucket owner (ie the root account) can enable/disable MFA-Delete

 

only possible via CLI at present.

 

 

 

first create an access key for the root account in the IAM web console

 

 

then configure the aws cli to use this key

 

download the key file, and then set up a CLI profile with your access key id and secret access key

 

command:

 

aws configure --profile root-mfa-delete-demo

 

you are then prompted to enter the access key id and secret access key

 

then you run

 

aws s3 ls --profile root-mfa-delete-demo to list your buckets and check the profile works

 

 

then do:

 

aws s3api put-bucket-versioning --bucket demo-mfa-2020 --versioning-configuration Status=Enabled,MFADelete=Enabled --mfa "<here enter the arn-of-mfa-device and-the-mfa-code-for-the-device>" --profile root-mfa-delete-demo

 

you can then test by uploading an object

 

and try deleting the version – you should get a message saying you cannot delete as mfa authentication delete is enabled for this bucket…

 

so to delete a version you must pass the MFA code via the CLI delete command (see the sketch below) using your chosen MFA device, eg a mobile phone – or alternatively, for this demo, just disable MFA-Delete again; then you can delete as per usual.
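
A sketch of such a permanent delete with MFA, reusing the demo bucket name from above (the key, version id and MFA values are placeholders):

aws s3api delete-object --bucket demo-mfa-2020 --key test.txt --version-id <version-id> --mfa "<arn-of-mfa-device> <mfa-code>" --profile root-mfa-delete-demo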

 

 

 

To force encryption you can set a bucket policy that refuses any API PUT call for an object that does not have encryption headers

 

alternatively, you can use the default encryption option of S3

 

important for exam: bucket policies are evaluated *before* default encryption settings!
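
A sketch of such a “force encryption” bucket policy (placeholder bucket my-bucket), denying any PutObject call that arrives without a server-side-encryption header:

aws s3api put-bucket-policy --bucket my-bucket --policy '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnencryptedUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": {
        "Null": { "s3:x-amz-server-side-encryption": "true" }
      }
    }
  ]
}'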

 

 

 

S3 Access Logs

 

– you can log all access to another bucket, and the log data can then be analyzed with eg AWS Athena

first, very important – and potential exam question!

 

NEVER set your logging bucket to be the monitored bucket or one of the monitored buckets, because this will create an infinite logging loop – which means a huge AWS bill!

 

always keep logging bucket and the monitored bucket/s separate! – ie set a separate different target bucket that is NOT being logged!

 

tip: make sure you define a bucket with the word “access-logs” or similar in the name, so that you can easily identify your logging bucket and avoid logging it by mistake.
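
A sketch of turning on access logging via the CLI (my-bucket and my-access-logs-bucket are placeholder names; the target bucket must already allow the S3 log delivery service to write to it):

aws s3api put-bucket-logging --bucket my-bucket --bucket-logging-status '{
  "LoggingEnabled": {
    "TargetBucket": "my-access-logs-bucket",
    "TargetPrefix": "logs/"
  }
}'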

 

S3 Replication – Cross-Region Replication (CRR) and Same-Region Replication (SRR)

 

– must enable versioning for this

 

the copying is asynchronous

 

buckets can belong to different accounts

 

must grant proper iam permissions to S3 for this

 

CRR: you replicate to a bucket in a different region

 

used for compliance, lower latency access, replicating across different accounts

 

SRR: used for log aggregation, live replication between eg production and test and development accounts

 

note: after activating replication, only new objects get replicated. to replicate existing objects, you need to use the…

S3 Batch Replication feature

 

for DELETEs: you can replicate the delete markers from source to target

but deletions with a version id are not replicated – this is to avoid malicious deletes

 

and there is no “chaining” of replication…

this means eg if bucket 1 replicates to bucket 2 and 2 replicates to bucket 3, then bucket 1 is not automatically replicated to 3 – you have to explicitly set each replication for each pair of buckets.

 

first you have to enable versioning for the bucket

 

then create your target bucket for the replication if not already in existence – can be same region for SRR or a different region for CRR

 

then select in origin bucket:

 

management -> replication rules, you create a replication rule and you set the source and destination.

 

then you need to create an iam role:

 

and specify if you want to replicate existing objects or not

 

for existing objects you must use a one-time batch operation for this
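
A sketch of a replication configuration applied via the CLI – the bucket names, IAM role ARN and account id here are placeholders, and the IAM role must already exist with the proper S3 permissions:

aws s3api put-bucket-replication --bucket source-bucket --replication-configuration '{
  "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
  "Rules": [
    {
      "ID": "replicate-all",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {},
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Destination": { "Bucket": "arn:aws:s3:::target-bucket" }
    }
  ]
}'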

 

 

S3 Pre-Signed URLs

 

can generate using sdk or cli

 

uploads: must use sdk
downloads can use cli, easy

 

valid for 3600 secs ie 1 hr by default; this can be changed

 

users are given a pre-signed url for get or put

 

use cases:
eg to only allow logged-in or premium users to download a product, eg a video or service
allow a user temporary right to upload a file to your bucket
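
A sketch of generating a download pre-signed URL from the CLI (placeholder bucket and key), valid here for 1 hour:

aws s3 presign s3://my-bucket/my_file.txt --expires-in 3600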

 

 

S3 Storage Classes

 

need to know for exam!

S3 offers

 

Standard General Purpose
Standard-Infrequent-Access (IA)
One Zone-IA
Glacier Instant Retrieval
Glacier Flexible Retrieval
Glacier Deep Archive
Intelligent Tiering

 

 

you can move objects between classes manually, or automatically using S3 Lifecycle rules

 

 

Standard S3:

Durability and Availability

 

the difference:

S3 has very high durability – 99.999999999%, ie eleven 9s!

 

Availability

 

how readily available the service is.

 

S3 Standard has 99.99% availability – about 1 hr of unavailability per year

 

 

use cases: big data, mobile and gaming applications, content distribution

 

Standard-IA:

 

less frequent access
lower cost than standard

 

99.9% available

 

good for DR and backups

 

1-Zone-IA

 

very high durability (within a single AZ), but only 99.5% availability – not so high

 

thus best used for secondary backup copies or recreatable data

 

Glacier storage classes:

 

low-cost object storage for archiving or backup

 

you pay for storage plus a retrieval charge

 

Glacier Instant Retrieval IR

 

millisecond retrieval, min storage 90 days

 

Glacier Flexible Retrieval

 

Expedited: 1-5 mins to recover

Standard: 3-5 hrs

Bulk: 5-12 hrs – free
min storage duration 90 days

 

Glacier Deep Archive

 

best for long-term only storage

 

Standard retrieval 12 hrs, or Bulk 48 hrs
lowest cost

 

min 180 days storage time

 

 

Intelligent Tiering

 

automatically moves objects between access tiers based on monitored usage
no retrieval charges

enables you to leave the moving between tiers to the system

 

 

S3 Lifecycle Rules

 

These automate the moving of storage objects from one storage class to another.

 

Lifecycle rules are in two parts: Transition Actions and Expiration Actions

 

Transition Actions: 

configures objects for transitioning from one storage class to another.

eg

This can be used to move objects to another class (eg Standard-IA) 60 days after creation, and then eg to Glacier for archiving after 6 months

 

Expiration Actions:

configures objects to be deleted (“expired”) after a specified period of time

eg

 

Access logs set to delete after 365 days

 

can also be used to delete old versions of files where you have file versioning enabled.

 

or to delete incomplete multi-part uploads after a specific time period

 

rules can be created for certain prefixes in the object keys or for specific “folder” names (remember there are no real directory folders in S3!)

 

 

Exam Q: 

 

often about image thumbnails…

 

eg

 

Your application on EC2 generates thumbnail images from profile photos after they are uploaded to S3. 

 

These thumbnails can be easily regenerated when needed and only need to be kept for 60 days. The source images should be able to be immediately retrieved during those 60 days. After this time period, users are ok with waiting up to 6 hours. 

 

How would you design a lifecycle policy to allow for this?

 

Solution:

 

store the source images on Standard S3, with a lifecycle config to transition them to Glacier after 60 days.

 

The S3 thumbnails that are generated by the application can be stored on One-Zone-IA, with a lifecycle config to expire ie delete them after 60 days.
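
A sketch of such a lifecycle configuration, assuming (hypothetically) that source images live under an images/ prefix and thumbnails under a thumbnails/ prefix in the bucket my-bucket:

aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration '{
  "Rules": [
    {
      "ID": "archive-source-images",
      "Filter": { "Prefix": "images/" },
      "Status": "Enabled",
      "Transitions": [ { "Days": 60, "StorageClass": "GLACIER" } ]
    },
    {
      "ID": "expire-thumbnails",
      "Filter": { "Prefix": "thumbnails/" },
      "Status": "Enabled",
      "Expiration": { "Days": 60 }
    }
  ]
}'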

 

Another Exam Q scenario:

 

A company rule states you should be able to recover your deleted S3 objects immediately for  30 days, though this happens rarely in practice. 

 

After this time, and for up to 365 days, deleted objects should be recoverable within 48 hours.

 

 

Solution:

 

First enable S3 Versioning so you will have multiple object versions, and so that “deleted” objects are hidden behind a “delete marker” and can be recovered if necessary.

 

Then, create a lifecycle rule to move non-current versions of the objects to Standard-IA

and then transition these non-current versions from there to Glacier Deep Archive later on.
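
A sketch of this rule via the CLI (my-bucket and the day counts are placeholder values to adjust to the actual company policy):

aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration '{
  "Rules": [
    {
      "ID": "retain-noncurrent-versions",
      "Filter": {},
      "Status": "Enabled",
      "NoncurrentVersionTransitions": [
        { "NoncurrentDays": 30, "StorageClass": "STANDARD_IA" },
        { "NoncurrentDays": 90, "StorageClass": "DEEP_ARCHIVE" }
      ]
    }
  ]
}'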

 

 

 

How To Calculate The Optimum Number of Days To Transition Objects From One Storage Class To Another:

 

You can use S3 Storage Class Analytics which will give you recommendations for Standard and Standard-IA 

 

But note – exam Q: this does NOT work for One Zone-IA or Glacier

 

The S3 Storage Class Analytics report is updated daily; after 24-48 hours you will start to see the data analysis results

 

This is a useful tool for working out your optimum storage class lifecycle rule according to your actual real storage patterns.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 
