AWS S3 Storage

S3 is the AWS object storage system. S3 stores objects in flat volumes or containers called buckets, rather than a hierarchical file system. There are no file directories as such in S3!


Buckets must have a globally unique name, across ALL AWS accounts, not just your own!


Buckets are defined at the region level (important!)



bucket naming convention – must know this for exam!


no uppercase, no underscore, 3-63 chars long


must start with lowercase letter or number


must not be an ip address
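
The naming rules above can be checked mechanically. This is a minimal sketch, not an official validator: the function name is made up for illustration, and the allowed character set (lowercase letters, digits, hyphens, dots) is an assumption, since the notes only say what is forbidden.

```python
import ipaddress
import re

# Assumed legal characters: lowercase letters, digits, hyphens and dots.
# The notes above only rule out uppercase letters and underscores.
_ALLOWED = re.compile(r"[a-z0-9.-]+")


def is_valid_bucket_name(name: str) -> bool:
    """Check a candidate bucket name against the rules listed in these notes."""
    if not 3 <= len(name) <= 63:
        return False
    if not _ALLOWED.fullmatch(name):
        return False
    # Must start with a lowercase letter or a number.
    if not (name[0].islower() or name[0].isdigit()):
        return False
    # Must not be formatted as an IP address.
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        return True
    return False
```

Passing this check only means the name is well formed; global uniqueness can only be confirmed by AWS when the bucket is created.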




Objects are files; each object must have a key, which is the full path to the file within the bucket




s3://my-bucket/my_file.txt -> my_file.txt is the key


you can add “folder” names, but they are just a prefix in the key (ie a label), not a real file directory system



so if you have a file my_file.txt under the “folder” my_folder,


then my_folder/my_file.txt

is the object key for this file (the prefix is part of the key)
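
To make the bucket / key / prefix split concrete, here is a small sketch. The helper name and the example URIs are hypothetical, and it does not handle keys containing characters such as ? or #.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str, str]:
    """Return (bucket, key, prefix) for an s3:// URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The key is everything after the bucket name, without the leading slash.
    key = parsed.path.lstrip("/")
    # The prefix ("folder") is everything up to and including the last slash.
    head = key.rpartition("/")[0]
    prefix = head + "/" if head else ""
    return parsed.netloc, key, prefix
```

For s3://my-bucket/my_folder/my_file.txt this gives the bucket my-bucket, the key my_folder/my_file.txt and the prefix my_folder/. The prefix is just the front of the key string.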



max object size is 5TB, but you can only upload 5GB in a single operation; to upload larger objects you have to use multi-part upload


an object can also carry metadata and tags


and a version ID, if versioning is enabled


you can block public access for a bucket if you want


each bucket gets an ARN (Amazon Resource Name)
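
As a rough illustration of the multi-part upload mentioned above: each part must be between 5 MiB and 5 GiB (the last part may be smaller) and an upload can have at most 10,000 parts. The sketch below picks a part size and part count. The function name and the 100 MiB default part size are assumptions for illustration.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART = 5 * MIB    # smallest allowed part (the last part may be smaller)
MAX_PART = 5 * GIB    # largest allowed part
MAX_PARTS = 10_000    # most parts allowed in one multi-part upload


def plan_parts(object_size: int, part_size: int = 100 * MIB) -> tuple[int, int]:
    """Return (part_size, part_count) in bytes/parts for an object of object_size bytes."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Smallest part size that still fits the object into MAX_PARTS parts.
    needed = (object_size + MAX_PARTS - 1) // MAX_PARTS
    part_size = max(part_size, MIN_PART, needed)
    if part_size > MAX_PART:
        raise ValueError("object too large for a multi-part upload")
    part_count = (object_size + part_size - 1) // part_size
    return part_size, part_count
```

In practice the SDKs and the CLI choose part sizes for you; this only shows the arithmetic behind the limits.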



2 ways to open an S3 object


in the console, click on Object actions and then Open


or via the public object URL


but for this you must set the permissions to allow access:


the bucket must allow public access


alternatively use a pre-signed URL: this gives the client temporary credentials to open the file



You can version your files, but you have to enable versioning at the bucket level. Best practice is to use versioning: it protects against unintended deletes and lets you roll back.


Any files existing before versioning is enabled will not have previous versions available.


if you click on objects -> list versions, you will see the available versions listed.


you can easily roll back in this way. So versioning should be enabled!



S3 Encryption


Exam question!

Know the 4 methods of encryption for S3: 


SSE-S3 – encrypts S3 objects using keys managed by AWS


SSE-KMS – uses AWS key management service to manage the keys


SSE-C – you manage your own keys


Client-Side Encryption


important to know which is best suited for each situation!




SSE-S3


keys are managed by AWS S3


object encrypted server side


uses AES-256 algorithm


must set header “x-amz-server-side-encryption”: “AES256”





SSE-KMS


keys are managed by AWS KMS


gives you control over users and an audit trail


object is encrypted server side


set header “x-amz-server-side-encryption”: “aws:kms”






SSE-C


server-side encryption with your own keys, managed outside of AWS


so S3 does NOT store the key


HTTPS has to be used for this, as you will be sending your key


the data encryption key must be provided in the HTTP headers of every request


S3 then uses the key to encrypt the data server side, and discards the key afterwards.
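
The request headers for the three server-side options can be sketched as below. The function names are made up for illustration. The three SSE-C header names are not given in these notes; they are the customer-key headers from the S3 REST API, which expects the key and its MD5 digest base64-encoded.

```python
import base64
import hashlib


def sse_s3_headers() -> dict[str, str]:
    """Header for SSE-S3 (keys managed by S3)."""
    return {"x-amz-server-side-encryption": "AES256"}


def sse_kms_headers() -> dict[str, str]:
    """Header for SSE-KMS (keys managed by KMS)."""
    return {"x-amz-server-side-encryption": "aws:kms"}


def sse_c_headers(key: bytes) -> dict[str, str]:
    """Headers for SSE-C: the customer key travels with every request."""
    if len(key) != 32:
        raise ValueError("SSE-C expects a 256-bit (32-byte) key")
    prefix = "x-amz-server-side-encryption-customer-"
    return {
        prefix + "algorithm": "AES256",
        prefix + "key": base64.b64encode(key).decode(),
        # The MD5 digest lets S3 check the key arrived intact.
        prefix + "key-MD5": base64.b64encode(hashlib.md5(key).digest()).decode(),
    }
```

Because the SSE-C key itself is in the headers, this only makes sense over HTTPS, as noted above.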



Client Side Encryption


the data is encrypted on the client side before it is sent to S3


and decrypted on the client side after it is retrieved.


customer fully manages the keys and encryption/decryption.


there is a client library called the Amazon S3 Encryption Client which you can use on your clients.



Encryption in transit (SSL/TLS)


HTTPS means encryption “in flight”: S3 also exposes a plain HTTP endpoint, so always use HTTPS (it is mandatory for SSE-C)


uses SSL or TLS certificates




S3 Security


User-based security

uses IAM policies, which define which API calls are allowed for a specific user


Resource-based security


bucket policies: bucket-wide rules set from the S3 console, allow cross-account access



ACLs are not very common (NOT in exam):

Object Access Control List (ACL): finer grained


Bucket Access Control List (ACL): less common



IAM principal can access an S3 object if


user iam permissions allow it or the resource policy allows it


and no explicit deny exists
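
The access rule above boils down to a single boolean expression. This is a simplified sketch for a principal in the same account as the bucket; the function and parameter names are made up for illustration, and real policy evaluation has more inputs.

```python
def can_access(iam_allows: bool, resource_policy_allows: bool, explicit_deny: bool) -> bool:
    """(IAM allows OR resource policy allows) AND there is no explicit deny."""
    return (iam_allows or resource_policy_allows) and not explicit_deny
```

The key point for the exam is that an explicit deny always wins, whichever side grants the allow.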



S3 Bucket Policies


they are json based policies


actions: the set of API calls to allow or deny


principal: the account or user the policy applies to


use the s3 bucket policy to


grant public access to the bucket

force encryption at upload to the bucket

grant access to another account – cross account access
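
As an example of the first use case, a bucket policy granting public read access could look roughly like this. It is a sketch only: the function name, the Sid and the bucket name my-bucket are placeholders.

```python
import json


def public_read_policy(bucket: str) -> dict:
    """Bucket policy letting anyone read (GetObject) every object in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicRead",
                "Effect": "Allow",
                "Principal": "*",  # everyone
                "Action": ["s3:GetObject"],
                # /* targets the objects in the bucket, not the bucket itself
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


print(json.dumps(public_read_policy("my-bucket"), indent=2))
```

The printed JSON is what you would paste into the bucket policy editor. It only takes effect if the block public access settings allow public policies.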



Bucket settings for block public access


used to block public access to buckets and objects


4 settings, which block public access granted through:


new ACLs
any ACLs
new public bucket or access point policies
ANY public bucket or access point policies (this setting also blocks cross-account access)


but exam will not test you on these.


created to prevent company data leaks


can also be set at account level


networking: supports vpc endpoints


S3 Logging and Audit


s3 access logs can be stored in other s3 buckets
api calls can be logged in CloudTrail


user security:


MFA Delete can be required to delete objects for versioned buckets


pre-signed URLs are valid for a limited time only


use case


eg to download a premium product or service, such as a video, if the user is logged in as a paid-up user or has purchased the video or service




S3 Websites


s3 can host static websites for public access


the URL will be the bucket's website endpoint, which is shown in the bucket's static website hosting settings




if you get 403 forbidden error then make sure bucket policy allows public reads – bucket must be publicly accessible for this.



CORS Cross-Origin Resource Sharing


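web browser mechanism to allow requests to other origins while visiting the main origin


an origin is a scheme (protocol), host (domain) and port, so a page served from one domain requesting resources from another domain is a cross-origin request


CORS headers are needed for this: the other origin must allow the request (Access-Control-Allow-Origin).



the web browser does a “pre-flight request” first, asking the cross-origin site whether the request is permitted; if yes, it then makes the actual request, eg GET, PUT, DELETE


the permitted methods are returned in the Access-Control-Allow-Methods header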





exam question!
if a client does a cross-origin request to an S3 bucket then you must enable the correct CORS headers.


you can allow for a specific origin, or for * ie all origins
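
A bucket's CORS configuration is a list of rules. The sketch below shows one rule plus a simplified check of whether a request would be allowed. The origin https://www.example.com, the MaxAgeSeconds value and the helper name are assumptions, and the check only handles exact origins and *, not partial wildcards.

```python
cors_configuration = {
    "CORSRules": [
        {
            "AllowedOrigins": ["https://www.example.com"],  # or ["*"] for all origins
            "AllowedMethods": ["GET", "PUT", "DELETE"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,  # how long the browser may cache the pre-flight
        }
    ]
}


def origin_allowed(config: dict, origin: str, method: str) -> bool:
    """True if some rule allows this origin and HTTP method."""
    for rule in config["CORSRules"]:
        origins = rule["AllowedOrigins"]
        if ("*" in origins or origin in origins) and method in rule["AllowedMethods"]:
            return True
    return False
```

The configuration goes on the bucket that receives the cross-origin request, not on the bucket serving the main page.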



S3 MFA Delete


forces a user to generate a code from a device eg mobile phone before doing some operations on S3


to use MFA-Delete, versioning must first be enabled on the S3 bucket


MFA is required to permanently delete an object version

and to suspend versioning on the bucket


MFA is not needed to enable versioning or to list deleted versions




only bucket owner ie root account can enable/disable MFA-Delete


only possible via CLI at present.




first create an access key for the root account in the IAM web console



then configure the aws cli to use this key


download the key file, and then set up a cli profile with your access key id and secret access key




aws configure --profile root-mfa-delete-demo


you are then prompted to enter the access key id and secret access key


then you run


aws s3 ls --profile root-mfa-delete-demo

to display your buckets



then do:


aws s3api put-bucket-versioning --bucket demo-mfa-2020 --versioning-configuration Status=Enabled,MFADelete=Enabled --mfa "<arn-of-mfa-device> <mfa-code-for-the-device>" --profile root-mfa-delete-demo


you can then test by uploading an object


and try deleting the version: you should get a message saying you cannot delete, as MFA-Delete is enabled for this bucket


so to delete you must use the CLI delete command with the --mfa option (device ARN plus the current code from your chosen device, eg mobile phone), or alternatively for this demo just disable MFA-Delete again; then you can delete as usual.




To force encryption you can set a bucket policy that refuses any API PUT call for an object that does not carry encryption headers


alternatively, you can use the default encryption option of S3


important for exam: bucket policies are evaluated *before* default encryption settings!
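
A bucket policy that refuses unencrypted uploads could be sketched as follows. It denies PutObject whenever the x-amz-server-side-encryption header is missing. The function name, Sid and bucket name are placeholders.

```python
import json


def deny_unencrypted_uploads_policy(bucket: str) -> dict:
    """Bucket policy denying PutObject calls that carry no encryption header."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedObjectUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # "Null": "true" matches when the header is absent from the request
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            }
        ],
    }


print(json.dumps(deny_unencrypted_uploads_policy("my-bucket"), indent=2))
```

Since the policy is evaluated before default encryption, a PUT without the header is rejected even if default encryption is switched on.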




S3 Access Logs


you can log all access to a bucket into another S3 bucket, and analyze the logs with a tool such as Amazon Athena

first, very important, and a potential exam question!


NEVER set your logging bucket to be the monitored bucket or one of the monitored buckets! This creates an infinite logging loop, which means a huge AWS bill!


always keep the logging bucket and the monitored bucket/s separate, ie set a different target bucket that is NOT itself being logged!


tip: give the logging bucket a name containing “access-logs” or similar, so that you can easily identify your logging bucket and avoid logging it by mistake.


S3 Replication: Cross-Region (CRR) and Same-Region (SRR)


versioning must be enabled for this, on both the source and the destination bucket


the copying is asynchronous


buckets can belong to different accounts


must grant proper iam permissions to S3 for this


CRR: you replicate to a bucket in a different region


used for compliance, lower latency access, replicating across different accounts


SRR: used for log aggregation, live replication between eg production and test and development accounts


note: after activating, only new objects get replicated. To replicate existing objects, you need to use the S3 Batch Replication feature


for DELETEs: you can replicate the delete markers from source to target

but deletions with a version id are not replicated – this is to avoid malicious deletes


and there is no “chaining” of replication

this means eg if bucket 1 replicates to bucket 2 and bucket 2 replicates to bucket 3, then objects created in bucket 1 are not automatically replicated to bucket 3. You have to explicitly set up replication for each pair of buckets.


To set up replication:


first you have to enable versioning for the source bucket


then create your target bucket for the replication if it does not already exist (same region for SRR, a different region for CRR) and enable versioning on it too


then select in the source bucket:


Management -> Replication rules, create a replication rule and set the source and destination.


then create or choose an IAM role for the rule


and specify whether you want to replicate existing objects or not


existing objects need a one-time batch operation



S3 Pre-Signed URLs


can generate using sdk or cli


uploads: must use sdk
downloads can use cli, easy


valid for 3600 secs (ie 1 hr) by default, can be changed


users are given a pre-signed url for get or put


use cases:
eg to only allow logged-in or premium users to download a product, eg a video or service
allow a user temporary right to upload a file to your bucket
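
To show the idea behind a URL that carries its own expiry and signature, here is a toy sketch. It is NOT how S3 signs URLs: real pre-signed URLs use AWS Signature Version 4 and are generated by the SDK or CLI. The secret, host name, parameter names and functions below are all made up for illustration.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical signing key, never hard-code a real one


def _sign(path: str, expires_at: int) -> str:
    """HMAC over the path and expiry, so neither can be changed unnoticed."""
    message = f"{path}\n{expires_at}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def presign(path: str, now: int, expires_in: int = 3600) -> str:
    """Build a URL that is valid for expires_in seconds from now."""
    expires_at = now + expires_in
    query = urlencode({"expires": expires_at, "signature": _sign(path, expires_at)})
    return f"https://files.example.com{path}?{query}"


def is_valid(url: str, now: int) -> bool:
    """Server-side check: signature must match and the URL must not be expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    expected = _sign(parsed.path, expires_at)
    return hmac.compare_digest(signature, expected) and now <= expires_at
```

The point to take away is that the holder of the URL needs no credentials of their own: the permission and the time limit are carried in the URL itself.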



S3 Storage Classes


need to know for exam!

S3 offers


Standard General Purpose
Standard-Infrequent-Access (IA)
One Zone-IA
Glacier Instant Retrieval
Glacier Flexible Retrieval
Glacier Deep Archive
Intelligent Tiering



you can move objects between classes manually, or automate it with S3 Lifecycle rules



Standard S3:

Durability and Availability



Durability:

S3 has very high durability: 99.999999999% (11 9s)!



Availability:

how readily available the service is.


S3 Standard has 99.99% availability, ie it can be unavailable for about 53 minutes (just under 1 hr) per year



use cases: big data, mobile, gaming, content distribution




Standard-Infrequent Access (IA):

for less frequent access
lower cost than standard


99.9% available


good for DR and backups




One Zone-IA:

very high durability within a single AZ, but only 99.5% availability


thus best used for secondary backup copies or recreatable data
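
The availability percentages are easier to compare as downtime per year. This is a small arithmetic sketch; the function name is made up, and the 99.99% figure for S3 Standard is the AWS design target, which these notes only give as “about 1 hr”.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def yearly_downtime_minutes(availability_percent: float) -> float:
    """Minutes per year a service may be unavailable at a given availability."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for name, pct in [("Standard", 99.99), ("Standard-IA", 99.9), ("One Zone-IA", 99.5)]:
    hours = yearly_downtime_minutes(pct) / 60
    print(f"{name}: about {hours:.1f} hours per year")
```

This works out to roughly 0.9 hours for Standard, 8.8 hours for Standard-IA and 43.8 hours for One Zone-IA.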


Glacier storage classes:


low-cost object storage for archiving or backup


you pay for storage plus a retrieval charge


Glacier Instant Retrieval (IR)


millisecond retrieval, min storage duration 90 days


Glacier Flexible Retrieval


expedited: 1-5 mins to retrieve


standard: 3-5 hrs



bulk: 5-12 hrs, free of charge
min storage duration 90 days


Glacier Deep Archive


best for long-term storage only


retrieval: standard 12 hrs, or bulk 48 hrs
lowest cost


min storage duration 180 days



Intelligent Tiering


moves objects automatically between access tiers based on monitored usage patterns
no retrieval charges

enables you to leave the moving between tiers to the system



S3 Lifecycle Rules


These automate the moving of storage objects from one storage class to another.


Lifecycle rules are in two parts: Transition Actions and Expiration Actions


Transition Actions: 

configures objects for transitioning from one storage class to another.


This can be used to move objects to another class eg 60 days after creation, and then eg to Glacier for archiving after 6 months


Expiration Actions:

configures objects to be deleted (“expired”) after a specified period of time



eg access logs can be set to delete after 365 days


can also be used to delete old versions of files where you have versioning enabled.


or to delete incomplete multi-part uploads after a specific time period


rules can be created for certain prefixes in the file object names or specific “folder” names (remember there are no real directory folders in S3!)
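
A lifecycle configuration with expiration actions and prefix filters might look like the sketch below. The rule IDs, the access-logs/ prefix, the 90-day and 7-day values and the helper function are assumptions for illustration; the field names follow the S3 lifecycle configuration format.

```python
lifecycle_configuration = {
    "Rules": [
        {
            # Expiration action limited to one prefix ("folder")
            "ID": "expire-access-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "access-logs/"},
            "Expiration": {"Days": 365},
        },
        {
            # Empty prefix: applies to every object in the bucket
            "ID": "tidy-versions-and-uploads",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        },
    ]
}


def matching_rule_ids(config: dict, key: str) -> list[str]:
    """IDs of the enabled rules whose prefix filter matches the object key."""
    return [
        rule["ID"]
        for rule in config["Rules"]
        if rule["Status"] == "Enabled" and key.startswith(rule["Filter"]["Prefix"])
    ]
```

Matching is plain string prefix matching on the key, which is why “folders” work as rule filters even though they are not real directories.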



Exam Q: 


often about image thumbnails…




Your application on EC2 generates thumbnail images from profile photos after they are uploaded to S3. 


These thumbnails can be easily regenerated when needed and only need to be kept for 60 days. The source images should be able to be immediately retrieved during those 60 days. After this time period, users are ok with waiting up to 6 hours. 


How would you design a lifecycle policy to allow for this?




Answer: store the source images on S3 Standard, with a lifecycle configuration to transition them to Glacier after 60 days.


The thumbnails generated by the application can be stored on One Zone-IA, with a lifecycle configuration to expire (ie delete) them after 60 days.
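
The answer above could be written as a lifecycle configuration along these lines. This is a sketch under assumptions: the source/ and thumbnails/ prefixes, the rule IDs and the helper function are hypothetical, and the thumbnails would be put into One Zone-IA at upload time rather than by the lifecycle rule.

```python
thumbnail_lifecycle = {
    "Rules": [
        {
            "ID": "archive-source-images",
            "Status": "Enabled",
            "Filter": {"Prefix": "source/"},
            # GLACIER is Glacier Flexible Retrieval: standard retrieval in 3-5 hrs
            "Transitions": [{"Days": 60, "StorageClass": "GLACIER"}],
        },
        {
            "ID": "expire-thumbnails",
            "Status": "Enabled",
            "Filter": {"Prefix": "thumbnails/"},
            "Expiration": {"Days": 60},
        },
    ]
}


def actions_due(config: dict, key: str, age_days: int) -> list[str]:
    """Lifecycle actions that have come due for an object of the given age."""
    due = []
    for rule in config["Rules"]:
        if rule["Status"] != "Enabled" or not key.startswith(rule["Filter"]["Prefix"]):
            continue
        for transition in rule.get("Transitions", []):
            if age_days >= transition["Days"]:
                due.append(f"transition to {transition['StorageClass']}")
        if "Expiration" in rule and age_days >= rule["Expiration"]["Days"]:
            due.append("expire")
    return due
```

Glacier Flexible Retrieval fits the requirement because its standard retrieval time of 3-5 hours is within the 6 hours users will accept.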


Another Exam Q scenario:


A company rule states you should be able to recover your deleted S3 objects immediately for 30 days, though this happens rarely in practice.


After this time, and for up to 365 days, deleted objects should be recoverable within 48 hours.





First enable S3 Versioning so you will have multiple object versions, and so that “deleted” objects are hidden by a “delete marker” and can be recovered if necessary.


Then, create a lifecycle rule to move non-current versions of the objects to Standard-IA

and then transition these non-current versions from there to Glacier Deep Archive later on.




How To Calculate The Optimum Number of Days To Transition Objects From One Storage Class To Another:


You can use S3 Storage Class Analytics which will give you recommendations for Standard and Standard-IA 


But note (exam Q): this does NOT work for One-Zone-IA or Glacier


The S3 Storage Class Analytics report is updated daily. After 24-48 hours you will start to see the data analysis results


This is a useful tool for working out your optimum storage class lifecycle rule according to your actual real storage patterns.






















