AWS S3 Storage
S3 is the AWS object storage service. It stores objects in flat containers called buckets rather than in a hierarchical file system. There are no file directories as such in S3!
Buckets must have a globally unique name – across ALL AWS accounts, not just your own!
buckets are defined at the region level – important!
bucket naming convention – must know this for the exam!
no uppercase, no underscores, 3-63 characters long,
must start with a lowercase letter or number
must not be formatted like an IP address
objects:
are files, must have a key – the full path to the file
eg
s3://my-bucket/my_file.txt -> my_file.txt is the key
you can add “folder” names but they are just key prefixes, i.e. part of the key, not a real file directory system
so if you have
s3://my-bucket/my_folder/my_file.txt
then my_folder/my_file.txt
is the object key for this file
object max size is 5 TB, but you can only upload 5 GB in one go – to upload larger objects you have to use multi-part upload (see the sketch after this list)
metadata can be added, also tags
and a version ID, if versioning is enabled
you can block public access if you want for a bucket
you receive an ARN (Amazon Resource Name) for the bucket
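For illustration, here is a minimal multi-part upload sketch using the Python SDK (boto3) – the bucket name and file names are placeholders, and boto3's upload_file switches to multi-part automatically once the file exceeds the configured threshold:

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# switch to multi-part above ~100 MB and upload in 64 MB parts (values are illustrative)
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=64 * 1024 * 1024,
)

# upload_file handles splitting the file, uploading the parts and the final "complete" call for you
s3.upload_file("big-backup.tar", "my-bucket", "backups/big-backup.tar", Config=config)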
2 ways to open an S3 object:
in the console, click Object actions -> Open
or via the object's public URL
but for the public URL you must set the permissions for access to the bucket:
the bucket must allow public access
pre-signed URL – you give the client a URL with temporary credentials to open the file
you can version your files, but you have to enable versioning at the bucket level. Best practice is to use versioning: it protects against unintended deletes and lets you roll back to an earlier version.
Any files existing before versioning is enabled will not have previous versions available.
if you click on objects -> list versions, you will see the available versions listed.
you can easily roll back in this way. So versioning should be enabled!
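A minimal sketch of enabling versioning and listing versions with boto3 – the bucket name and prefix are placeholders:

import boto3

s3 = boto3.client("s3")

# enable versioning at the bucket level
s3.put_bucket_versioning(
    Bucket="my-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# list all versions of the objects under a prefix (what "list versions" shows in the console)
resp = s3.list_object_versions(Bucket="my-bucket", Prefix="my_folder/")
for v in resp.get("Versions", []):
    print(v["Key"], v["VersionId"], v["IsLatest"])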
S3 Encryption
Exam question!
Know the 4 methods of encryption for S3:
SSE-S3 – encrypts S3 objects using keys managed by AWS
SSE-KMS – uses AWS key management service to manage the keys
SSE-C – you manage your own encryption keys yourself, outside AWS
Client-Side Encryption
important to know which is best suited for each situation!
SSE-S3
managed by AWS S3
object encrypted server side
uses AES-256 algorithm
must set header “x-amz-server-side-encryption”: “AES256”
SSE-KMS
managed by AWS KMS
gives you control over users and an audit trail
object is encrypted server side
set header “x-amz-server-side-encryption”: “aws:kms”
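A minimal boto3 sketch of uploading with SSE-S3 and SSE-KMS – the SDK sets the x-amz-server-side-encryption header for you; the bucket name, object keys and KMS alias are placeholders:

import boto3

s3 = boto3.client("s3")

# SSE-S3: keys managed by S3, AES-256
s3.put_object(
    Bucket="my-bucket",
    Key="reports/sse-s3-example.txt",
    Body=b"hello",
    ServerSideEncryption="AES256",
)

# SSE-KMS: keys managed by AWS KMS (optionally name a specific key)
s3.put_object(
    Bucket="my-bucket",
    Key="reports/sse-kms-example.txt",
    Body=b"hello",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-kms-key",  # placeholder alias; omit it to use the default aws/s3 key
)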
SSE-C
server side with your own keys outside of aws
so s3 does NOT store the key
HTTPS has to be used for this, as you will be sending your key –
the actual client-side data encryption key – in the HTTP headers of every request
S3 then uses that key to encrypt the data server side in the bucket (without ever storing the key).
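A minimal SSE-C sketch with boto3 – bucket and key names are placeholders; note that the same key must be supplied again to read the object back, and the calls must go over HTTPS:

import boto3
import os

s3 = boto3.client("s3")

# a 256-bit key that you manage yourself; S3 never stores it
key = os.urandom(32)

# upload: boto3 sends the key in the SSE-C request headers (over HTTPS)
s3.put_object(
    Bucket="my-bucket",
    Key="secret/data.bin",
    Body=b"sensitive bytes",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=key,
)

# download: you must supply the same key again
obj = s3.get_object(
    Bucket="my-bucket",
    Key="secret/data.bin",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=key,
)
print(obj["Body"].read())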
Client Side Encryption
encryption happens on the client side, before transmitting data to S3,
and decryption also happens on the client side, after retrieving the data.
customer fully manages the keys and encryption/decryption.
there is a client library called the Amazon S3 Encryption Client which you can use in your client applications.
encryption in transit (SSL or TLS)
HTTPS encrypts your data “in flight” (in transit) – always use HTTPS; for SSE-C it is mandatory
uses ssl or tls certificates
S3 Security
User-based security
uses IAM policies – these define which API calls are allowed for a specific IAM user
Resource-based security
bucket policies – bucket-wide rules set from the S3 console, allow cross-account access (most common)
object ACL (object access control list) – finer-grained, per-object control
bucket ACL (bucket access control list) – much less common (NOT in exam)
an IAM principal can access an S3 object if
the user's IAM permissions allow it OR the resource policy allows it
AND no explicit deny exists
S3 Bucket Policies
they are json based policies
Actions: the set of API calls to allow or deny
Principal: the account or user the policy applies to
use the s3 bucket policy to
grant public access to the bucket
force encryption at upload to the bucket
grant access to another account – cross account access
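As an illustration, a minimal sketch of a bucket policy granting public read access, applied with boto3 – the bucket name (and therefore the ARN) is a placeholder:

import boto3
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicRead",
        "Effect": "Allow",                       # Effect: Allow or Deny
        "Principal": "*",                        # Principal: who the policy applies to (here: everyone)
        "Action": ["s3:GetObject"],              # Actions: the API calls being allowed
        "Resource": "arn:aws:s3:::my-bucket/*",  # Resource: every object in the bucket
    }],
}

s3 = boto3.client("s3")
s3.put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))

For this to actually take effect, the Block Public Access settings described next must also permit public policies.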
Bucket settings for block public access
used to block public access
4 settings:
block public access granted through new ACLs
block public access granted through any ACLs
block public access granted through new public bucket or access point policies
block public and cross-account access to buckets or objects through ANY public bucket or access point policies
but exam will not test you on these.
created to prevent company data leaks
can also be set at account level
networking: supports vpc endpoints
S3 Logging and Audit
s3 access logs can be stored in other s3 buckets
api calls can be logged in CloudTrail
user security:
MFA Delete can be required to delete objects for versioned buckets
pre-signed urls valid for a limited time only
use case
eg to download a premium product or service, e.g. a video, only if the user is logged in as a paid-up user or has purchased the video or service
S3 Websites
s3 can host static websites for public access
url will be
bucket-name.s3-website-<aws-region>.amazonaws.com (or bucket-name.s3-website.<aws-region>.amazonaws.com, depending on the region)
if you get a 403 Forbidden error, make sure the bucket policy allows public reads – the bucket must be publicly accessible for this.
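A minimal sketch of turning on static website hosting with boto3 – the bucket name and document names are placeholders, and the bucket still needs a public-read bucket policy as noted above:

import boto3

s3 = boto3.client("s3")

# enable static website hosting with an index page and an error page
s3.put_bucket_website(
    Bucket="my-bucket",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)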
CORS Cross-Origin Resource Sharing
web browser mechanism to allow requests to other origins while visiting the main origin
an origin = scheme + host + port; e.g. http://example.com/app1 and http://example.com/app2 share the same origin, whereas http://www.example.com and http://other.example.com are different origins
a CORS header is needed for cross-origin requests – and the other origin must allow the request via its CORS headers (Access-Control-Allow-Origin).
the web browser first does a “pre-flight request” (OPTIONS), asking the cross-origin site if the request is permitted – if yes, it then sends the actual request, e.g. GET, PUT, DELETE
the permitted methods are returned in the Access-Control-Allow-Methods CORS header
S3 CORS
exam question!
if a client does a cross-origin request to an S3 bucket, then you must enable the correct CORS headers on the bucket.
you can allow for a specific origin, or for * ie all origins
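A minimal sketch of setting a CORS rule on a bucket with boto3 – the allowed origin and bucket name are placeholders; use ["*"] to allow all origins:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_cors(
    Bucket="my-bucket",
    CORSConfiguration={
        "CORSRules": [{
            "AllowedOrigins": ["https://www.example.com"],  # the specific origin allowed to call this bucket
            "AllowedMethods": ["GET", "PUT"],                # returned in Access-Control-Allow-Methods
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,                           # how long the browser may cache the pre-flight response
        }]
    },
)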
S3 MFA Delete
forces a user to generate a code on a device (e.g. a mobile phone MFA app) before doing certain operations on S3
to be able to use MFA Delete, versioning must be enabled on the S3 bucket
MFA is then required to permanently delete an object version
and to suspend versioning on the bucket
it is not needed to enable versioning or to list deleted versions
only bucket owner ie root account can enable/disable MFA-Delete
only possible via CLI at present.
first create an access key for the root account in the IAM web console
download the key file, then configure the AWS CLI to use this access key ID and secret access key
command:
aws configure --profile root-mfa-delete-demo
you are then prompted to enter the access key id and secret access key
then you run
aws s3 ls --profile root-mfa-delete-demo   (to check the profile works by listing your buckets)
then do:
aws s3api put-bucket-versioning --bucket demo-mfa-2020 --versioning-configuration Status=Enabled,MFADelete=Enabled --mfa "<arn-of-mfa-device> <mfa-code-from-the-device>" --profile root-mfa-delete-demo
you can then test by uploading an object
and try deleting a version – you should get a message saying you cannot delete it because MFA delete is enabled for this bucket…
so to delete it you must call aws s3api delete-object with the --mfa option and a code from your chosen MFA device (e.g. a mobile phone app) – or alternatively, for this demo, just disable MFA Delete again; then you can delete as per usual.
To force encryption you can set a bucket policy that refuses any "PUT" API call for an object that does not carry encryption headers
alternatively, you can use the default encryption option of S3
important for exam: bucket policies are evaluated *before* default encryption settings!
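For illustration, a sketch of such a deny policy applied with boto3 – the bucket name is a placeholder, and this variant assumes you want to require SSE-S3 (the AES256 header):

import boto3
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyUnencryptedPuts",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::my-bucket/*",
        # deny any PUT whose x-amz-server-side-encryption header is missing or not AES256
        "Condition": {
            "StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}
        },
    }],
}

s3 = boto3.client("s3")
s3.put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))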
S3 Access Logs
– access logs are written to another S3 bucket, and you can analyse the logs with Amazon Athena
first, very important – and potential exam question!
NEVER set your logging bucket to be the monitored bucket or one of the monitored buckets! because this will create a big infinite loop! which means a huge AWS bill!
always keep logging bucket and the monitored bucket/s separate! – ie set a separate different target bucket that is NOT being logged!
tip: make sure you define a bucket with the word “access-logs” or similar in the name, so that you can easily identify your logging bucket and avoid logging it by mistake.
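A minimal sketch of enabling server access logging with boto3 – both bucket names are placeholders, and the separate logging bucket must already grant S3 permission to deliver logs:

import boto3

s3 = boto3.client("s3")

# the target bucket is deliberately NOT the monitored bucket, to avoid the logging loop
s3.put_bucket_logging(
    Bucket="my-app-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-app-access-logs",
            "TargetPrefix": "logs/my-app-bucket/",
        }
    },
)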
S3 Replication – Cross-Region (CRR) and Same-Region (SRR)
– versioning must be enabled in both the source and destination buckets
the copying is asynchronous
buckets can belong to different accounts
must grant proper iam permissions to S3 for this
CRR: you replicate (synchronize) buckets across different regions
used for compliance, lower latency access, replicating across different accounts
SRR: used for log aggregation, live replication between eg production and test and development accounts
note: after activating replication, only new objects get replicated. To replicate existing objects, you need to use the…
S3 Batch Replication feature
for DELETEs: you can replicate the delete markers from source to target
but deletions with a version id are not replicated – this is to avoid malicious deletes
and there is no “chaining” of replication…
this means eg if bucket 1 replicates to bucket 2 and 2 replicates to bucket 3, then bucket 1 is not automatically replicated to 3 – you have to explicitly set each replication for each pair of buckets.
first you have to enable versioning for the bucket
then create your target bucket for the replication if not already in existence – can be same region for SRR or a different region for CRR
then, in the source bucket, select:
Management -> Replication rules, create a replication rule and set the source and destination.
then you need to create an iam role:
and specify if you want to replicate existing objects or not
for existing objects you must use a one-time batch operation for this
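A minimal replication-rule sketch with boto3 – the bucket names, account id and IAM role ARN are placeholders, and versioning must already be enabled on both buckets:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # placeholder role
        "Rules": [{
            "ID": "replicate-everything",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},                                    # empty filter = replicate all new objects
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
        }],
    },
)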
S3 Pre-Signed URLs
can generate using sdk or cli
uploads: must use sdk
downloads can use cli, easy
valid for 3600 seconds (1 hour) by default, can be changed
users are given a pre-signed url for get or put
use cases:
eg to only allow logged-in or premium users to download a product, e.g. a video, or a service
allow a user temporary right to upload a file to your bucket
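A minimal boto3 sketch for both directions – bucket and key names are placeholders:

import boto3

s3 = boto3.client("s3")

# download link for a premium video, valid for 1 hour (3600 seconds is also the default)
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "videos/premium.mp4"},
    ExpiresIn=3600,
)

# upload link: lets a user PUT one object into your bucket for 10 minutes
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "my-bucket", "Key": "uploads/user-file.txt"},
    ExpiresIn=600,
)

print(download_url)
print(upload_url)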
S3 Storage Classes
need to know for exam!
S3 offers
Standard General Purpose, and
Standard-Infrequent-Access (IA)
One Zone-IA
Glacier Instant Retrieval
Glacier Flexible Retrieval
Glacier Deep Archive
Intelligent Tiering
you can move objects between classes manually or automate it with S3 Lifecycle rules
Standard S3:
Durability and Availability
the difference:
Durability: S3 has very high durability – eleven nines, i.e. 99.999999999%!
Availability:
how readily available the service is.
S3 Standard offers 99.99% availability, i.e. roughly 53 minutes of unavailability per year
use cases: big data analytics, mobile and gaming applications, content distribution
Standard-IA:
for data that is less frequently accessed but requires rapid access when needed
lower cost than Standard
99.9% availability
good for disaster recovery (DR) and backups
1-Zone-IA
very high durability within a single AZ, but only 99.5% availability – not so high – and data is lost if the AZ is destroyed
thus best used for secondary backup copies or recreatable data
Glacier storage classes:
low-cost object storage for archiving or backup
you pay for storage plus a retrieval charge
Glacier Instant Retrieval IR
millisecond retrieval, min storage 90 days
Glacier Flexible Retrieval
expedited retrieval: 1-5 minutes
standard: 3-5 hours
bulk: 5-12 hours – and is free
min storage duration 90 days
Glacier Deep Archive
best for long-term archive storage only
standard retrieval 12 hours, bulk retrieval 48 hours
lowest cost
min 180 days storage time
Intelligent Tiering
moves objects automatically between access tiers based on usage monitoring
no retrieval charges (just a small monthly monitoring and auto-tiering fee)
enables you to leave the moving between tiers to the system
S3 Lifecycle Rules
These automate the moving of storage objects from one storage class to another.
Lifecycle rules are in two parts: Transition Actions and Expiration Actions
Transition Actions:
configures objects for transitioning from one storage class to another.
eg
This can be used e.g. to move objects to another storage class 60 days after creation, and to Glacier for archiving after 6 months
Expiration Actions:
configures objects to be deleted (“expired”) after a specified period of time
eg
Access logs set to delete after 365 days
can also be used to delete old versions of files where you have file versioning enabled.
or to delete incomplete multi-part uploads after a specific time period
rules can be created for certain prefixes in the file object names or specific “folder” names (remember there are no real directory folders in S3!)
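For illustration, a sketch of a lifecycle configuration with one transition action and one expiration action, applied with boto3 – the bucket name, prefixes and day counts are placeholders:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {   # Transition Action: archive objects under reports/ to Glacier after 180 days
                "ID": "archive-reports",
                "Status": "Enabled",
                "Filter": {"Prefix": "reports/"},
                "Transitions": [{"Days": 180, "StorageClass": "GLACIER"}],
            },
            {   # Expiration Action: delete access logs after 365 days and tidy up failed uploads
                "ID": "expire-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Expiration": {"Days": 365},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            },
        ]
    },
)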
Exam Q:
often about image thumbnails…
eg
Your application on EC2 generates thumbnail images from profile photos after they are uploaded to S3.
These thumbnails can easily be regenerated when needed and only need to be kept for 60 days. The source images must be immediately retrievable during those 60 days. After this time period, users are ok with waiting up to 6 hours.
How would you design a lifecycle policy to allow for this?
Solution:
store the source images on Standard S3, with a lifecycle config to transition them to Glacier after 60 days.
The S3 thumbnails that are generated by the application can be stored on One-Zone-IA, with a lifecycle config to expire ie delete them after 60 days.
Another Exam Q scenario:
A company rule states you should be able to recover your deleted S3 objects immediately for 30 days, though this happens rarely in practice.
After this time, and for up to 365 days, deleted objects should be recoverable within 48 hours.
Solution:
First enable S3 Versioning so you will have multiple object versions, and so that “deleted” objects are hidden behind a “delete marker” and can be recovered if necessary.
Then, create a lifecycle rule to move non-current versions of the objects to Standard-IA
and then transition these non-current versions from there to Glacier Deep Archive later on.
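A sketch of what such a rule could look like in boto3 – the bucket name and the exact day counts are illustrative assumptions, not part of the scenario:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-noncurrent-versions",
            "Status": "Enabled",
            "Filter": {},   # apply to the whole bucket
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": 30, "StorageClass": "STANDARD_IA"},
                {"NoncurrentDays": 60, "StorageClass": "DEEP_ARCHIVE"},
            ],
            # optionally remove non-current versions for good after a year
            "NoncurrentVersionExpiration": {"NoncurrentDays": 365},
        }]
    },
)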
How To Calculate The Optimum Number of Days To Transition Objects From One Storage Class To Another:
You can use S3 Analytics (Storage Class Analysis), which will give you recommendations for Standard and Standard-IA
But note – exam Q: this does NOT work for One Zone-IA or the Glacier classes
The report is updated daily; after 24-48 hours you will start to see the data analysis results
This is a useful tool for working out your optimum storage class lifecycle rule according to your actual real storage patterns.