How Can We Help?

AWS – Choosing an AWS Database

You are here:
< All Topics

AWS offers several db solutions.


A number of questions need to be considered when choosing an AWS DB solution:


Questions to Consider


What is your need – read heavy, write heavy, balanced?


Throughput volume?


Will this change, does it need to scale, will it fluctuate up and down?, is it predictable or not?


How much data volume to store – and for how long


will it grow?


what is the average object size?


how do you need to access it?


what data durability do you require? what is the “source of truth” for the data?


are there compliance requirements?


what latency requirements – how many concurrent users


what is the data model – how will you query the data, will there be data joins? structured, semi-structured?


strong schema, flexible, reporting needs? searching? RBDMS or NoSQL?


what about license costs – cloud native db such as Aurora possible or not?


Overview of Database Types on AWS


RDBMS such as sql, oltp, this means: RDS, or Aurora, esp good for joins

NoSQL DBs such as DynamoDB – json, elasticache – key/value pairs or Nepture – good for graphs, but no joins and no sql


Object Stores: S3 for big objects, Glacier for backup and archives – may not seem like a DB but it works like one


Data Warehouse solutions eg SQL Analytics/BI, Redshift (OLAP), Athena


Search solutions: eg ElasticSearch (json), for free text unstructureed searches


Graph solutions: Neptune – this displays relationships between data



Overviews of AWS DB Solutions


RDS Overview


its a managed db on the postgresql/myswl/Oracle/SQL level


you must however an ec2 instance and ebs vol type and sufficient size


it supports read replicas and multi-AZ
security is via iam and security groups, kms, and ssl in transit
backup, snapshot and point in time restores all possible


managed and scheduled maintanance


monitoring available via cloudwatch


use cases include:


storing relational datasets rdbms/oltp performing sql queries, transactional inserts, update, delete is possible


rds for solutions architect, considerations include these “5 pillars”:





operations_ small downtimes when failover happens, when maintenance happens, when scaling read replicas, ec2 instances, and restoring from ebs, this requires manual intervention, and when application changes



security: aws is responsible for os security, but we are responsible for setting up kms, security groups, iam policies, authorizing users in db and using ssl



reliability: the multi-az feature makes rds v reliable, good for failover in failure situations


performance: dependent on ec2 instance type, ebs vol type, can add read replicas, storage autoscaling is possible, and manual scaling of instances is also possible


costs: is pay per hour based on provisioned number and type of ec2 instances and ebs usage



Aurora Overview


Compatible API for PostgreSQL and MySQL


Data is held in 6 replicas across 3 AZs
It has auto-healing capability
Multi-AZ Auto Scaling Read Replicas
Read Replicas can be global


Aurora DB can be global for DR or latency purposes
auto Scaling of storage from 10GB to 128 TB


Define EC2 instance type for Aurora instances


same security/monitoring/maintenance as for RDS but also has


Aurora Serverless for unpredictable/intermittent workloads
Aurora Multi-Master for continuous writes failover


use cases: same as for RDS but with less maintenance/more flexibility and higher performance


Operations: less operations, auto scaling storage


security: AWS as per usual, but we are responsible for kms, security groups, iam policy, user auth and ssl


reliability: multi AZ, high availability, serverless and multimaster options


performance 5x performance, plus max 15 read replicas (vs only 5 for rds)


cost: pay per hour acc to ec2 and storage usage, can be lower cost than eg oracle



ElastiCache Overview


it is really just a cache not a database


is a managed Redis or Memcached – similar to RDS but for caches


its an in-memory data store with very low latency
you must provision an ec2 instance type to use it


supports redis clustering and multi-az, plus read replicas using sharding


security is via iam, security groups, kms, and redis auth – NO IAM


backup, snapshot, point in time restores
managed and scheduled maintenance
monitoring via cloudwatch


use case: key/value store, frequent reads, less writes, cache results for db queries, storing of session data for websites, but cannot use sql – latter is important to be aware of.


you retrieve the data only by key value, you can’t query on other fields



operations: same as rds
security: usual can use iam policies,for users Redis Auth, and ssl


reliability: clustering and multi AZ


performance: in memory so extremely fast, read replicas for sharding, very efficient


cost: similar to rds pricing based on ec2 and storage usage





proprietary to AWS
a managed NoSQL DB
serverless, provisioned, auto-scaling, on-demand capacity


can replace elasticache as a key-value store, eg for storing session data


performance is slower than eg rds


highly available, multi-AZ by default, read–writes decoupled, DAW available for read cache


2 options for reads: eventually consistent or strongly consistent


security, authorization-authentication all done via iam


dynamodb streams integrate with lambda


backup-restore and global table feature – if you enable streams


monitoring via cloudwatch


but you can only query on primary key, sort key or indexes – exam q!
so you cannot query on “any” attribute – only the above.


use case:


serverless apps development, small docs 100s kb, distrib serverless cache, doesn’t have sql query language available, has transaction capability now built in




S3 Overview


acts as a simple key/value store for objects


great for big objects up to 5TB, not so good for small objects


serverless, scales infinitely, strong consistency for every operation


tiers for migrating data: s3 standard, s3 IA, s3 one-zone IA, Glacier for backups


features include: versioning, encryption, CRR – cross-region replication


security: IAM, bucket policies, ACL


encryption: SSE-S3, SSE-KMS, SSE-C, client side encryption, SSL in transit


use case: static files, key-value stores for big files and website hosting


operations: no operations necessary!
security: IAM, bucket policies, ACL, encryption set up correctly,


reliability: extremely high, durability also extremely good, multi-AZ and CRR


performance: scales easily, very high read/write, multipart for uploads


cost: only pay for storage used, no need to provision in advance, plus network costs for transfer/retrieval, plus charge per number of requests



Athena Overview


fully serverless database with SQL capability
used to query data in S3
pay per query
outputs results back to S3
is secured via IAM


use cases: one-time sql queries, serverless queries on S3, log analytics



Operations: no ops needed! is serverless


security: via S3 using bucket policies, IAM


reliability: is managed, uses Presto engine, highly available
performance: queries scale based on data size


cost: pay per query/per TB of data scanned, serverless



Redshift Overview


is a great data warehouse system


based on PostgreSQL but not used for oltp


instead, its an OLAP – online analytical processing — analytics and data warehousing


10x better performance than other data warehouses, scales to PBs of data


columnar storage of data rather than row-based


it is MPP – uses a massively parallel query execution engine which makes it extremely fast

pay as you go acc to no of instances provisioned
has an SQL interface for performing the queries.


data is loaded from S3, DynamoDB, DMS and other DBs,
1 to 128 nodes possible with up to 128 TB of space per node!


leader node used for query planning and results aggregation
compute node : performs the queries and sends results to leader node


Redshift Spectrum: performs queries directly against S3 with no load to load the data into the redshift cluster


Backup & Restore, SecurityVPC, IAM, KMS, monitoring


Redshift Enhanced VPC Routing: Copy / Unload goes through VPC, this avoids public internet


Redshift Snapshots and DR


has no multi-AZ node


the snapshots are p-i-t- point in time backups of a cluster, stored internally in S3


they are incremental — only changes are saved, makes it fast

can restore to a new cluster

automated snapshots: every 8 hrs every 5gb or according to a schedule, with a set retention period


manual snapshots: snapshot is retained until you delete it


neat feature:
you can configure Redshift to auto copy snapshots to a cluster of another region
either manually or automatically



loading data into redshift:


there are three possible ways:


1. use kinesis data firehouse loads data into redshift cluster via an s3 copy automatically to an S3 bucket


2. using the copy command manually without Kinesis
from S3 bucket via internet – ie without using enhanced vpc routing to redshift cluster
or via vpc not via internet, using enhanced vpc routing



3. from ec2 instance to redshift cluster using jdbc driver
in this case it is much better to write the data in batches rather than all at once.



Redshift Spectrum


must already have a redshift cluster operational


Spectrum is a way to query data that is already in S3 without having to load it into Redshift


you submit the query which is submitted to thousands of Redshift spectrum nodes for processing



Operations: similar to rds
security: uses iam, vpc, kms, ssl as for rds
reliability: auto healing, cross region snapshop copies possible
performance: 10x performance of other data warehousing systems, uses compression
cost: pay per node provisioned about 10% of cost of other dw systems
vs athena: is faster at querying, does joins, aggregations thanks to indexes


redshift = analytics/BI/Data Warehouse



Glue Overview


is a managed extract transform and load ETL service


used to prepare and transform data for analytics
fully serverless service


fetch data from s3 bucket or rds, send to glue which does extracting, transforming and loading to redshift data warehouse


glue data catalog: catalog of all the datasets you have in aws – ie metadata info about your data


you deploy glue data crawler to navigate s3, rds, dynamo DB, it then writes the metadata to the glue data catalog.


this can then be used b Glue Jobs eTL


data discovery -> Athena, Redshift Spectrum, EMR for analytics


all you need to know about Glue for exam




Neptune Overview


if you hear graphs mentioned in exam, this refers to neptune!


is a fully managed graph database


used for
high relationship data
social networking, eg users friends with users, replied to comment on post of user and likes other comments


knowledge graphs eg for wikipedia – links to other wiki pages, lots of these links
this is graph data – giving a massive graph


highly available across 3 AZs with up to 15 read replicas available


point in time recovery, continuous backup to S3


support for kms encryption, https, and ssl


operations: similar to rds
security: iam, vpc, kms, ssl – similar to rds, plus iam authentication
reliability: multi-AZ and clustering
performance: best suited for graphs, clustering to improve performance


cost: similar to rds, pay per provisioned node


remember neptune = graphs database



AWS OpenSearch


– the successor to AWS ElasticSearch


eg dynamodb only allows search by primary key or index,


whereas with OpenSearch you can search ANY field


often used as a complement to other db


also has usage for big data apps
can provision a cluster of instances
built-in integrations: various – kinesis data firehose, aws IoT, CloudWatch Logs



comes with visualization dashboard


operations: similar to rds


security: is via Cognito, IAM, KMS encryption, SSL and VPC
reliability: multi-AZ and clustering
performance: based on elasticsearch project – open source, pbyte scale
cost: pay per provisioned node


all you need to remember:


used to search and index data


Question 1:
Which database helps you store relational datasets, with SQL language compatibility and the capability of processing transactions such as insert, update, and delete?




Question 2:
Which AWS service provides you with caching capability that is compatible with Redis API?



Good job!
Amazon ElastiCache is a fully managed in-memory data store, compatible with Redis or Memcached.



Question 3:
You want to migrate an on-premises MongoDB NoSQL database to AWS. You don’t want to manage any database servers, so you want to use a managed NoSQL database, preferably Serverless, that provides you with high availability, durability, and reliability. Which database should you choose?


Amazon DynamoDB


Good job!

Amazon DynamoDB is a key-value, document, NoSQL database.



Question 4:
You are looking to perform Online Transaction Processing (OLTP). You would like to use a database that has built-in auto-scaling capabilities and provides you with the maximum number of replicas for its underlying storage. What AWS service do you recommend?


Amazon Aurora


Good job!

Amazon Aurora is a MySQL and PostgreSQL-compatible relational database. It features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 128TB per database instance. It delivers high performance and availability with up to 15 low-latency read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across 3 AZs.




Question 5:
As a Solutions Architect, a startup company asked you for help as they are working on an architecture for a social media website where users can be friends with each other, and like each other’s posts. The company plan on performing some complicated queries such as “What are the number of likes on the posts that have been posted by the friends of Mike?”. Which database do you recommend?


Amazon Neptune


Good job!
Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets.



Question 6:
You have a set of files, 100MB each, that you want to store in a reliable and durable key-value store. Which AWS service do you recommend?


Amazon S3


Good job!
Amazon S3 is indeed a key-value store! (where the key is the full path of the object in the bucket)


Question 7:
You would like to have a database that is efficient at performing analytical queries on large sets of columnar data. You would like to connect to this Data Warehouse using a reporting and dashboard tool such as Amazon QuickSight. Which AWS technology do you recommend?


Amazon Redshift


Good job!
Amazon Redshift



Question 8:
You have a lot of log files stored in an S3 bucket that you want to perform a quick analysis, if possible Serverless, to filter the logs and find users that attempted to make an unauthorized action. Which AWS service allows you to do so?


Amazon Athena


Good job!
Amazon Athena is an interactive serverless query service that makes it easy to analyze data in S3 buckets using Standard SQL.



Question 9:
As a Solutions Architect, you have been instructed you to prepare a disaster recovery plan for a Redshift cluster. What should you do?


enable automated snapshops, then configure your redshift cluster to autocopy the snapshots to another aws region


Good job!



Question 10:
Which feature in Redshift forces all COPY and UNLOAD traffic moving between your cluster and data repositories through your VPCs?



Enhanced VPC Routing


Good job!


Question 11:
You are running a gaming website that is using DynamoDB as its data store. Users have been asking for a search feature to find other gamers by name, with partial matches if possible. Which AWS technology do you recommend to implement this feature?




Good job!
Anytime you see “search”, think ElasticSearch.



Question 12:
An AWS service allows you to create, run, and monitor ETL (extract, transform, and load) jobs in a few clicks.


AWS Glue


Good job!
AWS Glue is a serverless data-preparation service for extract, transform, and load (ETL) operations.















Table of Contents