AWS – Choosing an AWS Database

Created OnAugust 28, 2022

Last Updated OnSeptember 22, 2022

byAdmin

AWS offers several db solutions.

A number of questions need to be considered when choosing an AWS DB solution:

Questions to Consider

What is your need – read heavy, write heavy, balanced?

Throughput volume?

Will this change, does it need to scale, will it fluctuate up and down?, is it predictable or not?

How much data volume to store – and for how long

will it grow?

what is the average object size?

how do you need to access it?

what data durability do you require? what is the “source of truth” for the data?

are there compliance requirements?

what latency requirements – how many concurrent users

what is the data model – how will you query the data, will there be data joins? structured, semi-structured?

strong schema, flexible, reporting needs? searching? RBDMS or NoSQL?

what about license costs – cloud native db such as Aurora possible or not?

Overview of Database Types on AWS

RDBMS such as sql, oltp, this means: RDS, or Aurora, esp good for joins

NoSQL DBs such as DynamoDB – json, elasticache – key/value pairs or Nepture – good for graphs, but no joins and no sql

Object Stores: S3 for big objects, Glacier for backup and archives – may not seem like a DB but it works like one

Data Warehouse solutions eg SQL Analytics/BI, Redshift (OLAP), Athena

Search solutions: eg ElasticSearch (json), for free text unstructureed searches

Graph solutions: Neptune – this displays relationships between data

Overviews of AWS DB Solutions

RDS Overview

its a managed db on the postgresql/myswl/Oracle/SQL level

you must however an ec2 instance and ebs vol type and sufficient size

it supports read replicas and multi-AZ
security is via iam and security groups, kms, and ssl in transit
backup, snapshot and point in time restores all possible

managed and scheduled maintanance

monitoring available via cloudwatch

use cases include:

storing relational datasets rdbms/oltp performing sql queries, transactional inserts, update, delete is possible

rds for solutions architect, considerations include these “5 pillars”:

operations
security
reliability
performance
cost

operations_ small downtimes when failover happens, when maintenance happens, when scaling read replicas, ec2 instances, and restoring from ebs, this requires manual intervention, and when application changes

security: aws is responsible for os security, but we are responsible for setting up kms, security groups, iam policies, authorizing users in db and using ssl

reliability: the multi-az feature makes rds v reliable, good for failover in failure situations

performance: dependent on ec2 instance type, ebs vol type, can add read replicas, storage autoscaling is possible, and manual scaling of instances is also possible

costs: is pay per hour based on provisioned number and type of ec2 instances and ebs usage

Aurora Overview

Compatible API for PostgreSQL and MySQL

Data is held in 6 replicas across 3 AZs
It has auto-healing capability
Multi-AZ Auto Scaling Read Replicas
Read Replicas can be global

Aurora DB can be global for DR or latency purposes
auto Scaling of storage from 10GB to 128 TB

Define EC2 instance type for Aurora instances

same security/monitoring/maintenance as for RDS but also has

Aurora Serverless for unpredictable/intermittent workloads
Aurora Multi-Master for continuous writes failover

use cases: same as for RDS but with less maintenance/more flexibility and higher performance

Operations: less operations, auto scaling storage

security: AWS as per usual, but we are responsible for kms, security groups, iam policy, user auth and ssl

reliability: multi AZ, high availability, serverless and multimaster options

performance 5x performance, plus max 15 read replicas (vs only 5 for rds)

cost: pay per hour acc to ec2 and storage usage, can be lower cost than eg oracle

ElastiCache Overview

it is really just a cache not a database

is a managed Redis or Memcached – similar to RDS but for caches

its an in-memory data store with very low latency
you must provision an ec2 instance type to use it

supports redis clustering and multi-az, plus read replicas using sharding

security is via iam, security groups, kms, and redis auth – NO IAM

backup, snapshot, point in time restores
managed and scheduled maintenance
monitoring via cloudwatch

use case: key/value store, frequent reads, less writes, cache results for db queries, storing of session data for websites, but cannot use sql – latter is important to be aware of.

you retrieve the data only by key value, you can’t query on other fields

operations: same as rds
security: usual can use iam policies,for users Redis Auth, and ssl

reliability: clustering and multi AZ

performance: in memory so extremely fast, read replicas for sharding, very efficient

cost: similar to rds pricing based on ec2 and storage usage

DynamoDB

proprietary to AWS
a managed NoSQL DB
serverless, provisioned, auto-scaling, on-demand capacity

can replace elasticache as a key-value store, eg for storing session data

performance is slower than eg rds

highly available, multi-AZ by default, read–writes decoupled, DAW available for read cache

2 options for reads: eventually consistent or strongly consistent

security, authorization-authentication all done via iam

dynamodb streams integrate with lambda

backup-restore and global table feature – if you enable streams

monitoring via cloudwatch

important
but you can only query on primary key, sort key or indexes – exam q!
so you cannot query on “any” attribute – only the above.

use case:

serverless apps development, small docs 100s kb, distrib serverless cache, doesn’t have sql query language available, has transaction capability now built in

S3 Overview

acts as a simple key/value store for objects

great for big objects up to 5TB, not so good for small objects

serverless, scales infinitely, strong consistency for every operation

tiers for migrating data: s3 standard, s3 IA, s3 one-zone IA, Glacier for backups

features include: versioning, encryption, CRR – cross-region replication

security: IAM, bucket policies, ACL

encryption: SSE-S3, SSE-KMS, SSE-C, client side encryption, SSL in transit

use case: static files, key-value stores for big files and website hosting

operations: no operations necessary!
security: IAM, bucket policies, ACL, encryption set up correctly,

reliability: extremely high, durability also extremely good, multi-AZ and CRR

performance: scales easily, very high read/write, multipart for uploads

cost: only pay for storage used, no need to provision in advance, plus network costs for transfer/retrieval, plus charge per number of requests

Athena Overview

fully serverless database with SQL capability
used to query data in S3
pay per query
outputs results back to S3
is secured via IAM

use cases: one-time sql queries, serverless queries on S3, log analytics

Operations: no ops needed! is serverless

security: via S3 using bucket policies, IAM

reliability: is managed, uses Presto engine, highly available
performance: queries scale based on data size

cost: pay per query/per TB of data scanned, serverless

Redshift Overview

is a great data warehouse system

based on PostgreSQL but not used for oltp

instead, its an OLAP – online analytical processing — analytics and data warehousing

10x better performance than other data warehouses, scales to PBs of data

columnar storage of data rather than row-based

it is MPP – uses a massively parallel query execution engine which makes it extremely fast

pay as you go acc to no of instances provisioned
has an SQL interface for performing the queries.

data is loaded from S3, DynamoDB, DMS and other DBs,
1 to 128 nodes possible with up to 128 TB of space per node!

leader node used for query planning and results aggregation
compute node : performs the queries and sends results to leader node

Redshift Spectrum: performs queries directly against S3 with no load to load the data into the redshift cluster

Backup & Restore, SecurityVPC, IAM, KMS, monitoring

Redshift Enhanced VPC Routing: Copy / Unload goes through VPC, this avoids public internet

Redshift Snapshots and DR

has no multi-AZ node

the snapshots are p-i-t- point in time backups of a cluster, stored internally in S3

they are incremental — only changes are saved, makes it fast

can restore to a new cluster

automated snapshots: every 8 hrs every 5gb or according to a schedule, with a set retention period

manual snapshots: snapshot is retained until you delete it

neat feature:
you can configure Redshift to auto copy snapshots to a cluster of another region
either manually or automatically

loading data into redshift:

there are three possible ways:

1. use kinesis data firehouse loads data into redshift cluster via an s3 copy automatically to an S3 bucket

2. using the copy command manually without Kinesis
from S3 bucket via internet – ie without using enhanced vpc routing to redshift cluster
or via vpc not via internet, using enhanced vpc routing

3. from ec2 instance to redshift cluster using jdbc driver
in this case it is much better to write the data in batches rather than all at once.

Redshift Spectrum

must already have a redshift cluster operational

Spectrum is a way to query data that is already in S3 without having to load it into Redshift

you submit the query which is submitted to thousands of Redshift spectrum nodes for processing

Operations: similar to rds
security: uses iam, vpc, kms, ssl as for rds
reliability: auto healing, cross region snapshop copies possible
performance: 10x performance of other data warehousing systems, uses compression
cost: pay per node provisioned about 10% of cost of other dw systems
vs athena: is faster at querying, does joins, aggregations thanks to indexes

redshift = analytics/BI/Data Warehouse

Glue Overview

is a managed extract transform and load ETL service

used to prepare and transform data for analytics
fully serverless service

fetch data from s3 bucket or rds, send to glue which does extracting, transforming and loading to redshift data warehouse

glue data catalog: catalog of all the datasets you have in aws – ie metadata info about your data

you deploy glue data crawler to navigate s3, rds, dynamo DB, it then writes the metadata to the glue data catalog.

this can then be used b Glue Jobs eTL

data discovery -> Athena, Redshift Spectrum, EMR for analytics

all you need to know about Glue for exam

Finally,

Neptune Overview

tip:
if you hear graphs mentioned in exam, this refers to neptune!

is a fully managed graph database

used for
high relationship data
social networking, eg users friends with users, replied to comment on post of user and likes other comments

knowledge graphs eg for wikipedia – links to other wiki pages, lots of these links
this is graph data – giving a massive graph

highly available across 3 AZs with up to 15 read replicas available

point in time recovery, continuous backup to S3

support for kms encryption, https, and ssl

operations: similar to rds
security: iam, vpc, kms, ssl – similar to rds, plus iam authentication
reliability: multi-AZ and clustering
performance: best suited for graphs, clustering to improve performance

cost: similar to rds, pay per provisioned node

remember neptune = graphs database

AWS OpenSearch

– the successor to AWS ElasticSearch

eg dynamodb only allows search by primary key or index,

whereas with OpenSearch you can search ANY field

often used as a complement to other db

also has usage for big data apps
can provision a cluster of instances
built-in integrations: various – kinesis data firehose, aws IoT, CloudWatch Logs

comes with visualization dashboard

operations: similar to rds

security: is via Cognito, IAM, KMS encryption, SSL and VPC
reliability: multi-AZ and clustering
performance: based on elasticsearch project – open source, pbyte scale
cost: pay per provisioned node

all you need to remember:

used to search and index data

Question 1:
Which database helps you store relational datasets, with SQL language compatibility and the capability of processing transactions such as insert, update, and delete?

RDS

Question 2:
Which AWS service provides you with caching capability that is compatible with Redis API?

ElastiCache

Good job!
Amazon ElastiCache is a fully managed in-memory data store, compatible with Redis or Memcached.

Question 3:
You want to migrate an on-premises MongoDB NoSQL database to AWS. You don’t want to manage any database servers, so you want to use a managed NoSQL database, preferably Serverless, that provides you with high availability, durability, and reliability. Which database should you choose?

Amazon DynamoDB

Good job!

Amazon DynamoDB is a key-value, document, NoSQL database.

Question 4:
You are looking to perform Online Transaction Processing (OLTP). You would like to use a database that has built-in auto-scaling capabilities and provides you with the maximum number of replicas for its underlying storage. What AWS service do you recommend?

Amazon Aurora

Good job!

Amazon Aurora is a MySQL and PostgreSQL-compatible relational database. It features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 128TB per database instance. It delivers high performance and availability with up to 15 low-latency read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across 3 AZs.

Question 5:
As a Solutions Architect, a startup company asked you for help as they are working on an architecture for a social media website where users can be friends with each other, and like each other’s posts. The company plan on performing some complicated queries such as “What are the number of likes on the posts that have been posted by the friends of Mike?”. Which database do you recommend?

Amazon Neptune

Good job!
Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets.

Question 6:
You have a set of files, 100MB each, that you want to store in a reliable and durable key-value store. Which AWS service do you recommend?

Amazon S3

Good job!
Amazon S3 is indeed a key-value store! (where the key is the full path of the object in the bucket)

Question 7:
You would like to have a database that is efficient at performing analytical queries on large sets of columnar data. You would like to connect to this Data Warehouse using a reporting and dashboard tool such as Amazon QuickSight. Which AWS technology do you recommend?

Amazon Redshift

Good job!
Amazon Redshift

Question 8:
You have a lot of log files stored in an S3 bucket that you want to perform a quick analysis, if possible Serverless, to filter the logs and find users that attempted to make an unauthorized action. Which AWS service allows you to do so?

Amazon Athena

Good job!
Amazon Athena is an interactive serverless query service that makes it easy to analyze data in S3 buckets using Standard SQL.

Question 9:
As a Solutions Architect, you have been instructed you to prepare a disaster recovery plan for a Redshift cluster. What should you do?

enable automated snapshops, then configure your redshift cluster to autocopy the snapshots to another aws region

Good job!

Question 10:
Which feature in Redshift forces all COPY and UNLOAD traffic moving between your cluster and data repositories through your VPCs?

Enhanced VPC Routing

Good job!

Question 11:
You are running a gaming website that is using DynamoDB as its data store. Users have been asking for a search feature to find other gamers by name, with partial matches if possible. Which AWS technology do you recommend to implement this feature?

ElasticSearch

Good job!
Anytime you see “search”, think ElasticSearch.

Question 12:
An AWS service allows you to create, run, and monitor ETL (extract, transform, and load) jobs in a few clicks.

AWS Glue

Good job!
AWS Glue is a serverless data-preparation service for extract, transform, and load (ETL) operations.

Tags:

Navigation

How Can We Help?

AWS – Choosing an AWS Database

Questions to Consider

Overview of Database Types on AWS

Overviews of AWS DB Solutions

RDS Overview

Aurora Overview

ElastiCache Overview

DynamoDB

S3 Overview

Athena Overview

Redshift Overview

Redshift Spectrum

Glue Overview

Neptune Overview

AWS OpenSearch