AWS.

Amazon Web Services

Originally posted on Mon May 10 2021

Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally.
Source: https://aws.amazon.com/what-is-aws/?nc1=f_cc

Amazon Web Services, or AWS from here on out, is Amazon's cloud services platform. AWS offers lots of different features, including computing power, storage, content delivery, security, and much more. Many big businesses around the world use AWS for their cloud services, and if you watch sport you may have seen "Powered by AWS" graphics showing statistics and analysis, including in Formula 1 and the NFL.

Powered by AWS

I wanted to do a quick write-up of some of the features AWS provides, as they form the basis of a cloud-based application, and knowing about them is helpful when you are working on a system design.

With AWS you could set up a complete web application:

  1. You can host the application on an EC2 instance or Lambda.
  2. Data can be securely stored in S3.
  3. The data can be managed using a SQL or NoSQL database.
  4. Files can be hosted on CDNs around the world. (See more on CDNs in this post).

Computing

Your application will likely live on either an EC2 instance or a Lambda. The one you choose depends on what you want your application to do.


EC2

Amazon Elastic Compute Cloud (EC2) is a service where you can create virtual machines (EC2 instances) and scale them easily. As they are full virtual machines, you can choose the amount of disk space, CPU performance, memory, the OS and so on. You have root access to the machine and can install your application to run on it. This makes EC2 a good fit for cloud hosting, as you can deploy servers as instances in the cloud. EC2 also supports automatic scaling and load balancing.


Lambda

AWS Lambda is a platform that lets you run a piece of code written in one of a list of supported languages, including Java, JavaScript (Node.js) and Python. The code only runs when it is triggered by an event of some kind, so you don't need to worry about server management or environment configuration. Lambda is often referred to as serverless for this reason. Resources are provided by Amazon in accordance with application needs, and scaling is automatic and seamless.
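
As a rough illustration of what such a piece of code looks like, here is a minimal sketch of a Python Lambda handler. The `(event, context)` signature is what the Python runtime calls; the function name `lambda_handler`, the `name` field in the event and the shape of the response are assumptions made up for this example.

```python
import json


def lambda_handler(event, context):
    """Entry point that Lambda would call each time the function is triggered."""
    # "name" is a hypothetical field; real events depend on the trigger.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local usage: call the handler directly with a fake event and no context.
    print(lambda_handler({"name": "AWS"}, None))
```

Because the handler is just a function, it can be called locally with a hand-written event before it is ever deployed.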


EC2 vs Lambda

EC2 is very popular, as its highly configurable nature means it can be used for almost anything. Some common uses are:

  • Hosting websites
  • Developing and testing applications
  • High performance computing
  • Disaster recovery

Lambda can be used for:

  • Automating tasks
  • Processing objects and uploading to S3
  • Log analysis
  • Filtering and transforming data

The list of triggers for a Lambda is below:

  • API Gateway
  • AWS IoT
  • Alexa Skills Kit
  • Alexa Smart Home
  • Application Load Balancer
  • CloudFront
  • CloudWatch Events
  • CloudWatch Logs
  • CodeCommit
  • Cognito Sync Trigger
  • DynamoDB
  • Kinesis
  • S3
  • SNS
  • SQS
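
To make the idea of a trigger more concrete, here is a small sketch of a handler for the S3 trigger. It assumes the usual S3 notification layout (a `Records` list with `s3.bucket.name` and `s3.object.key`); the bucket and key in the sample event are invented, and a real handler would do some actual processing rather than just listing what arrived.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Return (bucket, key) pairs from an S3 notification event."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become "+"), so decode them.
        key = unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def lambda_handler(event, context):
    # A real function would process each object here; this sketch just lists them.
    return [f"s3://{bucket}/{key}" for bucket, key in uploaded_objects(event)]


if __name__ == "__main__":
    sample_event = {  # trimmed-down, made-up event for local testing
        "Records": [
            {"s3": {"bucket": {"name": "example-bucket"},
                    "object": {"key": "uploads/my+photo.png"}}}
        ]
    }
    print(lambda_handler(sample_event, None))
```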

Security

EC2 instances require you to take care of security yourself. You can manually control network traffic with the firewall features of Amazon's VPC (Virtual Private Cloud). You can also manually add antivirus software, create IAM roles, add security groups and specify permissions. AWS Systems Manager Patch Manager allows you to install patches automatically.

Q: What is AWS Identity and Access Management (IAM)?

You can use AWS IAM to securely control individual and group access to your AWS resources. You can create and manage user identities ("IAM users") and grant permissions for those IAM users to access your resources. You can also grant permissions for users outside of AWS (federated users).
Source: https://aws.amazon.com/iam/faqs/

Each Lambda should have an IAM role configured, which grants it permission to connect to other AWS services without you having to handle credentials yourself. There is no need to patch or update the environment a Lambda runs in, as that is handled by Amazon.
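
The permissions attached to an IAM role are written as a JSON policy document. As a hedged sketch, the function below builds one that only allows reading objects from a single S3 bucket. The `Version`, `Effect`, `Action` and `Resource` fields follow the standard policy layout; the helper name and the bucket name are hypothetical.

```python
import json


def read_only_bucket_policy(bucket_name):
    """Build an IAM policy document allowing read access to one S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # Matches the objects inside the bucket, not the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


if __name__ == "__main__":
    # "example-bucket" is a placeholder name for illustration.
    print(json.dumps(read_only_bucket_policy("example-bucket"), indent=2))
```

Keeping the policy this narrow means a Lambda using the role can read from that bucket and nothing else.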


Availability

Once started, an EC2 instance will keep running until it is stopped or terminated, either manually or via a scheduled task. Any applications running on the instance will be available all the time. That makes it a good fit for applications that need to run throughout the day.

A Lambda is always available but is not running at all times; it is inactive until it is triggered. A Lambda will time out after being activated, with a maximum limit of 15 minutes. A Lambda's memory can be configured up to 10,240MB, and you can execute thousands of Lambdas simultaneously. Because a Lambda does not run until it is triggered, there can be a delay of around 100 milliseconds or more between the request being made and the code executing, and if it needs information from S3 to run then there will be a further delay.

If you need more memory, a longer run time or a quicker response, then an EC2 instance is a better option.
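
The rules of thumb in this section can be sketched as a small helper. This is only an illustration of the reasoning, not an official sizing tool: the function name is made up, and the Lambda memory limit is passed in rather than hard-coded because AWS has raised it over time.

```python
LAMBDA_MAX_SECONDS = 15 * 60  # the 15 minute timeout mentioned above


def suggest_compute(runtime_seconds, memory_mb, lambda_max_memory_mb,
                    latency_sensitive=False):
    """Return "EC2" or "Lambda" using the rules of thumb from this section."""
    if runtime_seconds > LAMBDA_MAX_SECONDS:
        return "EC2"  # would hit the Lambda timeout
    if memory_mb > lambda_max_memory_mb:
        return "EC2"  # needs more memory than Lambda offers
    if latency_sensitive:
        return "EC2"  # avoids the start-up delay of an idle Lambda
    return "Lambda"
```

A short, small job that can tolerate a brief start-up delay comes back as "Lambda"; anything long-running, memory-hungry or latency-sensitive comes back as "EC2".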


Storage

S3

Amazon Simple Storage Service (S3) is a cloud object store in AWS. It can store objects like files, folders, images, documents, songs etc. It is designed for 99.999999999% durability (ELEVEN 9s!) and to be highly scalable, reliable and fast.
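
To put eleven 9s (99.999999999%) into perspective, the arithmetic below works out how often you would expect to lose an object on average. It treats the figure as an annual per-object durability, which is how AWS describes it; the function names are my own.

```python
from fractions import Fraction

# Eleven 9s: 99.999999999% annual durability per object.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_losses_per_year(object_count):
    """Average number of objects lost per year at the design durability."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count):
    """Average number of years between losing a single object."""
    return 1 / expected_losses_per_year(object_count)


if __name__ == "__main__":
    print(years_per_lost_object(10_000_000))  # 10000
```

So with ten million objects stored, you would expect to lose a single object roughly once every 10,000 years.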

In S3, data is stored in buckets, each of which lives in a particular AWS Region. There are a few different S3 storage classes:

  1. Standard - High availability, durability and performance. Low latency and high throughput, and resilient as it is available across availability zones.
  2. Intelligent-Tiering - Designed to optimize costs by moving objects across four access tiers; 2 low latency tiers for high and low frequency access, and 2 archive tiers.
  3. Standard-Infrequent Access - For data that is accessed less often than data in the Standard class, but that still needs to be available rapidly when it is needed.
  4. One Zone-Infrequent Access - Same as above, but instead of being stored across at least 3 Availability Zones like the others, it is only stored in one. This reduces the cost but makes it less resilient.
  5. Glacier - For archiving data which you will not need to access frequently or quickly.
  6. Glacier Deep Archive - Like Glacier but cheaper, as it is for data that you will only need to access once or twice a year.
S3 Storage Classes
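
One way to read the storage classes listed above is as a decision based on how often, and how quickly, you need the data. The sketch below encodes that reading. The cut-off of 12 accesses a year for "frequent" is an arbitrary assumption of mine, not AWS guidance, and real choices would also weigh cost and retrieval fees.

```python
def pick_storage_class(accesses_per_year, needs_fast_access=True,
                       single_zone_ok=False, pattern_known=True):
    """Map a rough access pattern to one of the storage classes listed above.

    The thresholds are illustrative assumptions, not AWS guidance.
    """
    if not pattern_known:
        return "Intelligent-Tiering"  # let S3 move objects between tiers
    if not needs_fast_access:
        # Archive classes: slower retrieval, lower storage cost.
        if accesses_per_year <= 2:  # "once or twice a year"
            return "Glacier Deep Archive"
        return "Glacier"
    if accesses_per_year >= 12:  # hypothetical cut-off for "frequent"
        return "Standard"
    if single_zone_ok:
        return "One Zone-Infrequent Access"
    return "Standard-Infrequent Access"
```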

S3 is the most used, and is the one I have the most experience with, but I will touch on others briefly.

EFS

EFS (Elastic File System) provides file storage for use with your EC2 instances. It uses the NFSv4 protocol and can be used concurrently by thousands of instances.

Storage Gateway

Storage Gateway runs as a virtual machine that you install on your on-premises servers. Your on-premises data can then be backed up to AWS, providing more durability.


Databases

AWS has many different database services:

AWS Database Services
Source: https://aws.amazon.com/products/databases/

In this blog I just want to focus on one example of a SQL database, and one NoSQL database.


RDS

Amazon Relational Database Service (Amazon RDS) is a web service that makes it easier to set up, operate, and scale a relational database in the AWS Cloud.
Source: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Welcome.html

Amazon RDS provides 6 commonly used database engines to choose from: Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle, and Microsoft SQL Server. It allows you to run a fully featured relational database while offloading much of the configuration, maintenance and security work to AWS.

Scaling

RDS can be scaled vertically or horizontally.

Vertically

Vertical scaling is when you add more resources to the machine, for example more RAM, CPU power or disk space. The master RDS database can be vertically scaled with the click of a button, which will enable it to handle a higher load. If your database is deployed across multiple availability zones, then there is very minimal downtime when scaling, as the standby database will be upgraded first and then failed over to while the original database is upgraded.

Horizontally

Horizontal scaling is when you add more machines to handle more load. In RDS, horizontal scaling lets you add up to 5 read replicas (15 if you are using Amazon Aurora). A read replica is a read-only copy of your database that is kept in sync with the main database, usually with a small replication lag. The replicas can be placed in different regions to improve performance for users in those areas. In the event of a failure in the main database, one of the replicas can be promoted to main to improve recovery time, although deploying across multiple availability zones is a better option for this. Having replicas also means you can spread read traffic across them. Each replica has its own DNS endpoint, so applications can decide which one to connect to.

RDS Read Replicas Behind Load Balancer
Source: https://aws.amazon.com/products/databases/
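
Because each replica has its own endpoint, the application (or a load balancer in front of it) has to decide where each query goes. Here is a minimal sketch of that idea: writes go to the primary and reads are spread round-robin over the replicas. The class name and endpoint names are hypothetical, and treating anything that starts with SELECT as a read is a deliberate simplification.

```python
from itertools import cycle


class ReplicaRouter:
    """Send writes to the primary endpoint and spread reads over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when there are no replicas to read from.
        self._readers = cycle(replicas or [primary])

    def endpoint_for(self, sql):
        """Return the endpoint a statement should be sent to."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.primary


if __name__ == "__main__":
    router = ReplicaRouter(
        "primary.example.internal",  # placeholder endpoint names
        ["replica-1.example.internal", "replica-2.example.internal"],
    )
    print(router.endpoint_for("SELECT * FROM orders"))
    print(router.endpoint_for("INSERT INTO orders VALUES (1)"))
```

Reads that must see the very latest write are usually safer going to the primary, since a replica may be slightly behind.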

Sharding

Sharding is a technique that splits data into smaller subsets and distributes them across a number of physically separated database servers. Each server is referred to as a database shard. All database shards usually have the same type of hardware, database engine, and data structure to generate a similar level of performance. However, they have no knowledge of each other, which is the key characteristic that differentiates sharding from other scale-out approaches such as database clustering or replication.

Sharding offers more scalability and fault tolerance. If one database shard has a hardware issue or goes through failover, no other shards are impacted because a single point of failure or slowdown is physically isolated.

However, sharding has drawbacks, as specially engineered queries are needed to read and join data across multiple shards.
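
A common way to decide which shard a record lives on is to hash its key. The sketch below shows that approach, along with why cross-shard reads are awkward: the keys have to be grouped by shard and each group queried separately. Hash-based routing is just one option (range-based sharding is another), and the function names here are my own.

```python
import hashlib


def shard_for(key, shard_count):
    """Pick a shard index for a key using a stable hash."""
    # hashlib gives the same result on every machine and run,
    # unlike Python's built-in hash(), which is salted per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def group_by_shard(keys, shard_count):
    """Group keys by shard: a cross-shard read needs one query per group."""
    groups = {}
    for key in keys:
        groups.setdefault(shard_for(key, shard_count), []).append(key)
    return groups


if __name__ == "__main__":
    # Made-up customer IDs spread over three shards.
    print(group_by_shard(["customer-1", "customer-2", "customer-3"], 3))
```

One thing to bear in mind with simple modulo hashing is that changing the number of shards moves most keys to a different shard.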


DocumentDB

Amazon DocumentDB is a non-relational database service designed from the ground-up to give you the performance, scalability, and availability you need when operating mission-critical MongoDB workloads at scale.
Source: https://aws.amazon.com/documentdb/

In Amazon DocumentDB, the storage and compute are decoupled, allowing each to scale independently. As it is a document database, your data is stored as JSON-like documents.

Amazon DocumentDB is resilient too: it is designed for 99.99% availability and replicates six copies of your data across three AWS Availability Zones. DocumentDB clusters are continuously and automatically monitored to ensure they are healthy, and any instances that fail are restarted automatically. Any database cache is isolated so that it is not lost upon restart. On instance failure, DocumentDB automates failover to one of up to 15 replicas you have created in any of three availability zones.

DocumentDB is also fault tolerant, as each 10GB portion of storage is replicated six ways, across three availability zones. DocumentDB also automatically backs up to S3 continuously, allowing you to restore back to any point in your configured retention period (up to 35 days).

You can also add encryption with AWS Key Management Service (KMS). On a cluster running with Amazon DocumentDB encryption, data stored at rest in the underlying storage is encrypted, as are the automated backups, snapshots, and replicas in the same cluster. By default, connections between a client and Amazon DocumentDB are encrypted in transit with TLS.

Scaling

As Amazon DocumentDB has storage and compute separated, you can scale both horizontally and vertically depending on your needs.

Vertically

You can add more computing power and memory by creating a new instance with the required size. Then you can move to the new instance, which usually takes a few minutes.

Horizontally

DocumentDB allows you to increase the read capacity to millions of requests per second by adding up to 15 low latency read replicas, regardless of the size of your data.


I also want to talk about Caching, as that can be important for cloud applications that want to be as responsive as possible.


ElastiCache

Amazon ElastiCache allows you to seamlessly set up, run, and scale popular open-source compatible in-memory data stores in the cloud.
Source: https://aws.amazon.com/elasticache/

ElastiCache allows you to create a high-throughput, low-latency in-memory data store for when you want data retrieved quickly. It supports Redis and Memcached, which are two of the most popular in-memory caching engines.

I'm not going to go into the uses of caches in detail here, but in AWS the cache helps take load away from the primary database, and makes information that is used often easier and quicker to access.
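
As a quick illustration of how a cache takes load off the database, here is a sketch of the cache-aside pattern. A plain dictionary stands in for Redis or Memcached and `load_from_db` is a hypothetical lookup function, so treat this as the shape of the idea rather than ElastiCache client code.

```python
class CacheAside:
    """Check the cache first and only hit the database on a miss."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # hypothetical database lookup
        self._cache = {}                   # stand-in for Redis or Memcached
        self.db_reads = 0                  # counts how often the database is hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.db_reads += 1
        value = self._load_from_db(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached entry, e.g. after the row is updated in the database."""
        self._cache.pop(key, None)


if __name__ == "__main__":
    store = CacheAside(lambda user_id: {"id": user_id})  # fake database
    store.get("user-1")
    store.get("user-1")
    print(store.db_reads)  # 1: the second read was served from the cache
```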

As with the databases, AWS takes care of a lot for you when you use ElastiCache, such as hardware provisioning, software patching, setup, configuration, monitoring, failure recovery, and backups. It can scale both vertically and horizontally in a similar way too, adding more memory to handle more writes, or replicas to improve read times.


Security

As well as using IAM to control who can access what, there are some more security features in AWS which help keep your applications safe.

Amazon Virtual Private Cloud (VPC) is a service in AWS that lets you keep your cloud instances isolated in a virtual network that you have complete control over. Inside the VPC you can have a public subnet for any web servers that need to access the internet, and a private subnet with no direct internet access for anything else. Access can then be restricted to certain IP addresses, meaning you would need to either be on particular premises or be using a VPN to gain access.
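
The subnet and IP restriction ideas can be sketched with Python's `ipaddress` module. All the address ranges below are invented for illustration (10.0.0.0/16 for the VPC and a documentation range for the allowed office or VPN addresses); in AWS itself these rules would live in the VPC, subnet and security group configuration rather than in application code.

```python
import ipaddress

VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")  # hypothetical VPC range

# Split the VPC range into /24 subnets and take the first two.
_subnets = VPC_CIDR.subnets(new_prefix=24)
PUBLIC_SUBNET = next(_subnets)   # 10.0.0.0/24, for web servers
PRIVATE_SUBNET = next(_subnets)  # 10.0.1.0/24, for everything else

# Hypothetical office/VPN range that is allowed to connect.
ALLOWED_SOURCES = [ipaddress.ip_network("203.0.113.0/24")]


def is_allowed(source_ip):
    """Return True if the caller's IP falls inside an allowed range."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in ALLOWED_SOURCES)


if __name__ == "__main__":
    print(PUBLIC_SUBNET, PRIVATE_SUBNET)
    print(is_allowed("203.0.113.7"), is_allowed("198.51.100.1"))
```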

Most of the services I have mentioned already use IAM and/or Role-Based Access Control (RBAC) which means that only authorised users can make certain changes.


Amazon CloudFront

Amazon's CDN service, CloudFront, is distributed across the globe, so it can deliver low-latency performance for users in almost any location. I have talked about CDNs and their uses before (here), so I won't go into detail on that. CloudFront does, however, integrate with your other AWS services, which may make it preferable if you are running your application in AWS. AWS also offers security protection on CloudFront, so that is another thing you don't need to worry about.


Regions and Availability Zones

I will let AWS explain what these are:

Regions

AWS has the concept of a Region, which is a physical location around the world where we cluster data centers. We call each group of logical data centers an Availability Zone. Each AWS Region consists of multiple, isolated, and physically separate AZ's within a geographic area.

Availability Zones

An Availability Zone (AZ) is one or more discrete data centers with redundant power, networking, and connectivity in an AWS Region. AZs give customers the ability to operate production applications and databases that are more highly available, fault tolerant, and scalable than would be possible from a single data center.
Source: https://aws.amazon.com/about-aws/global-infrastructure/regions_az/

So for example you could have the region eu-west-2, which is located in London. Then within that region could be 3 availability zones.

AWS Regions & Availability Zones

Having your application and database replicas split over AZs allows you to be more fault tolerant, as you can fail over to another AZ if needed. The network performance is sufficient to accomplish synchronous replication between AZs.
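
As a rough sketch of that failover decision: AZ names are the region code followed by a letter (for example eu-west-2a), so you can look for another healthy AZ in the same region. The health map and function names are hypothetical, and in practice managed services such as RDS handle this for you.

```python
def region_of(az_name):
    """Strip the trailing zone letter: "eu-west-2a" -> "eu-west-2"."""
    return az_name[:-1]


def pick_failover_target(current_az, health):
    """Return a healthy AZ in the same region, or None if there isn't one."""
    for az, healthy in sorted(health.items()):
        if az != current_az and healthy and region_of(az) == region_of(current_az):
            return az
    return None


if __name__ == "__main__":
    health = {"eu-west-2a": False, "eu-west-2b": True, "eu-west-2c": True}
    print(pick_failover_target("eu-west-2a", health))  # eu-west-2b
```

A result of `None` is the case where the whole region is unavailable, which is when backups in another region matter.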

For disaster recovery, having replicas in a single region may not be enough, as all the AZs in a region are in the same geographic area. Having backups in another region is therefore also a good idea.


API Gateway

API Gateway handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, CORS support, authorization and access control, throttling, monitoring, and API version management.
Source: https://aws.amazon.com/api-gateway/

The API Gateway, as the name suggests, sits between incoming API requests and your application. It gives you the ability to check authorisation, validate the requests and transform them before they reach your application, and to transform and validate your responses.

AWS API Gateway
Credit for Inspiration: https://www.alexdebrie.com/posts/api-gateway-elements/

Using an API Gateway is useful because it provides a single entry point for your application. Depending on the request you can then route it to the correct backend application. It also allows you to rate limit requests hitting your endpoint, to help mitigate things like DDoS attacks. You can make changes to your backend without affecting the endpoint the customer sees, as the API Gateway can stay the same.
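
Rate limiting is often implemented with a token bucket, which is also the algorithm API Gateway uses for throttling. The sketch below is a minimal, single-threaded version to show the idea. The class is my own, and the time is passed in explicitly so the behaviour is easy to follow and test.

```python
class TokenBucket:
    """Allow bursts up to `capacity` and a steady `rate` of requests per second."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.last = now

    def allow(self, now):
        """Return True if a request arriving at time `now` (seconds) may pass."""
        elapsed = now - self.last
        # Refill for the time that has passed, without exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # a gateway would answer 429 Too Many Requests here


if __name__ == "__main__":
    bucket = TokenBucket(rate=1, capacity=2)
    print([bucket.allow(0.0) for _ in range(3)])  # [True, True, False]
```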


CloudWatch

Amazon CloudWatch is a monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), and IT managers.
Source: https://aws.amazon.com/cloudwatch/

CloudWatch collects logs, metrics and events to give you a view of your AWS resources and applications. These logs can help you identify issues, set alerts, and generally discover insights on your application.

For example, a CloudWatch alarm might detect that your current EC2 instances are struggling to handle load, and could trigger EC2 Auto Scaling to add more instances.

AWS CloudWatch
Source: https://aws.amazon.com/cloudwatch/
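
The scaling example can be sketched as two small functions: one that decides whether an alarm should fire based on recent CPU readings, and one that turns that state into a desired instance count. The state names mirror the ones CloudWatch alarms use (OK, ALARM, INSUFFICIENT_DATA); the thresholds, the readings and the "add one instance" policy are assumptions for illustration.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Return "ALARM" if the last N datapoints all breach the threshold."""
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    return "ALARM" if all(value > threshold for value in recent) else "OK"


def desired_instances(current, state, maximum):
    """Add one instance while the alarm is firing, up to a maximum."""
    return min(current + 1, maximum) if state == "ALARM" else current


if __name__ == "__main__":
    cpu_percent = [40, 55, 82, 91, 88]  # made-up average CPU readings
    state = alarm_state(cpu_percent, threshold=80, evaluation_periods=3)
    print(state, desired_instances(2, state, maximum=4))  # ALARM 3
```

Requiring several consecutive breaches stops a single spike from triggering a scale-out.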

I have really only scratched the surface of AWS here, but I think that with what I have covered in this blog post you could form the basis of a cloud-based application, which is really what I wanted to cover!


Post by

Josh Glasson

Software Developer. Creator and owner of this blog.