- S3
- IAM
- How to read Policy documents
- EC2
- Serverless
- DynamoDB
- Elastic Beanstalk
- CI/CD
- Containers on AWS
- Messaging Services
- Hands-on Knowledge
- Answers
- Region is a geographic location - Sydney, Cape Town, Mumbai, etc. Each Region contains two or more AZs.
- An Availability Zone is like a datacenter - e.g., Hitech Campus Eindhoven - and can consist of more than one datacenter. AZs in a Region are located within roughly 100 km of each other.
- Edge Location - endpoints for caching, used by CloudFront (CDN). There are more than 200 edge locations.
- Whitepapers: https://docs.aws.amazon.com/wellarchitected/latest/framework/welcome.html?did=wp_card&trk=wp_card
- How do you deploy a full stack application to AWS.
- How does the application connect to the database.
- *How are REST APIs exposed to the outside world.
- *What are VPCs.
- What are Roles, IAM, security groups.
- 🏃 How are logs of an application viewed in AWS
- Deploy a NodeJS application to EC2 and Elastic Beanstalk
- 👍 Deploy a SpringBoot application to Elastic Beanstalk
- *How are docker based applications deployed to AWS
- Ecs service
- Where does K8s come into picture for AWS
- Build a simple application that uploads file to S3
- 👍 Build a simple application on SpringBoot and deploy to AWS using Elastic Beanstalk
- 👍 Build a simple application with a front end that communicates with an AWS backend on Elastic Beanstalk.
- 🏃 Build an app that returns a list of movies of a given genre.
- *Build an app that leverages AWS load balancer
- Application load balancer
- Build a simple application on SpringBoot and deploy it to containers and then to AWS using containers
- CI/CD of a simple application
- How to connect to an Elastic Beanstalk server using SSH or RDP?
- Storage
- S3
- Database
- DynamoDB
- ElastiCache
- Amazon Keyspaces
- RDS
- Servers
- EC2
- Elastic Beanstalk
- Lambda
- Containers
- Elastic Container Registry
- Elastic Container Service
- Elastic Kubernetes Service
- Deploying applications
- Logging / Monitoring
- Networking
- VPC
- Route 53
- API Gateway
- CI/CD
- CodeCommit
- CodeBuild
- CodeDeploy
- CodePipeline
Category | Service | Description |
---|---|---|
Storage | S3 | Simple Storage Service. S3 offers 99.999999999% (11 nines) durability and 99.99% availability, plus various storage classes. Data is automatically distributed across a minimum of three physical Availability Zones within the AWS Region. https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-class-intro.html https://aws.amazon.com/s3/pricing/ |
Serverless | Lambda | Run code without provisioning servers; billed for requests and execution time |
Logging | CloudWatch | Application and infrastructure monitoring https://aws.amazon.com/cloudwatch/ |
CICD | CodeCommit | AWS CodeCommit is a secure, highly scalable, managed source control service that hosts private Git repositories. |
CICD | CodePipeline | Automated tests and deployment into environments when the source code changes |
CICD | CodeBuild | Fully managed service that compiles source code, runs tests, and produces the deliverables for deployment |
CICD | CodeDeploy | Automates code deployments to any instances |
CICD | CodeStar | A wrapper used to create pipelines from the above four services |
- S3 is object-based storage - images, text, webpages. So it can't be used as a database. S3 scales on demand.
- Unlimited storage. Maximum object size is 5TB.
- S3 buckets are like folders - Finance, HR etc., but each bucket needs a globally unique name.
- The URL for a bucket looks like: https://bucket-name.s3.region.amazonaws.com/key-name
- When we upload an object to a bucket, each object carries the following information:
- Key is typically the name of the object.
- Value is the actual content, made up of a sequence of bytes.
- Metadata is data associated with the object (last accessed, content type, etc.).
- Version ID identifies multiple versions of the object.
- S3 Standard is the default storage class.
- High Availability 99.99%
- High Durability 99.999999999%
- Frequent access
- CDN, mobile apps, and Big Data are example use cases.
- Lifecycle Management moves data that meets a certain threshold (e.g., age) to cheaper options like Glacier / Glacier Deep Archive.
- Bucket Policies work at bucket level and specify what actions are allowed or denied for the bucket. Ex: PUTs are allowed but not DELETEs.
- ACLs work at individual object level and define which AWS accounts are granted access and the type of access. ACLs can be attached to individual objects within a bucket as well.
- Strong Read-After-Write Consistency.
Object vs Block Storage ( https://cloud.netapp.com/blog/block-storage-vs-object-storage-cloud )
- Object storage is useful for data that does not change often - Write-Once-Read-Many use cases. It also suits distributed systems, with data stored across multiple nodes.
- Block storage is useful for holding databases/caches. Also for high intensive IO operations.
- Searches are fast with Object storage. Big Data analytics usecases.
- Updates with block storage are easier because we have access to individual blocks. With object storage this is not possible; the entire object has to be written again.
- S3 offers Object storage. Amazon Elastic Block Storage offers Block storage solutions.
- https://aws.amazon.com/s3/faqs/
- S3 is a storage service with high durability and availability; it can hold files/objects, host a static website, and serve as a data archive.
- Various storage classes exist that have varying cost implications.
- Different storage classes have different replication behavior.
- Can encrypt data using security keys.
- Lifecycle policy helps in moving data between storage classes, purging after a certain threshold etc.
- Replication policy helps in copying the data to a different location / region to help in minimising the access times.
- Individual/selected objects within an S3 bucket can be shared for public access while the bucket itself remains private.
- S3 has different storage classes that are used based on the frequency of access.
- Archive feature is related to S3 Glacier / S3 Glacier Deep Archive.
- Bucket Policy determines who all can use/access the bucket/its content.
- Appending /* to the resource ARN makes the policy applicable to all the objects in the bucket.
- We create an S3 bucket and upload the error.html and index.html files in it.
- We then update the policy for this bucket in its Permissions section.
- We also configure the Static website hosting in bucket Properties.
- Versioning can be configured at Bucket level so that when a file with same name is uploaded, a new version gets created.
- Versioning cannot be disabled, only suspended.
- Public access does not automatically apply to previous versions of an object; explicit access needs to be granted via the Make Public action.
- Deleting an object that has versioning turned on creates a Delete marker instead. If this marker is deleted, the object becomes available again (see the listing sketch below).
- Unlimited number of versions can be created. https://acloud.guru/forums/aws-csa-2019/discussion/-LzddK__zQps2CcoZum9/How%20many%20versions%20of%20a%20file%20can%20be%20saved%20in%20S3%3F
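- A small sketch of inspecting an object's versions with the AWS SDK for Java v1 (bucket name is a placeholder; credentials come from the default provider chain):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3VersionSummary;
import com.amazonaws.services.s3.model.VersionListing;

public class ListVersionsSketch {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // null prefix lists versions for every key in the bucket
        VersionListing listing = s3.listVersions("my-example-bucket", null);
        for (S3VersionSummary v : listing.getVersionSummaries()) {
            // isDeleteMarker() distinguishes delete markers from real versions
            System.out.printf("%s %s deleteMarker=%b%n",
                    v.getKey(), v.getVersionId(), v.isDeleteMarker());
        }
    }
}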
- Public access can be granted from the AWS console on the bucket/object, but a Bucket Policy gives finer control via statements/principals/actions/resources:
{
  "Version": "2012-10-17",
  "Id": "unique-id-to-describe-below-statement",
  "Statement": [
    {
      "Sid": "unique-sid",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::testbucket99912121/*"
    }
  ]
}
- Create Bucket 1 and Bucket 2
- In Management of Bucket 1, create a Replication Rule with Source as Bucket 1 and Target as Bucket 2.
- Any new files uploaded to Bucket 1 will now sync to Bucket 2 automatically.
- ❓ How to sync existing objects? (S3 Batch Replication can replicate objects that existed before the rule was created.)
- Objects stored in S3 buckets may need to be deleted or moved to different, less costly storage after some time. Certain objects in a bucket may also need to be archived. All of this is configured using a Lifecycle rule (see the sketch below).
- Lifecycle configuration also applies to versions of the objects.
- By default, Delete markers are not replicated. This can be turned on though.
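- A sketch of a lifecycle configuration in the JSON shape accepted by the put-bucket-lifecycle-configuration CLI/API call (rule ID, prefix, day counts, and storage class are assumptions):

{
  "Rules": [
    {
      "ID": "archive-then-expire",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}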
- A note on encryption. https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-encryption.html
- Object locks are of two types - Governance and Compliance.
- They determine whether an update to an object is possible or not.
- Compliance Mode - An Object cannot be overwritten or deleted by any user, including the root user.
- Governance Mode - An Object cannot be overwritten or deleted by a user, unless that user has special permissions.
- Encryption in Transit
- HTTPS
- SSL/TLS
- Encryption at Rest: Server side Encryption
- SSE-S3: S3 managed keys, using AES-256 (x-amz-server-side-encryption: AES256)
- SSE-KMS: AWS KMS managed keys (x-amz-server-side-encryption: aws:kms) (KMS request limits apply to uploads and downloads)
- SSE-C: Customer provided keys
- Encryption at Rest: Client side Encryption
- Encrypt the files yourself before uploading them to S3.
- An S3 bucket policy can be introduced to reject PUT requests that do not carry the x-amz-server-side-encryption header, as sketched below.
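- A sketch of such a policy (bucket name is a placeholder); the Null condition denies any PutObject request that arrives without the encryption header:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnencryptedUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-example-bucket/*",
      "Condition": {
        "Null": {
          "s3:x-amz-server-side-encryption": "true"
        }
      }
    }
  ]
}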
- Using prefixes helps READ performance.
- A prefix is nothing but the key hierarchy. For ex: mybucketname/folder1/subfolder1/myfile.jpg.
- You can achieve better performance by spreading requests across more prefixes.
- Your applications can easily achieve thousands of transactions per second in request performance when uploading and retrieving storage from Amazon S3. Amazon S3 automatically scales to high request rates. For example, your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket. There are no limits to the number of prefixes in a bucket. You can increase your read or write performance by parallelizing reads. For example, if you create 10 prefixes in an Amazon S3 bucket to parallelize reads, you could scale your read performance to 55,000 read requests per second. Similarly, you can scale write operations by writing to multiple prefixes.
- Multipart uploads are used to upload larger files to S3 efficiently (better for > 100MB and required for > 5GB); see the sketch after this list.
- Use S3 byte-range fetches to increase performance when downloading files from S3.
- When using KMS, there is an upper request limit; breaching it slows performance.
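- A sketch of a multipart upload via the SDK v1 TransferManager, which splits large files into parts and uploads them in parallel (bucket, key, and file path are placeholders):

import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import com.amazonaws.services.s3.transfer.Upload;
import java.io.File;

public class MultipartUploadSketch {
    public static void main(String[] args) throws InterruptedException {
        TransferManager tm = TransferManagerBuilder.standard().build();
        // TransferManager switches to multipart automatically for large files
        Upload upload = tm.upload("my-example-bucket", "backups/big-file.zip",
                new File("/tmp/big-file.zip"));
        upload.waitForCompletion(); // blocks until all parts have finished
        tm.shutdownNow();
    }
}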
- S3 is Simple Storage Service - highly available and durable.
- A static web hosting option is possible.
- Making a bucket and its content public can be done via a JSON policy.
- It is object-based storage and so suits write-once, read-many scenarios.
- Not suitable as a database or cache.
- Storage classes decide the ease of retrieval and costs involved.
- Can configure replication rules to move data between regions, lifecycle rule to move data between different storage classes.
- Different versions of the same object can coexist, governed by versioning configuration.
- Encryption is supported - client side and server side.
- Prefixes to names help in increased performance.
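- One of the hands-on goals listed earlier is an application that uploads a file to S3; a minimal sketch with the SDK v1 (region, bucket, key, and file path are assumptions; credentials come from the default provider chain):

import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import java.io.File;

public class S3UploadSketch {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion(Regions.US_EAST_1)
                .build();
        // Uploads a local file as the object docs/hello.txt
        s3.putObject("my-example-bucket", "docs/hello.txt", new File("/tmp/hello.txt"));
        System.out.println("Upload complete");
    }
}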
- Root Account - has full administrative access to the AWS account.
- Multi Factor Authentication configuration lets you secure the root account.
- An alternative is to create an admin group with the appropriate permissions and then create user accounts for the administrators inside that group.
- Policies
- Access is controlled via policy documents, which are JSON documents.
- Ex: allow the ability to do everything with every resource for the user or group this policy is applied to.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*"
    }
  ]
}
- In general, for ease of maintenance, policies are not applied to Users. Instead they are applied at Group level and Users are added to those groups.
- Users, Groups, Policies are Global.
- Building Blocks
- Users (physical persons)
- Groups (functions such as Administrator, Developer)
- Roles (internal use in AWS; allow one part of AWS to access another part of AWS)
- Principle of least privilege
- Give minimum privilege and add more as the need arises; see the policy sketch after this list.
- When a new user is created, they have no access by default.
- SAML - use the same username and password a user uses to log into Windows to log onto AWS (IAM Federation).
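- Returning to least privilege, a minimal policy sketch that grants read-only access to a single bucket (bucket name is a placeholder):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-example-bucket",
        "arn:aws:s3:::my-example-bucket/*"
      ]
    }
  ]
}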
- Elastic Compute Cloud
- Types of instances
- On Demand
- Pay by the hour or second.
- Reserved
- Reserve the capacity for 1-3 years.
- Spot Instances
- Buy unused capacity at cheaper prices. Good for applications that have flexible start and end times.
- Dedicated
- Banking and finance applications may have regulations that forbid deployment on multi-tenant virtualization.
- EC2 instances need VPC for operation.
- A brief intro to the differences between these two services.
- Application Load Balancer
- The listeners work at layer 7 and have access to request headers.
- URL based, Host based, Query String based load balancing.
- This is suited for Microservices based applications as this service can load balance Containers as well.
- Works with HTTP, HTTPS, and gRPC protocols.
- Network Load Balancer
- Operates at layer 4 (network/transport level).
- TLS offloading can be done at Network Load Balancer.
- Suited for cases where Low latency is expected.
- UDP, TCP, TLS protocols apply here.
- Both of these load balancer types are of the newer generation.
- Classic Load Balancers are previous generation and effectively deprecated.
- We designate a group of EC2 instances as a target group, and an Application Load Balancer is configured to front this target group.
- We can even configure sticky sessions in the target group - which may be useful for stateful applications.
- Two EC2 instances are created, and an ALB is then configured against a target group containing both.
Lambda
- Execution role is like the service account that is used to run the lambda function.
- Create an AWS Lambda Function with NodeJS as the language.
- The logging information from a Lambda function gets routed to Amazon CloudWatch.
- Notice the time billed for the operation.
- Custom logging information (console.xxx statements) also goes to CloudWatch.
- For reference: https://docs.aws.amazon.com/lambda/latest/dg/nodejs-logging.html
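- The hands-on above uses NodeJS; as a comparable sketch in Java (the language used elsewhere in these notes), a handler whose log lines land in CloudWatch Logs (class name and return value are illustrative):

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class LoggingHandler implements RequestHandler<Object, String> {
    @Override
    public String handleRequest(Object input, Context context) {
        // Anything written via the context logger (or stdout) is routed to CloudWatch Logs
        context.getLogger().log("Received input: " + input);
        return "done";
    }
}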
- How to choose a service for a database
- https://www.youtube.com/watch?v=1aY2KJldLz0
- RDBMS
- RDS
- Aurora
- Key Value
- Amazon Keyspaces
- Document Databases
- Amazon DynamoDB
- Graph Database
- Neptune
- In Memory Databases / Caches
- Amazon ElastiCache
- Search Databases
- Amazon Elasticsearch Service (now OpenSearch Service)
- A simple SpringBoot application that processes GET, POST, DELETE and PUT requests onto the URI /movies.
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("movies")
public class RestAPIController {

    @GetMapping
    public String getMovies() {
        return "Get Movies";
    }

    @PostMapping
    public String createMovie() {
        return "Post Movies";
    }

    @PutMapping
    public String updateMovie() {
        return "Put Movies";
    }

    @DeleteMapping
    public String deleteMovie() {
        return "Delete Movies";
    }
}
- Good read: https://cloudkatha.com/how-to-deploy-spring-boot-application-to-aws-elastic-beanstalk/
- Set up an environment variable SERVER_PORT with value 5000.
- AWS had a new service release in 2017, called CodeStar.
- It is a fully managed service and encapsulates the other CI/CD services.
- We have to choose a template in CodeStar and it would create the other services as shown below based on our selection.
- CodeStar also provisions a CodeCommit repository for the project, and an update to the content of that repository triggers a CI/CD pipeline.
- We then create an AWS CodePipeline and, using the GitHub V1 integration, connect it to the repo and branch for which CI/CD is to be configured. We also choose the GitHub webhook option, which notifies AWS on commit events.
- Now an update to the repo/branch configured in the above step triggers the CI/CD pipeline, and our changes get deployed onto the Elastic Beanstalk instance.
- Note: build artifacts (for example, the cloned repo) are configured to be stored in an S3 bucket.
- https://aws.amazon.com/getting-started/hands-on/set-up-ci-cd-pipeline/
- https://aws.amazon.com/blogs/devops/integrating-aws-codecommit-with-jenkins/
- Build a container in build step and deploy it to ECR, ECS/EKS in the deploy step.
- Fargate
- ECS
- Elastic Container Service. Read on: https://towardsdatascience.com/deploying-a-docker-container-with-ecs-and-fargate-7b0cbc9cd608
- EKS
- ECR
- Copilot?
- What problems do message queues solve?
- Tight coupling: if two systems are directly connected, a message cannot be sent to the second system while it is down.
- Performance: the producer can produce as many messages as it wants, and the consumer processes them at its own pace.
- Fully managed service with auto scaling.
- Encryption at rest is automatically provided. For encryption in transit, developers need to create keys and do the configuration manually.
- Composed of Topics ( think of it as an event type or a subject ) and Subscriptions to those Topics.
- App to App { SQS }
- App to Person { Email, SMS }
- SNS employs a Pub-Sub model.
- It is a notification service. Useful when more than one system is to be notified.
- SNS can send notifications to SQS which can then be used by the target application for instance.
- SNS vs SQS
- If other systems need to know about an event generated, use SNS.
- If only the current system cares about it, use SQS.
- SNS is a PUSH system. SQS is a PULL system.
- Usually, SQS content is READ by only a single system. SNS, on the other hand, almost always sends data to multiple systems.
- One message can be broadcast to several SQS queues using SNS:
    Producer -> SNS -> SQS1
                    -> SQS2
                    -> SQS3
- A number of users may be subscribed to promotional offers from retail chains; they are notified via these pub-sub architectures.
- Connecting Lambda directly with SNS has one disadvantage: if the Lambda processing fails, the message gets lost.
- SNS can be integrated with HTTP endpoints, emails, and SMS messages.
- Options
- We can configure who can send messages to SNS.
- Delivery retry policy.
- Logging facility @ CloudWatch.
- Filter Policy determines which of the subscribers should be sent the message.
- Message Attributes can be created that help in realizing a Filter Policy; see the sketch after this list.
- Ex: purchaseType -> Internet.
- Dead Letter Queue - a queue that holds messages which could not be delivered to the receiver.
- Push notifications are supported by SNS.
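- A sketch of a subscription filter policy matching the purchaseType attribute above; only subscriptions whose policy matches a message's attributes receive that message:

{
  "purchaseType": ["Internet"]
}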
- One REST API endpoint does a GET request to subscribe an email address to an SNS topic.
- Another REST API endpoint does a GET request to publish content to the SNS topic. This makes the SNS service send the email/notification to all subscribers from the first endpoint.
- https://www.youtube.com/watch?v=lF7Ba4-8ER4
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.sns.AmazonSNSClient;
import com.amazonaws.services.sns.AmazonSNSClientBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;

@Configuration
public class AWSSNSConfig {
    @Primary
    @Bean
    public AmazonSNSClient getSnsClient() {
        // Credentials below are placeholders - never hardcode real keys in source
        return (AmazonSNSClient) AmazonSNSClientBuilder.standard().withRegion(Regions.US_EAST_1)
                .withCredentials(new AWSStaticCredentialsProvider(new BasicAWSCredentials("<ACCESS_KEY>", "<SECRET_KEY>")))
                .build();
    }
}
import com.amazonaws.services.sns.AmazonSNSClient;
import com.amazonaws.services.sns.model.PublishRequest;
import com.amazonaws.services.sns.model.SubscribeRequest;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.aws.autoconfigure.context.ContextRegionProviderAutoConfiguration;
import org.springframework.cloud.aws.autoconfigure.context.ContextStackAutoConfiguration;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication(exclude = {ContextStackAutoConfiguration.class, ContextRegionProviderAutoConfiguration.class})
@RestController
public class SnsConnectorApplication {

    @Autowired
    private AmazonSNSClient snsClient;

    String TOPIC_ARN = "arn:aws:sns:us-east-1:728710102994:hello-queue-topic";

    @GetMapping("/subscribe/{email}")
    public String subscribe(@PathVariable String email) {
        // Protocol "email" sends a confirmation mail that the subscriber must accept
        SubscribeRequest request = new SubscribeRequest(TOPIC_ARN, "email", email);
        snsClient.subscribe(request);
        return "Programmatic subscription is pending. Check email: " + email;
    }

    @GetMapping("/publish")
    public String publish() {
        PublishRequest request = new PublishRequest(TOPIC_ARN, "Testing on-going", "Notification: Hello!!");
        snsClient.publish(request);
        return "Programmatic notification sent successfully";
    }

    public static void main(String[] args) {
        SpringApplication.run(SnsConnectorApplication.class, args);
    }
}
- Reference: https://www.youtube.com/watch?v=q3zo3YREfJI
- Reference: https://cloud.spring.io/spring-cloud-static/spring-cloud-aws/1.2.3.RELEASE/multi/multi__messaging.html
- First, connect to AWS using the credentials:
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.sqs.AmazonSQSAsync;
import com.amazonaws.services.sqs.AmazonSQSAsyncClientBuilder;
import org.springframework.cloud.aws.messaging.core.QueueMessagingTemplate;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;

@Configuration
public class AWSSQSConfig {
    @Bean
    public QueueMessagingTemplate queueMessageTemplate() {
        return new QueueMessagingTemplate(amazonSQSAsync());
    }

    @Primary
    @Bean
    public AmazonSQSAsync amazonSQSAsync() {
        // Credentials are placeholders; prefer the default credentials provider chain
        return AmazonSQSAsyncClientBuilder.standard().withRegion(Regions.US_EAST_1)
                .withCredentials(new AWSStaticCredentialsProvider(new BasicAWSCredentials("<ACCESS_KEY>", "<SECRET_KEY>")))
                .build();
    }
}
- Next, using the QueueMessagingTemplate instance, we send and receive messages with the AWS SQS queue.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.aws.autoconfigure.context.ContextRegionProviderAutoConfiguration;
import org.springframework.cloud.aws.autoconfigure.context.ContextStackAutoConfiguration;
import org.springframework.cloud.aws.messaging.core.QueueMessagingTemplate;
import org.springframework.messaging.support.MessageBuilder;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication(exclude = {ContextStackAutoConfiguration.class, ContextRegionProviderAutoConfiguration.class})
@RestController
public class SqsConnectorApplication {

    Logger logger = LoggerFactory.getLogger(SqsConnectorApplication.class);

    @Autowired
    private QueueMessagingTemplate queueMessagingTemplate;

    String endpoint = "https://sqs.us-east-1.amazonaws.com/872809293874/hello-sqs";

    @GetMapping("/send/{message}")
    public String sendMessage(@PathVariable String message) {
        queueMessagingTemplate.send(endpoint, MessageBuilder.withPayload(message).build());
        return "Programmatic message post is complete";
    }

    @GetMapping("/receive")
    public String receiveMessage() {
        String message = queueMessagingTemplate.receiveAndConvert(endpoint, String.class);
        return "Programmatic message receive is complete: " + message;
    }

    // For test only - @SqsListener("hello-sqs")
    public void loadMessage(String message) {
        logger.info("Message from SQS Queue is " + message);
    }

    public static void main(String[] args) {
        SpringApplication.run(SqsConnectorApplication.class, args);
    }
}
- Create SNS
- Create SQS
- Subscribe the SQS to the SNS topic
- Grant the SNS topic access to send messages to SQS.
{
  "Version": "2008-10-17",
  "Id": "__default_policy_ID",
  "Statement": [
    {
      "Sid": "__owner_statement",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::094002318819:root"
      },
      "Action": "SQS:*",
      "Resource": "arn:aws:sqs:us-east-1:094002318819:sqs-component"
    },
    {
      "Sid": "MySQSPolicy001",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "sqs:SendMessage",
      "Resource": "arn:aws:sqs:us-east-1:094002318819:sqs-component",
      "Condition": {
        "ArnEquals": {
          "aws:SourceArn": "arn:aws:sns:us-east-1:094002318819:sns-component"
        }
      }
    }
  ]
}
- (Session by Amar Mudiraj)
- Interactive query service that helps in analyzing data directly in Amazon S3 using standard SQL.
- Athena is serverless and so we only pay for the queries that we run.
- Athena scales automatically - runs queries in parallel. Works with large datasets and complex queries.
- Uses schema-on-read technology. No loading or transformation required.
- AWS Glue is used as the metadata store for data in S3. This reduces the complexity of writing queries.
- Develop a SpringBoot application and build the jar for it using mvn install.
- Application.properties has to be configured with server.port=5000 to work with the Nginx reverse proxy.
- In AWS, under Elastic Beanstalk, create an environment, uploading this jar as the payload.
- Once the service starts, the application can be accessed.
What services get created behind the scenes when an Elastic Beanstalk environment starts, and why does the SpringBoot application's server port have to be updated?
- When an Elastic Beanstalk environment starts, it also starts an Elastic Load Balancer, an Nginx proxy, at least one EC2 instance, Security Groups, and IP addresses behind the scenes.
- https://pragmaticintegrator.wordpress.com/2016/07/12/run-your-spring-boot-application-on-aws-using-elastic-beanstalk/
- Nginx forwards requests to the destination server on port 5000 by default; this is why the setting is needed.
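- For reference, the corresponding setting (a one-line sketch, assuming the standard src/main/resources/application.properties location):

# src/main/resources/application.properties
server.port=5000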