- Amazon S3 allows people to store objects (files) in “buckets” (directories)
- Buckets must have a globally unique name
- Naming convention:
- No uppercase
- No underscore
- 3-63 characters long
- Not an IP
- Must start with lowercase letter or number
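The naming rules above can be checked locally before trying to create a bucket. This is a minimal sketch, assuming the allowed characters are lowercase letters, digits, dots and hyphens; `is_valid_bucket_name` is a hypothetical helper, not part of any AWS SDK, and it only covers the rules listed in these notes.

```python
import ipaddress
import re

# First character: lowercase letter or digit. Total length: 3-63 characters.
# Assumption: the remaining characters are lowercase letters, digits, "." or "-".
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{2,62}")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the rules listed in the notes."""
    if not _NAME_RE.fullmatch(name):
        return False  # uppercase, underscore, bad length or bad first character
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        return True  # not formatted as an IP address
    return False  # looks like an IP address, which is not allowed


print(is_valid_bucket_name("my-bucket"))    # valid
print(is_valid_bucket_name("My_Bucket"))    # uppercase and underscore
print(is_valid_bucket_name("192.168.1.1"))  # formatted as an IP
```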
- Objects
- Objects (files) have a Key. The key is the FULL path:
- <my_bucket>/my_file.txt
- <my_bucket>/my_folder/another_folder/my_file.txt
- There’s no concept of “directories” within buckets (although the UI will trick you into thinking otherwise)
- Just keys with very long names that contain slashes (“/”)
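To see why "folders" are only a naming trick, here is a small in-memory sketch of how a listing with a prefix and a delimiter can present flat keys as a directory tree. `list_keys` is a hypothetical function that imitates the idea; it does not call S3.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Split flat keys under `prefix` into direct "files" and "folders"."""
    files, folders = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # The key has another delimiter further down: show it as a "folder".
            folders.add(prefix + head + delimiter)
        else:
            files.append(key)
    return files, sorted(folders)


keys = [
    "my_file.txt",
    "my_folder/notes.txt",
    "my_folder/another_folder/my_file.txt",
]
print(list_keys(keys))                # top level of the bucket
print(list_keys(keys, "my_folder/"))  # "inside" my_folder
```

The bucket still stores three flat keys; only the listing groups them.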
- Object Values are the content of the body:
- Max Size is 5TB
- If uploading more than 5GB, must use “multipart upload” (recommended as soon as the object is larger than 100MB)
- Metadata (list of text key / value pairs - system or user metadata)
- Tags (Unicode key / value pair - up to 10) - useful for security / lifecycle
- Version ID (if versioning is enabled)
- 3,500 PUTS / 5,500 GETS per second limit per prefix (this is why it's important not to group all items under a single fixed-name prefix such as "user-123", "user-493", etc.)
- Enabled at the bucket level
- Overwriting the same key creates a new “version”: 1, 2, 3 (real version IDs are opaque strings)
- It is best practice to version your buckets
- Protect against unintended deletes (ability to restore a version)
- Easy roll back to previous versions
- Any file that was not versioned prior to enabling versioning will have the version “null”
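The versioning behaviour described above can be pictured with a toy in-memory model. This is only an illustration: `VersionedBucket` is a hypothetical class, the counter-based IDs stand in for the opaque version IDs S3 generates, and a delete is modelled as a marker stacked on top of the history.

```python
import itertools


class VersionedBucket:
    """Toy in-memory model of S3 versioning (not the S3 API)."""

    def __init__(self):
        self.versioning = False
        self._versions = {}  # key -> list of (version_id, body), oldest first
        self._ids = itertools.count(1)

    def put(self, key, body):
        history = self._versions.setdefault(key, [])
        if self.versioning:
            version_id = str(next(self._ids))
            history.append((version_id, body))
        else:
            version_id = "null"
            history[:] = [(version_id, body)]  # unversioned: overwrite in place
        return version_id

    def delete(self, key):
        # A versioned delete only adds a marker; older versions stay retrievable.
        self._versions[key].append((str(next(self._ids)), None))

    def get(self, key, version_id=None):
        history = self._versions[key]
        body = history[-1][1] if version_id is None else dict(history)[version_id]
        if body is None:
            raise KeyError(f"{key}: latest version is a delete marker")
        return body


bucket = VersionedBucket()
bucket.put("my_file.txt", "old")         # stored with version "null"
bucket.versioning = True
v1 = bucket.put("my_file.txt", "new")
bucket.delete("my_file.txt")             # unintended delete
print(bucket.get("my_file.txt", v1))     # the earlier version is still there
```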
- There are 4 methods of encrypting objects in S3
- SSE-S3: encrypts S3 objects
- Encryption using keys handled & managed by AWS S3
- Object is encrypted server side
- AES-256 encryption type
- Must set header: "x-amz-server-side-encryption": "AES256"
- SSE-KMS: encryption using keys handled & managed by KMS
- KMS Advantages: user control + audit trail
- Object is encrypted server side
- Maintain control of the rotation policy for the encryption keys
- Must set header: "x-amz-server-side-encryption": "aws:kms"
- SSE-C: server-side encryption using data keys fully managed by the customer outside of AWS
- Amazon S3 does not store the encryption key you provide
- HTTPS must be used
- Encryption key must be provided in HTTP headers, for every HTTP request made
- Client Side Encryption
- Client library such as the Amazon S3 Encryption Client
- Clients must encrypt data themselves before sending to S3
- Clients must decrypt data themselves when retrieving from S3
- Customer fully manages the keys and encryption cycle
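As a rough sketch of how the three server-side options differ on the wire, the function below builds the request headers for each mode. `sse_headers` is a hypothetical helper; the header names are the ones S3 documents for server-side encryption, and the SSE-C key shown is a throwaway value for illustration only.

```python
import base64
import hashlib


def sse_headers(mode, kms_key_id=None, customer_key=None):
    """Return the HTTP headers a PUT would carry for the given SSE mode."""
    if mode == "SSE-S3":
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:  # optional: pick a specific KMS key
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 32-byte (256-bit) key")
        # The key travels with every request (hence HTTPS) and is not stored by S3.
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key":
                base64.b64encode(customer_key).decode(),
            "x-amz-server-side-encryption-customer-key-MD5":
                base64.b64encode(hashlib.md5(customer_key).digest()).decode(),
        }
    raise ValueError(f"unknown mode: {mode}")


print(sse_headers("SSE-S3"))
print(sse_headers("SSE-KMS"))
print(sorted(sse_headers("SSE-C", customer_key=bytes(32))))
```

Client side encryption has no entry here because S3 only ever sees the already-encrypted bytes.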
- AWS S3 exposes:
- HTTP endpoint: non encrypted
- HTTPS endpoint: encryption in flight
- You’re free to use the endpoint you want, but HTTPS is recommended
- HTTPS is mandatory for SSE-C
- Encryption in flight is also called SSL / TLS
- User based
- IAM policies - which API calls should be allowed for a specific user from IAM console
- Resource based
- Bucket policies - bucket wide rules from the S3 console - allows cross account
- Object Access Control List (ACL) - finer grain
- Bucket Access Control List (ACL) - less common
- Networking
- Supports VPC endpoints (for instances in a VPC without internet access)
- Logging and Audit:
- S3 access logs can be stored in other S3 buckets
- API calls can be logged in AWS CloudTrail
- User Security:
- MFA (multi factor authentication) can be required in versioned buckets to delete objects
- Pre-signed URLs: URLs that are valid only for a limited time (ex: premium video services for logged in users)
- JSON based policies
- Resources: buckets and objects
- Actions: Set of API to Allow or Deny
- Effect: Allow / Deny
- Principal: The account or user to apply the policy to
- Use an S3 bucket policy to:
- Grant public access to the bucket
- Force objects to be encrypted at upload
- Grant access to another account (Cross Account)
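Two of the use cases above, written out as policy documents. This is a sketch that only builds the JSON; the bucket name `my-bucket` and both function names are placeholders, and the encryption policy uses one common pattern (deny uploads that omit the server-side encryption header).

```python
import json


def public_read_policy(bucket):
    """Allow anyone to read every object in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }


def require_encryption_policy(bucket):
    """Deny uploads that do not ask for server-side encryption."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUnencryptedUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            # "Null": "true" matches requests where the header is absent.
            "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
        }],
    }


print(json.dumps(public_read_policy("my-bucket"), indent=2))
print(json.dumps(require_encryption_policy("my-bucket"), indent=2))
```

Each statement carries the four elements from the notes: Resource, Action, Effect and Principal.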
- S3 can host static websites and make them accessible on the world wide web
- The website URL will be:
- http://bucket-name.s3-website-AWS-region.amazonaws.com
- OR
- http://bucket-name.s3-website.AWS-region.amazonaws.com
- If you get a 403 (forbidden) error, make sure the bucket policy allows public reads!
- If your website requests data from another S3 bucket, you need to enable CORS on that bucket
- Cross Origin Resource Sharing allows you to restrict which websites (origins) can request your files in S3 (and limit your costs)
- This is a popular exam question
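A CORS configuration is a list of rules naming the origins and methods that may read from the bucket. The sketch below shows one rule in the dictionary shape used by the AWS SDKs, plus a simplified matcher; `is_request_allowed` is a hypothetical helper, the origin is a placeholder, and real S3 matching also supports wildcard patterns inside an origin, which this ignores.

```python
cors_configuration = {
    "CORSRules": [
        {
            "AllowedOrigins": ["https://www.example.com"],  # placeholder origin
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }
    ]
}


def is_request_allowed(config, origin, method):
    """Simplified check: exact origin match, or a bare "*" rule."""
    for rule in config["CORSRules"]:
        origins = rule["AllowedOrigins"]
        if method in rule["AllowedMethods"] and (origin in origins or "*" in origins):
            return True
    return False


print(is_request_allowed(cors_configuration, "https://www.example.com", "GET"))
print(is_request_allowed(cors_configuration, "https://other.example.org", "GET"))
```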
- Read after write consistency for PUTS of new objects
- As soon as an object is written, we can retrieve it, e.g. (PUT 200 -> GET 200)
- Historically this held except when a GET was issued before the object existed, e.g. (GET 404 -> PUT 200 -> GET 404) - eventually consistent
- Eventual consistency for DELETES and PUTS of existing objects (the original S3 model)
- If we read an object after updating it, we might get the older version, e.g. (PUT 200 -> PUT 200 -> GET 200 (might be older version))
- If we delete an object, we might still be able to retrieve it for a short time, e.g. (DELETE 200 -> GET 200)
- Since December 2020, S3 provides strong read-after-write consistency for all PUTS and DELETES, so the stale reads above no longer occur
- S3 can send notifications on changes to
- AWS SQS: queue service
- AWS SNS: notification service
- AWS Lambda: serverless service
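On the receiving side, a notification is a JSON document with one entry per change. Below is a minimal Lambda-style sketch that pulls out the event name, bucket and key; the function name `handler` and the sample event are illustrative, and only the fields read here are shown.

```python
from urllib.parse import unquote_plus


def handler(event, context=None):
    """Return (event name, bucket, key) for each record in an S3 event."""
    changes = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in the event (a space becomes "+").
        key = unquote_plus(s3["object"]["key"])
        changes.append((record["eventName"], s3["bucket"]["name"], key))
    return changes


sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-bucket"},
                "object": {"key": "my_folder/my+file.txt"},
            },
        }
    ]
}
print(handler(sample_event))
```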
- S3 has a cross region replication feature (managed)
- Faster upload of large objects (>100MB), use multipart upload
- Parallelize PUTs for greater throughput
- Maximize your network bandwidth
- Decrease time to retry in case a part fails
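The benefits above come from cutting the object into independent byte ranges. Here is a sketch of that planning step only, assuming a 100MB default part size; `plan_parts` is a hypothetical helper, and the 5 MiB minimum part size and 10,000 part maximum are S3's documented multipart limits.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts per upload


def plan_parts(total_size, part_size=100 * MIB):
    """Return a list of (part_number, offset, length) covering the object."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    count = -(-total_size // part_size)  # ceiling division
    if count > MAX_PARTS:
        raise ValueError("too many parts; use a larger part_size")
    parts = []
    for number in range(1, count + 1):
        offset = (number - 1) * part_size
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
    return parts


for part in plan_parts(250 * MIB):
    print(part)
```

Each tuple can be uploaded by its own worker, and a failed part is retried alone instead of restarting the whole object.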
- Use CloudFront to cache S3 objects around the world (improves reads)
- S3 Transfer Acceleration (uses edge locations) - just need to change the endpoint you write to, not the code
- If using SSE-KMS encryption, throughput may be capped by your KMS request quotas (~100s - 1000s downloads / uploads per second)