Merge pull request #2049 from awsarippa/s3-lambda-translate-cdk-python
New Pattern Submission - TranslateDocument s3-lambda-translate-cdk-python
Showing 12 changed files with 1,837 additions and 0 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
|
||
### Project Structure within s3-lambda-translate-cdk-python: | ||
``` | ||
s3-lambda-translate-cdk-python | ||
- app.py | ||
- cdk.json | ||
- src/ | ||
  - lambda_function.py | ||
  - architecture.png | ||
- requirements.txt | ||
- assets/ | ||
  - AmazonSimpleStorageService.html | ||
  - fr-AmazonSimpleStorageService.html | ||
- python.zip | ||
``` | ||
|
||
## Common Errors & Troubleshooting | ||
|
||
### "ValueError: Must setup local AWS configuration with a region supported by AWS Services." | ||
Solution: You must set an AWS region with `export AWS_DEFAULT_REGION=<your-region>` | ||
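
A pre-flight check along these lines can surface the missing region before any AWS call is made. This is only a sketch: the helper name `require_region` is hypothetical, and it inspects the `AWS_DEFAULT_REGION` environment variable only, so a region set through `aws configure` would not be detected by it.

```python
import os


def require_region(env=None):
    """Return the region from AWS_DEFAULT_REGION or raise a descriptive error.

    `env` defaults to the process environment; passing a dict makes the
    check easy to exercise without touching real environment variables.
    """
    env = os.environ if env is None else env
    region = env.get("AWS_DEFAULT_REGION", "").strip()
    if not region:
        raise RuntimeError(
            "AWS_DEFAULT_REGION is not set; "
            "run `export AWS_DEFAULT_REGION=<your-region>` first"
        )
    return region
```

Calling `require_region()` at the top of a deployment script would fail fast with a readable message instead of the error shown above.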
|
||
### Error creating role | ||
``` | ||
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the CreateRole operation: User: <user-arn> is not authorized to perform: iam:CreateRole on resource: <role-arn> because no identity-based policy allows the iam:CreateRole action | ||
``` | ||
Solution: You must ensure that the IAM identity (user or role) you are using has sufficient permissions to create IAM roles. | ||
|
||
### Error processing tar file(exit status 1): write /path/libcublas.so.11: no space left on device | ||
Issue: Docker has run out of disk space, typically because too many images have accumulated. | ||
Solution: Delete unused images in the Docker application and then [prune Docker](https://docs.docker.com/config/pruning/) from the command line. | ||
|
||
### ConnectionResetError: [Errno 104] Connection reset by peer | ||
Issue: The connection was reset while pip was downloading packages. | ||
Solution: Clear the pip cache (`python3 -m pip cache purge`) and run the command again. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,88 @@ | ||
# Translate file with AWS Lambda, Amazon S3, and Amazon Translate | ||
This pattern uses the AWS Cloud Development Kit (AWS CDK) to deploy Amazon S3 buckets and an AWS Lambda function that calls Amazon Translate to translate documents from one language to another. | ||
|
||
## Architecture | ||
![Diagram](src/architecture.png) | ||
|
||
### What resources will be created? | ||
This CDK code will create the following: | ||
- One Lambda function (to invoke the TranslateDocument API) | ||
- Two S3 buckets (one bucket receives the user's input and triggers the Lambda function; the second bucket stores the output from the Translate service) | ||
- One IAM role (allows the Lambda function to invoke the Translate service, read documents from the input bucket, and upload translated documents to the output bucket) | ||
|
||
## Requirements | ||
|
||
### Development Environment | ||
**Cloud 9** | ||
|
||
The demonstration for this pattern was run in an AWS Cloud9 environment on a t2.micro EC2 instance (1 vCPU, 1 GiB RAM). You can also deploy the application with the CDK from a local environment. | ||
|
||
### AWS setup | ||
**Region** | ||
|
||
If you have not yet run `aws configure` and set a default region, you must do so, or alternatively run `export AWS_DEFAULT_REGION=<your-region>`. The region used in the demonstration is us-east-1. Make sure the selected region supports both the Translate and Comprehend services. | ||
(If the source language is not known, set it to `auto`, which is the value used in the Lambda function; the Translate service then calls the Comprehend API internally to detect the source language.) | ||
|
||
**Authorization** | ||
|
||
You must use a role that has sufficient permissions to create IAM roles as well as CloudFormation resources. | ||
|
||
#### Python >=3.8 | ||
Make sure you have [python3](https://www.python.org/downloads/) installed at version 3.8 or later in the CDK environment. The demonstration uses Python 3.10. | ||
Because the `TranslateDocument` API was not yet available in the Boto3 version bundled with the Lambda runtime when this pattern was created, a layer `python.zip` containing Boto3 version >= 1.28.56 is attached to the function. | ||
|
||
#### AWS CDK | ||
Make sure you have the [AWS CDK](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html#getting_started_install) installed in the Cloud9 environment. | ||
|
||
|
||
## Setup | ||
|
||
### Set up environment and gather packages | ||
|
||
``` | ||
cd s3-lambda-translate-cdk-python | ||
``` | ||
|
||
Install the required dependencies (aws-cdk-lib and constructs) into your Python environment | ||
``` | ||
pip install -r requirements.txt | ||
``` | ||
|
||
### Gather and deploy resources with the CDK | ||
|
||
First, synthesize the app. This executes the application, determines which resources will be created, and translates them into a CloudFormation template: | ||
``` | ||
cdk synth | ||
``` | ||
All AWS CDK v2 deployments use dedicated AWS resources to hold data during deployment. Therefore, your AWS account and Region must be bootstrapped to create these resources before you can deploy. If you haven't already bootstrapped, run the following command: | ||
``` | ||
cdk bootstrap | ||
``` | ||
Then deploy with: | ||
``` | ||
cdk deploy | ||
``` | ||
|
||
The deployment will create two S3 buckets and a Lambda function. | ||
|
||
## How it works | ||
The input S3 bucket is where you upload the document to be translated. The demonstration uses the file `AmazonSimpleStorageService.html` in the `assets` folder. | ||
Uploading a file to the input S3 bucket invokes the Lambda function. | ||
The Lambda function calls Translate's `TranslateDocument` API and uploads the translated document to the output bucket, named with the pattern `<target_language>-<source_file_name>` (for example, `fr-AmazonSimpleStorageService.html`). | ||
The target language is set to French (`fr`) by default; change it to suit your use case. | ||
When this pattern was created, the `TranslateDocument` API supported three document formats: | ||
- `text/html` - The input data consists of HTML content. Amazon Translate translates only the text in the HTML element. | ||
- `text/plain` - The input data consists of unformatted text. Amazon Translate translates every character in the content. | ||
- `application/vnd.openxmlformats-officedocument.wordprocessingml.document` - The input data consists of a Word document (.docx). | ||
|
||
In this demonstration, we've chosen the `text/html` document format. | ||
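
The flow described in this section can be sketched as follows. This is not the pattern's actual `src/lambda_function.py`, which is not shown here; it is a minimal illustration that assumes the environment variable names set in `app.py` (`src_lang`, `target_lang`, `destination_bucket`). The helper names `output_key` and `translate_object` and the extension-to-content-type mapping are hypothetical.

```python
import os
import urllib.parse

# Assumed mapping from file extension to the three supported content types.
CONTENT_TYPES = {
    ".html": "text/html",
    ".txt": "text/plain",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def output_key(target_lang, source_key):
    """Build the output object name as <target_language>-<source_file_name>."""
    return f"{target_lang}-{os.path.basename(source_key)}"


def translate_object(s3, translate, record, src_lang, target_lang, dest_bucket):
    """Translate one uploaded object and store the result; returns the new key."""
    bucket = record["s3"]["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    extension = os.path.splitext(key)[1].lower()
    content_type = CONTENT_TYPES[extension]  # KeyError for unsupported formats

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    response = translate.translate_document(
        Document={"Content": body, "ContentType": content_type},
        SourceLanguageCode=src_lang,
        TargetLanguageCode=target_lang,
    )
    new_key = output_key(target_lang, key)
    s3.put_object(
        Bucket=dest_bucket,
        Key=new_key,
        Body=response["TranslatedDocument"]["Content"],
        ContentType=content_type,
    )
    return new_key


def lambda_handler(event, context):
    # Imported here so the helpers above can be exercised without AWS access.
    import boto3

    s3 = boto3.client("s3")
    translate = boto3.client("translate")
    return [
        translate_object(
            s3,
            translate,
            record,
            os.environ["src_lang"],
            os.environ["target_lang"],
            os.environ["destination_bucket"],
        )
        for record in event["Records"]
    ]
```

Passing the clients into `translate_object` keeps the translation step separate from client creation, so it can be run against stub clients before deploying.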
|
||
## Testing | ||
After the stack deploys successfully, the Outputs section in the CDK environment shows the names of the S3 buckets under `S3InputBucket` and `S3OutputBucket`. | ||
Alternatively, you can find these values in the Outputs section of the CloudFormation stack. | ||
Upload the sample file `assets/AmazonSimpleStorageService.html` to the input S3 bucket. The upload invokes the Lambda function, which translates the document. | ||
The translated document is stored in the output S3 bucket. In this demonstration, an HTML document is translated to French, | ||
so the translated document should look similar to `assets/fr-AmazonSimpleStorageService.html`. | ||
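
The test can also be scripted. The sketch below uploads the sample file and polls the output bucket until the translated object appears. The function names `wait_for_translation` and `run_demo` are hypothetical, the bucket names must be supplied from the stack outputs, and the polling interval and attempt count are arbitrary choices.

```python
import time


def wait_for_translation(s3, bucket, key, attempts=10, delay_seconds=3.0):
    """Poll `bucket` until an object named `key` exists; return True if found."""
    for attempt in range(attempts):
        listing = s3.list_objects_v2(Bucket=bucket, Prefix=key)
        if any(obj["Key"] == key for obj in listing.get("Contents", [])):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False


def run_demo(input_bucket, output_bucket, target_lang="fr"):
    """Upload the sample document and wait for its translation to appear."""
    # Imported here so wait_for_translation can be used with a stub client.
    import boto3

    s3 = boto3.client("s3")
    file_name = "AmazonSimpleStorageService.html"
    s3.upload_file(f"assets/{file_name}", input_bucket, file_name)
    return wait_for_translation(s3, output_bucket, f"{target_lang}-{file_name}")
```

`run_demo("<input-bucket-name>", "<output-bucket-name>")` would be called from the `s3-lambda-translate-cdk-python` directory with the bucket names reported in the stack outputs.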
|
||
## Cleanup | ||
To clean up the resources created as part of this demonstration, run the command `cdk destroy` in the directory `s3-lambda-translate-cdk-python`. In addition, users are advised to terminate the Cloud9 EC2 instance to avoid any unexpected charges. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,146 @@ | ||
#!/usr/bin/env python3 | ||
import os | ||
import aws_cdk as cdk | ||
from aws_cdk import ( | ||
Duration, | ||
Stack, | ||
aws_iam as iam, | ||
aws_lambda as lambda_, | ||
aws_s3 as s3, | ||
aws_lambda_event_sources as eventsources, | ||
CfnOutput, | ||
aws_logs as logs | ||
) | ||
from constructs import Construct | ||
|
||
DIRNAME = os.path.dirname(__file__) | ||
|
||
|
||
class S3LambdaTranslateServerless(Stack): | ||
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None: | ||
        super().__init__(scope, construct_id, **kwargs) | ||
|
||
        # Replace the input bucket name with a preferred unique name, since S3 bucket names are globally unique. | ||
        self.user_input_bucket = s3.Bucket( | ||
            self, | ||
            "s3-translate-input-bucket", | ||
            versioned=True, | ||
            bucket_name="s3-translate-input-bucket", | ||
            encryption=s3.BucketEncryption.S3_MANAGED, | ||
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL, | ||
            enforce_ssl=True, | ||
            removal_policy=cdk.RemovalPolicy.DESTROY, | ||
            auto_delete_objects=True, | ||
        ) | ||
|
||
        # Replace the output bucket name with a preferred unique name, since S3 bucket names are globally unique. | ||
        self.user_output_bucket = s3.Bucket( | ||
            self, | ||
            "s3-translate-output-bucket", | ||
            versioned=True, | ||
            bucket_name="s3-translate-output-bucket", | ||
            encryption=s3.BucketEncryption.S3_MANAGED, | ||
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL, | ||
            enforce_ssl=True, | ||
            removal_policy=cdk.RemovalPolicy.DESTROY, | ||
            auto_delete_objects=True, | ||
        ) | ||
|
||
        # IAM role assumable by the S3 service principal, with the AWSLambdaExecute managed policy attached | ||
        lambda_cfn_role = iam.Role( | ||
            self, | ||
            "CfnRole", | ||
            assumed_by=iam.ServicePrincipal("s3.amazonaws.com"), | ||
        ) | ||
        lambda_cfn_role.add_managed_policy( | ||
            iam.ManagedPolicy.from_aws_managed_policy_name("AWSLambdaExecute") | ||
        ) | ||
|
||
        # Lambda layer containing a Boto3 version (>= 1.28.56) that supports the | ||
        # TranslateDocument API. | ||
        layer = lambda_.LayerVersion( | ||
            self, | ||
            "Boto3Layer", | ||
            code=lambda_.Code.from_asset("./python.zip"), | ||
            compatible_runtimes=[lambda_.Runtime.PYTHON_3_10], | ||
        ) | ||
|
||
        # Log group for Lambda function | ||
        log_group = logs.LogGroup(self, "Lambda Group", removal_policy=cdk.RemovalPolicy.DESTROY) | ||
|
||
        # Lambda function that processes the S3 upload event. Source and target languages are passed as environment variables to the Lambda function. | ||
        lambda_function = lambda_.Function( | ||
            self, | ||
            "TranslateTextLambda", | ||
            runtime=lambda_.Runtime.PYTHON_3_10, | ||
            handler="lambda_function.lambda_handler", | ||
            code=lambda_.Code.from_asset(os.path.join(DIRNAME, "src")), | ||
            timeout=Duration.minutes(1), | ||
            layers=[layer], | ||
            memory_size=256, | ||
            log_group=log_group, | ||
            environment={ | ||
                "environment": "dev", | ||
                "src_lang": "auto", | ||
                "target_lang": "fr", | ||
                "destination_bucket": self.user_output_bucket.bucket_name, | ||
            }, | ||
        ) | ||
|
||
        # IAM policy allowing the function to call Translate and Comprehend | ||
        lambda_function.add_to_role_policy( | ||
            iam.PolicyStatement( | ||
                actions=[ | ||
                    "translate:TranslateText", | ||
                    "translate:TranslateDocument", | ||
                    "comprehend:DetectDominantLanguage", | ||
                ], | ||
                resources=["*"], | ||
            ) | ||
        ) | ||
|
||
        # IAM policy for S3 Get on the input bucket | ||
        lambda_function.add_to_role_policy( | ||
            iam.PolicyStatement( | ||
                actions=[ | ||
                    "s3:GetObject", | ||
                ], | ||
                resources=[self.user_input_bucket.arn_for_objects("*")], | ||
            ) | ||
        ) | ||
|
||
        # IAM policy for S3 Put on the output bucket | ||
        lambda_function.add_to_role_policy( | ||
            iam.PolicyStatement( | ||
                actions=[ | ||
                    "s3:PutObject", | ||
                ], | ||
                resources=[self.user_output_bucket.arn_for_objects("*")], | ||
            ) | ||
        ) | ||
|
||
        # Invoke the function whenever an object is created in the input bucket | ||
        lambda_function.add_event_source( | ||
            eventsources.S3EventSource( | ||
                self.user_input_bucket, events=[s3.EventType.OBJECT_CREATED] | ||
            ) | ||
        ) | ||
|
||
        # Outputs | ||
        CfnOutput( | ||
            self, | ||
            "S3 Output Bucket", | ||
            description="S3 Translated Output Bucket", | ||
            value=self.user_output_bucket.bucket_name, | ||
        ) | ||
        CfnOutput( | ||
            self, | ||
            "S3 Input Bucket", | ||
            description="S3 Input Bucket", | ||
            value=self.user_input_bucket.bucket_name, | ||
        ) | ||
|
||
|
||
app = cdk.App() | ||
filestack = S3LambdaTranslateServerless(app, "S3LambdaTranslateServerless") | ||
|
||
app.synth() |
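
The comments in the stack ask you to replace the hardcoded bucket names with unique ones. One way to derive such names is sketched below; the helper `unique_bucket_name` is hypothetical and assumes concrete account and region strings (for example the `CDK_DEFAULT_ACCOUNT` and `CDK_DEFAULT_REGION` environment variables set by the CDK CLI), not unresolved CDK tokens. Omitting `bucket_name` altogether is another option, in which case CloudFormation generates a unique name.

```python
import re

# S3 bucket names: 3-63 characters, lowercase letters, digits, dots and
# hyphens, beginning and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def unique_bucket_name(prefix, account, region):
    """Combine prefix, account ID and region into a candidate bucket name."""
    name = f"{prefix}-{account}-{region}".lower()
    if not _BUCKET_NAME.match(name):
        raise ValueError(f"not a valid S3 bucket name: {name!r}")
    return name
```

For example, `unique_bucket_name("s3-translate-input", os.environ["CDK_DEFAULT_ACCOUNT"], os.environ["CDK_DEFAULT_REGION"])` could be passed as `bucket_name` in `app.py`, which already imports `os`.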