Merge pull request #2049 from awsarippa/s3-lambda-translate-cdk-python
New Pattern Submission - TranslateDocument s3-lambda-translate-cdk-python
ellisms authored Jan 17, 2024
2 parents bad4345 + 4b58819 commit 4677e06
Showing 12 changed files with 1,837 additions and 0 deletions.
34 changes: 34 additions & 0 deletions s3-lambda-translate-cdk-python/DEVELOPMENT.md
@@ -0,0 +1,34 @@

### Project Structure within s3-lambda-translate-cdk-python:
```
s3-lambda-translate-cdk-python
- app.py
- cdk.json
- src/
  - lambda_function.py
  - architecture.png
- requirements.txt
- assets/
  - AmazonSimpleStorageService.html
  - fr-AmazonSimpleStorageService.html
- python.zip
```

## Common Errors & Troubleshooting

### "ValueError: Must setup local AWS configuration with a region supported by AWS Services."
Solution: You must set an AWS region with `export AWS_DEFAULT_REGION=<your-region>`

### Error creating role
```
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the CreateRole operation: User: <user-arn> is not authorized to perform: iam:CreateRole on resource: <role-arn> because no identity-based policy allows the iam:CreateRole action
```
Solution: You must ensure the IAM role you are using has sufficient permissions to create IAM roles.

### Error processing tar file (exit status 1): write /path/libcublas.so.11: no space left on device
Issue: Docker has run out of disk space due to too many images
Solution: Delete unused images in the Docker application and then [prune docker](https://docs.docker.com/config/pruning/) from the command line

### ConnectionResetError: [Errno 104] Connection reset by peer
Issue: A pip download or caching issue
Solution: Clear the pip cache (`python3 -m pip cache purge`) and run again
88 changes: 88 additions & 0 deletions s3-lambda-translate-cdk-python/README.md
@@ -0,0 +1,88 @@
# Translate file with AWS Lambda, Amazon S3, and Amazon Translate
This pattern uses the AWS Cloud Development Kit (AWS CDK) to deploy a serverless application that uses Amazon S3, AWS Lambda, and Amazon Translate to perform document language translation.

## Architecture
![Diagram](src/architecture.png)

### What resources will be created?
This CDK code will create the following:
- One Lambda function (to invoke the `TranslateDocument` API)
- Two S3 buckets (one to accept the user's input and trigger the Lambda function, and one to capture the output from the Translate service)
- One IAM role (for the Lambda function to invoke the Translate service, read from the input bucket, and upload translated documents to the output bucket)

## Requirements

### Development Environment
**Cloud9**

The demonstration for this pattern runs in an AWS Cloud9 environment on a t2.micro EC2 instance (1 GiB RAM, 1 vCPU). However, you can also deploy the application with the CDK from a local environment.

### AWS setup
**Region**

If you have not yet run `aws configure` and set a default region, you must do so, or alternatively run `export AWS_DEFAULT_REGION=<your-region>`. The region used in the demonstration is us-east-1. Make sure the selected region supports both the Amazon Translate and Amazon Comprehend services.
(The source language is set to `auto` in the Lambda function, so if you do not know the source language of the document, the Translate service internally invokes the Comprehend API to detect it.)

**Authorization**

You must use a role that has sufficient permissions to create IAM roles as well as CloudFormation resources.

#### Python >=3.8
Make sure you have [python3](https://www.python.org/downloads/) installed at version >= 3.8.x in the CDK environment. The demonstration uses Python 3.10.
Because the `TranslateDocument` API requires a newer Boto3 than the version bundled with the Lambda runtime, a Lambda layer (`python.zip`) containing Boto3 >= 1.28.56 is included.
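If you want to confirm that the Boto3 version packaged in the layer (or in your local environment) is new enough, a minimal sketch like the following can be used; it assumes AWS credentials and a default region are already configured:
```python
# Minimal check (assumption: boto3 importable, credentials/region configured):
# verify the Boto3 version and that the Translate client exposes translate_document.
import boto3

translate = boto3.client("translate")
print("boto3 version:", boto3.__version__)
print("translate_document available:", hasattr(translate, "translate_document"))
```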

#### AWS CDK
Make sure you have the [AWS CDK](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html#getting_started_install) installed in the Cloud9 environment.


## Setup

### Set up environment and gather packages

```
cd s3-lambda-translate-cdk-python
```

Install the required dependencies (aws-cdk-lib and constructs) into your Python environment:
```
pip install -r requirements.txt
```

### Gather and deploy resources with the CDK

First, synthesize the application. This step executes the app, determines which resources will be created, and translates that into a CloudFormation template:
```
cdk synth
```
All AWS CDK v2 deployments use dedicated AWS resources to hold data during deployment. Therefore, your AWS account and Region must be bootstrapped to create these resources before you can deploy. If you haven't already bootstrapped, execute the command below:
```
cdk bootstrap
```
and then deploy with:
```
cdk deploy
```

The deployment creates two S3 buckets, a Lambda function, and the supporting IAM role and log group.
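If you prefer to read the stack outputs programmatically rather than from the console, a small sketch such as the one below can be used; it assumes the stack name `S3LambdaTranslateServerless` from `app.py` and configured credentials:
```python
# Minimal sketch: list the CloudFormation outputs of the deployed stack.
# Assumes AWS credentials and a default region are configured.
import boto3

cfn = boto3.client("cloudformation")
stack = cfn.describe_stacks(StackName="S3LambdaTranslateServerless")["Stacks"][0]
for output in stack.get("Outputs", []):
    print(output["OutputKey"], "=", output["OutputValue"])
```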

## How it works
The input S3 bucket is where you upload the document to be translated. In the demonstration, we use the file `AmazonSimpleStorageService.html` inside the `assets` folder.
Uploading a file to the input S3 bucket invokes the Lambda function.
The Lambda function invokes Translate's `TranslateDocument` API and uploads the response document to the output bucket with the naming pattern `target_language`-`source_file_name`.
The target language for translation is set to French by default; change it to suit your use case.
At the time of creating this pattern, the `TranslateDocument` API supports three document formats:
- `text/html` - The input data consists of HTML content. Amazon Translate translates only the text in the HTML element.
- `text/plain` - The input data consists of unformatted text. Amazon Translate translates every character in the content.
- `application/vnd.openxmlformats-officedocument.wordprocessingml.document` - The input data consists of a Word document (.docx).

In this demonstration, we've chosen the `text/html` document format; the sketch below illustrates the overall flow.
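The handler in `src/lambda_function.py` is not reproduced in this excerpt; the following is only an illustrative sketch of the flow described above, assuming the `src_lang`, `target_lang`, and `destination_bucket` environment variables defined in `app.py` and an HTML input document:
```python
# Illustrative sketch only -- not necessarily the pattern's actual handler code.
import os
import urllib.parse

import boto3

s3 = boto3.client("s3")
translate = boto3.client("translate")


def lambda_handler(event, context):
    # The S3 event source can deliver one or more records per invocation.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Read the uploaded document from the input bucket.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

        # Translate it; "auto" lets Translate detect the source language via
        # Comprehend, which matches the IAM policy granted in app.py.
        response = translate.translate_document(
            Document={"Content": body, "ContentType": "text/html"},
            SourceLanguageCode=os.environ.get("src_lang", "auto"),
            TargetLanguageCode=os.environ.get("target_lang", "fr"),
        )

        # Store the result as <target_language>-<source_file_name>.
        target_lang = os.environ.get("target_lang", "fr")
        s3.put_object(
            Bucket=os.environ["destination_bucket"],
            Key=f"{target_lang}-{key}",
            Body=response["TranslatedDocument"]["Content"],
        )
```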

## Testing
Upon successful deployment of the stack, the Outputs section provides the names of the S3 buckets through the `S3InputBucket` and `S3OutputBucket` variables in the CDK environment.
Alternatively, these values can be found in the Outputs section of the `CloudFormation` stack.
Upload the sample file `assets/AmazonSimpleStorageService.html` to the input S3 bucket. The upload invokes the Lambda function and the document is translated.
The translated document is stored in the output S3 bucket. In this demonstration, we have chosen to translate an HTML document to French.
Hence, the translated document will look similar to `assets/fr-AmazonSimpleStorageService.html`.
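To drive the test from code instead of the console, a sketch like the one below uploads the sample document and polls for the translated object; the bucket names are placeholders to be replaced with the `S3InputBucket` and `S3OutputBucket` values from the stack outputs:
```python
# Minimal test sketch (placeholder bucket names are assumptions):
# upload the sample document and wait for the translated copy to appear.
import time

import boto3
from botocore.exceptions import ClientError

INPUT_BUCKET = "<S3InputBucket>"
OUTPUT_BUCKET = "<S3OutputBucket>"
SOURCE_KEY = "AmazonSimpleStorageService.html"

s3 = boto3.client("s3")
s3.upload_file(f"assets/{SOURCE_KEY}", INPUT_BUCKET, SOURCE_KEY)

# The Lambda function writes the translation as fr-<source_file_name>.
translated_key = f"fr-{SOURCE_KEY}"
for _ in range(30):
    try:
        s3.head_object(Bucket=OUTPUT_BUCKET, Key=translated_key)
        print(f"Translated document: s3://{OUTPUT_BUCKET}/{translated_key}")
        break
    except ClientError:
        time.sleep(5)
else:
    print("Timed out waiting for the translated document.")
```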

## Cleanup
To clean up the resources created as part of this demonstration, run `cdk destroy` in the `s3-lambda-translate-cdk-python` directory. In addition, terminate the Cloud9 EC2 instance to avoid any unexpected charges.
146 changes: 146 additions & 0 deletions s3-lambda-translate-cdk-python/app.py
@@ -0,0 +1,146 @@
#!/usr/bin/env python3
import os
import aws_cdk as cdk
from aws_cdk import (
    Duration,
    Stack,
    aws_iam as iam,
    aws_lambda as lambda_,
    aws_s3 as s3,
    aws_lambda_event_sources as eventsources,
    CfnOutput,
    aws_logs as logs,
)
from constructs import Construct

DIRNAME = os.path.dirname(__file__)


class S3LambdaTranslateServerless(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Replace the input bucket name with a preferred unique name, since S3 bucket names are globally unique.
        self.user_input_bucket = s3.Bucket(
            self,
            "s3-translate-input-bucket",
            versioned=True,
            bucket_name="s3-translate-input-bucket",
            encryption=s3.BucketEncryption.S3_MANAGED,
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
            enforce_ssl=True,
            removal_policy=cdk.RemovalPolicy.DESTROY,
            auto_delete_objects=True,
        )

        # Replace the output bucket name with a preferred unique name, since S3 bucket names are globally unique.
        self.user_output_bucket = s3.Bucket(
            self,
            "s3-translate-output-bucket",
            versioned=True,
            bucket_name="s3-translate-output-bucket",
            encryption=s3.BucketEncryption.S3_MANAGED,
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
            enforce_ssl=True,
            removal_policy=cdk.RemovalPolicy.DESTROY,
            auto_delete_objects=True,
        )

        # IAM role to invoke the Lambda function
        lambda_cfn_role = iam.Role(
            self,
            "CfnRole",
            assumed_by=iam.ServicePrincipal("s3.amazonaws.com"),
        )
        lambda_cfn_role.add_managed_policy(
            iam.ManagedPolicy.from_aws_managed_policy_name("AWSLambdaExecute")
        )

        # Lambda layer containing Boto3 >= 1.28.56, which provides the TranslateDocument API.
        layer = lambda_.LayerVersion(
            self,
            "Boto3Layer",
            code=lambda_.Code.from_asset("./python.zip"),
            compatible_runtimes=[lambda_.Runtime.PYTHON_3_10],
        )

        # Log group for the Lambda function
        log_group = logs.LogGroup(self, "Lambda Group", removal_policy=cdk.RemovalPolicy.DESTROY)

        # Lambda function for processing the incoming request triggered by the S3 upload.
        # Source and target languages are passed as environment variables to the Lambda function.
        lambda_function = lambda_.Function(
            self,
            "TranslateTextLambda",
            runtime=lambda_.Runtime.PYTHON_3_10,
            handler="lambda_function.lambda_handler",
            code=lambda_.Code.from_asset(os.path.join(DIRNAME, "src")),
            timeout=Duration.minutes(1),
            layers=[layer],
            memory_size=256,
            log_group=log_group,
            environment={
                "environment": "dev",
                "src_lang": "auto",
                "target_lang": "fr",
                "destination_bucket": self.user_output_bucket.bucket_name,
            },
        )

        # IAM policy allowing the Lambda function to call Translate and Comprehend
        lambda_function.add_to_role_policy(
            iam.PolicyStatement(
                actions=[
                    "translate:TranslateText",
                    "translate:TranslateDocument",
                    "comprehend:DetectDominantLanguage",
                ],
                resources=["*"],
            )
        )

        # IAM policy for S3 Get on the input bucket
        lambda_function.add_to_role_policy(
            iam.PolicyStatement(
                actions=[
                    "s3:GetObject",
                ],
                resources=[self.user_input_bucket.arn_for_objects("*")],
            )
        )

        # IAM policy for S3 Put on the output bucket
        lambda_function.add_to_role_policy(
            iam.PolicyStatement(
                actions=[
                    "s3:PutObject",
                ],
                resources=[self.user_output_bucket.arn_for_objects("*")],
            )
        )

        # Trigger the Lambda function when an object is created in the input bucket
        lambda_function.add_event_source(
            eventsources.S3EventSource(
                self.user_input_bucket, events=[s3.EventType.OBJECT_CREATED]
            )
        )

        # Outputs
        CfnOutput(
            self,
            "S3 Output Bucket",
            description="S3 Translated Output Bucket",
            value=self.user_output_bucket.bucket_name,
        )
        CfnOutput(
            self,
            "S3 Input Bucket",
            description="S3 Input Bucket",
            value=self.user_input_bucket.bucket_name,
        )


app = cdk.App()
filestack = S3LambdaTranslateServerless(app, "S3LambdaTranslateServerless")

app.synth()
