
Serverless data pipeline to perform language translation and named entity masking on podcasts or audio files


adison1994/Breaking-Language-Barriers-Serverless-Pipeline


Breaking-Language-Barriers-Pipeline

Codelabs

https://codelabs-preview.appspot.com/?file_id=1dXKHuohRDgSZ0ZNEV1IM7HvMwl88bMCi3bkk1zInp8Y#0

Overview

A Natural Language Processing based big data pipeline that converts product information, blogs, or news podcasts published in any language into multiple other languages of the user's choice, using state-of-the-art machine learning models running on AWS.

  • Language Translation
  • NER Audio Masking
  • Summarization
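As an illustration of the masking idea, here is a minimal sketch that masks named entities in a transcript. It assumes the entities arrive in the shape returned by Amazon Comprehend's `detect_entities` call (a list of dicts with `Type`, `BeginOffset`, and `EndOffset`). The function name, the default entity types, and the mask character are hypothetical, and the repository may mask the audio itself in a different way.

```python
def mask_entities(text, entities,
                  mask_types=("PERSON", "LOCATION", "ORGANIZATION"),
                  mask_char="*"):
    """Replace each selected entity span with mask characters of the same length.

    `entities` is assumed to look like the "Entities" list in a Comprehend
    detect_entities response. Keeping the length unchanged means the
    character offsets of the remaining entities stay valid.
    """
    masked = text
    for entity in entities:
        if entity["Type"] not in mask_types:
            continue  # leave entity types we were not asked to mask
        start, end = entity["BeginOffset"], entity["EndOffset"]
        masked = masked[:start] + mask_char * (end - start) + masked[end:]
    return masked


# Hand-written entities for illustration only (not real service output).
sample_text = "Alice flew to Paris."
sample_entities = [
    {"Type": "PERSON", "BeginOffset": 0, "EndOffset": 5},
    {"Type": "LOCATION", "BeginOffset": 14, "EndOffset": 19},
]
masked_text = mask_entities(sample_text, sample_entities)
```

In a real run the entity list would come from a Comprehend client, and the masked transcript could then be passed on to the speech synthesis step.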

Architecture

img

Step Function Workflow

Our Step Functions follow a parent/child design pattern, also known as nested workflows. The parent state machine spawns multiple child state machines so that multiple input audio links or files are processed in parallel.
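The nested workflow pattern can be sketched as an Amazon States Language definition built in Python. This is an assumption-laden illustration, not the repository's actual definition: it assumes the parent uses a `Map` state to fan out, that the input carries a hypothetical `audio_links` array, and that the child state machine ARN is supplied by the caller. The `startExecution.sync:2` resource is the Step Functions service integration for starting a nested execution and waiting for it to finish.

```python
import json


def build_parent_definition(child_state_machine_arn, max_concurrency=5):
    """Build a parent state machine that runs one child execution per audio link."""
    return {
        "Comment": "Parent workflow: one child execution per audio link",
        "StartAt": "ProcessEachAudio",
        "States": {
            "ProcessEachAudio": {
                "Type": "Map",
                "ItemsPath": "$.audio_links",  # hypothetical input field
                "MaxConcurrency": max_concurrency,
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "INLINE"},
                    "StartAt": "RunChild",
                    "States": {
                        "RunChild": {
                            "Type": "Task",
                            # Start the child workflow and wait for its result.
                            "Resource": "arn:aws:states:::states:startExecution.sync:2",
                            "Parameters": {
                                "StateMachineArn": child_state_machine_arn,
                                "Input": {
                                    "audio_link.$": "$",
                                    # Links the child execution to its parent in the console.
                                    "AWS_STEP_FUNCTIONS_STARTED_BY_EXECUTION_ID.$": "$$.Execution.Id",
                                },
                            },
                            "End": True,
                        }
                    },
                },
                "End": True,
            }
        },
    }


# Placeholder ARN for illustration only.
definition = build_parent_definition(
    "arn:aws:states:us-east-1:123456789012:stateMachine:child-workflow"
)
definition_json = json.dumps(definition, indent=2)
```

The resulting JSON string is the kind of document that would be passed as the state machine definition when the parent workflow is created.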

Parent workflow:

img

Child Workflow:

img

Language Translation Step function Execution

img

NER Masking Step function Execution

img

X-Ray Service Graph

We use AWS X-Ray to debug and monitor our application. With X-Ray enabled, mainly for the Step Functions in our application, it generates a service graph that displays the execution of the different processes and highlights them, as shown below.

img
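One way to turn on X-Ray tracing for an existing state machine is through the Step Functions `update_state_machine` call, which accepts a `tracingConfiguration` argument. The sketch below takes the client as a parameter so it can be exercised without AWS access; the helper name is hypothetical, and the repository may enable tracing through the console or a template instead.

```python
def enable_xray_tracing(sfn_client, state_machine_arn):
    """Enable X-Ray tracing on a state machine.

    `sfn_client` is expected to behave like boto3.client("stepfunctions").
    """
    return sfn_client.update_state_machine(
        stateMachineArn=state_machine_arn,
        tracingConfiguration={"enabled": True},
    )


# Typical use (requires boto3 and configured credentials):
#   import boto3
#   enable_xray_tracing(boto3.client("stepfunctions"), "<state machine ARN>")
```

Passing the client in keeps the helper easy to test with a stand-in object before pointing it at a real account.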

Application Screenshots

Login/Signup img

Services

img img

NER Masking Output

img

Language Translation Output img

QuickSight Integration

We have integrated the metadata tables stored in DynamoDB with Amazon QuickSight to generate analyses and dashboards that visualize different aspects of our application.

img

Install instructions

Create an Amazon Web Services (AWS) account

If you already have an account, skip this step.

Go to this link and follow the instructions. You will need a valid debit or credit card. You will not be charged; the card is only used to validate your identity.

Install the AWS Command Line Interface (AWS CLI)

Install the AWS CLI Version 1 for your operating system. Please follow the appropriate link below based on your operating system.

** Please make sure you add the AWS CLI executable to your command-line PATH. Verify that the AWS CLI is installed correctly by running aws --version.

  • You should see something similar to aws-cli/1.18.197 Python/3.6.0 Windows/10 botocore/1.19.37.
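If you want to check the installed version from a script, the version line can be split into its components. This is a small sketch with hypothetical helper names; it only assumes the `name/version` token format shown in the sample output above.

```python
import subprocess


def parse_aws_version(output):
    """Split a line such as 'aws-cli/1.18.197 Python/3.6.0 ...' into a dict."""
    components = {}
    for token in output.split():
        name, _, version = token.partition("/")
        components[name] = version
    return components


def cli_major_version(output):
    """Return the AWS CLI major version as an int."""
    return int(parse_aws_version(output)["aws-cli"].split(".")[0])


def installed_cli_version():
    """Run `aws --version` and parse whatever it prints.

    Both streams are read because some older CLI builds print the
    version line to stderr rather than stdout.
    """
    completed = subprocess.run(["aws", "--version"], capture_output=True, text=True)
    return parse_aws_version(completed.stdout + completed.stderr)


sample = "aws-cli/1.18.197 Python/3.6.0 Windows/10 botocore/1.19.37"
sample_info = parse_aws_version(sample)
```

Calling `installed_cli_version()` requires the `aws` executable to be on your PATH, which is exactly what the step above verifies.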

Configuring the AWS CLI

You need to retrieve AWS credentials that allow your AWS CLI to access AWS resources.

  1. Sign in to the AWS console with the email and password you used to create your account. If you already have an AWS account, be sure to log in as the root user.
  2. Choose your account name in the navigation bar at the top right, and then choose My Security Credentials.
  3. Expand the Access keys (access key ID and secret access key) section.
  4. Press Create New Access Key.
  5. Press Download Key File to download a CSV file that contains your new Access Key Id and Secret Key. Keep this file somewhere where you can find it easily.

Now, you can configure your AWS CLI with the credentials you just created and downloaded.

  1. In your Terminal, run aws configure.

    i. Enter your AWS Access Key ID from the file you downloaded.
    ii. Enter the AWS Secret Access Key from the file.
    iii. For Default region name, enter us-east-1.
    iv. For Default output format, enter json.

  2. Run aws s3 ls in your Terminal. If your AWS CLI is configured correctly, you should see either nothing (because you do not have any existing S3 buckets) or, if you have created S3 buckets before, a list of those buckets.

** If you get an error, then please try to configure your AWS CLI again.
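If `aws s3 ls` fails, one quick check is whether `aws configure` actually stored the four values entered above. The sketch below inspects the text of the two files the CLI writes, `~/.aws/credentials` and `~/.aws/config`, for the default profile. The function name and the reporting format are hypothetical; it only checks that the keys exist, not that the credentials are valid.

```python
import configparser

REQUIRED_CREDENTIALS = ("aws_access_key_id", "aws_secret_access_key")
REQUIRED_CONFIG = ("region", "output")


def missing_settings(credentials_text, config_text, profile="default"):
    """Return a list like ['config:output'] naming settings that are absent."""
    missing = []
    checks = (
        ("credentials", credentials_text, REQUIRED_CREDENTIALS),
        ("config", config_text, REQUIRED_CONFIG),
    )
    for label, text, keys in checks:
        parser = configparser.ConfigParser()
        parser.read_string(text)
        for key in keys:
            # has_option is False when the profile section itself is missing.
            if not parser.has_option(profile, key):
                missing.append(f"{label}:{key}")
    return missing


# Placeholder file contents for illustration only.
example_credentials = (
    "[default]\n"
    "aws_access_key_id = AKIAEXAMPLE\n"
    "aws_secret_access_key = examplesecret\n"
)
example_config = "[default]\nregion = us-east-1\n"
problems = missing_settings(example_credentials, example_config)
```

To run it against your own setup, read the two files from the `.aws` folder in your home directory and pass their contents in; an empty list means all four settings are present.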

Run Sequence

Install the requirements

pip install -r requirements.txt

Run Streamlit application

streamlit run app.py

Built With

  • AWS Transcribe : Service that adds speech to text capabilities in applications.
  • AWS Translate : Machine translation service for fast, high-quality, & affordable language translation.
  • AWS Polly : Service that turns text into lifelike speech.
  • AWS Comprehend : NLP service that uses machine learning to find insights and relationships in text.
  • AWS X-Ray : Service that helps analyze and debug distributed production applications.
  • AWS Cognito : Service for authentication, authorization, and user management for web & mobile apps.
  • Streamlit : The fastest way to build and share data apps.
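To show how two of these services could be chained, here is a minimal sketch of the translate-then-speak step. It assumes the transcript text is already available (for example from a finished transcription job) and that the clients behave like `boto3.client("translate")` and `boto3.client("polly")`. The function name and the default voice are hypothetical, and the repository's Lambda functions may be organized differently.

```python
def translate_and_speak(text, target_language, translate_client, polly_client,
                        source_language="auto", voice_id="Joanna"):
    """Translate `text`, then synthesize the translation as MP3 audio bytes."""
    translation = translate_client.translate_text(
        Text=text,
        SourceLanguageCode=source_language,  # "auto" asks the service to detect it
        TargetLanguageCode=target_language,
    )
    translated_text = translation["TranslatedText"]

    speech = polly_client.synthesize_speech(
        Text=translated_text,
        OutputFormat="mp3",
        VoiceId=voice_id,  # should be a voice that speaks the target language
    )
    # AudioStream is a file-like object; read it fully into memory.
    return translated_text, speech["AudioStream"].read()
```

Both services limit how much text a single request may carry, so a long podcast transcript would likely need to be split into chunks before calling a helper like this.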
