Crowdbreaks Streamer

For data collection, Crowdbreaks leverages streaming endpoints within the Twitter Developer API. The infrastructure is set up using Amazon Web Services (AWS).

A Python streamer app runs on an AWS Fargate cluster and uses a POST statuses/filter (API v1.1) request to connect to a filtered stream of relevant tweets. Tweets are filtered by the keywords and languages configured for each project within Crowdbreaks.
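As a rough illustration of the filtering step, the sketch below merges per-project keywords and languages into the `track` and `language` parameters accepted by POST statuses/filter, and then assigns an incoming tweet back to the projects it matches. The `Project` class, its field names, and the idea of sharing one connection across all projects are assumptions made for this example and are not taken from the streamer's code; the substring match is also a simplification of Twitter's own matching rules.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    # Hypothetical project config; the field names are illustrative only.
    slug: str
    keywords: list = field(default_factory=list)
    languages: list = field(default_factory=list)


def build_filter_params(projects):
    """Merge all projects into one set of statuses/filter parameters."""
    keywords = sorted({k.lower() for p in projects for k in p.keywords})
    languages = sorted({lang for p in projects for lang in p.languages})
    # `track` takes a comma-separated list; commas act as logical OR.
    params = {"track": ",".join(keywords)}
    if languages:
        params["language"] = ",".join(languages)
    return params


def matching_projects(tweet, projects):
    """Return the slugs of all projects whose filters match the tweet."""
    text = tweet.get("text", "").lower()
    lang = tweet.get("lang")
    matched = []
    for p in projects:
        lang_ok = not p.languages or lang in p.languages
        # Simplified case-insensitive substring match (an assumption).
        keyword_ok = any(k.lower() in text for k in p.keywords)
        if lang_ok and keyword_ok:
            matched.append(p.slug)
    return matched
```

Because a single tweet can match several projects, `matching_projects` returns a list, so the same tweet could be forwarded to more than one delivery stream.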

The whole data pipeline runs on AWS. After acquiring the tweets, the streamer app sends them to their corresponding Kinesis Firehose delivery streams (one per project), which save each project's tweets under a separate key prefix ("folder") in an Amazon Simple Storage Service (S3) bucket. Each new batch of tweets saved to S3 triggers an event that invokes a Lambda function, which preprocesses the tweets in the batch, makes predictions using a SageMaker endpoint, and sends the preprocessed data to the project's Elasticsearch index.
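The Lambda stage described above can be sketched roughly as follows. This is a minimal outline under several assumptions rather than the repository's actual handler: batches are assumed to be stored as newline-delimited JSON, the project is assumed to be the first component of the S3 key, and `read_object`, `predict`, and `index_docs` are hypothetical callables standing in for the S3 read, the SageMaker endpoint call, and the Elasticsearch write.

```python
import json
from urllib.parse import unquote_plus


def s3_objects(event):
    """Yield (bucket, key) pairs from an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def project_from_key(key):
    # Assumption: the first path component of the key is the project prefix.
    return key.split("/", 1)[0]


def parse_batch(body):
    """Parse a batch stored as newline-delimited JSON (an assumption)."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def preprocess(tweet):
    # Placeholder preprocessing: keep a few fields, normalise whitespace.
    return {
        "id": tweet["id_str"],
        "text": " ".join(tweet["text"].split()),
        "lang": tweet.get("lang"),
    }


def process_batch(event, read_object, predict, index_docs):
    """Run the read -> preprocess -> predict -> index steps for one event."""
    for bucket, key in s3_objects(event):
        project = project_from_key(key)
        docs = [preprocess(t) for t in parse_batch(read_object(bucket, key))]
        for doc in docs:
            doc["prediction"] = predict(project, doc["text"])
        index_docs(project, docs)
```

Passing the three side-effecting steps in as callables keeps the sketch runnable without AWS credentials; in a deployed function they would presumably wrap calls such as boto3's S3 `get_object` and SageMaker runtime `invoke_endpoint`.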

This way, Crowdbreaks is able to collect and keep Twitter data in a flexible and scalable fashion.


digitalepidemiologylab/crowdbreaks-streamer

Folders and files

Latest commit

History

Repository files navigation

Crowdbreaks Streamer

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages