Improve ability to handle concurrent requests #270

Open
paulalbert1 opened this issue Aug 13, 2018 · 0 comments
Top: the current retrieval system. The bottleneck occurs when saving PubMed articles.

Bottom: the proposed design, in which both retrieval and storage of PubMed articles are handled by a pool of "workers" (most likely EC2 instances).

image from ios

Also, the "retrieval" call from ReCiter to the PubMed service will be asynchronous. ReCiter will be notified of the outcome (the list of PMIDs retrieved successfully and the list of PMIDs that could not be retrieved) via a callback.
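A minimal sketch of what the async call plus callback could look like, in Python for brevity. Everything here is an assumption for illustration: `retrieve_async`, the `fetch` function, and the `(succeeded, failed)` callback signature are hypothetical names, not part of ReCiter or the PubMed service today.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Iterable, List

# Hypothetical callback signature: (succeeded_pmids, failed_pmids) -> None
RetrievalCallback = Callable[[List[int], List[int]], None]

# Shared pool that runs retrieval jobs in the background.
_executor = ThreadPoolExecutor(max_workers=4)


def retrieve_async(
    pmids: Iterable[int],
    fetch: Callable[[int], object],
    on_done: RetrievalCallback,
) -> Future:
    """Start retrieval in the background and return immediately.

    `fetch` stands in for whatever actually pulls one article from PubMed;
    it is expected to raise an exception on failure.
    """
    pmid_list = list(pmids)

    def job() -> None:
        succeeded: List[int] = []
        failed: List[int] = []
        for pmid in pmid_list:
            try:
                fetch(pmid)
                succeeded.append(pmid)
            except Exception:
                failed.append(pmid)
        # Notify the caller once, with both lists.
        on_done(succeeded, failed)

    return _executor.submit(job)
```

The caller does not block on the returned `Future`; it only needs to handle the two lists when the callback fires. In a real deployment the callback would more likely be an HTTP endpoint or a queue message than an in-process function.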

The PubMed service will orchestrate the retrieval "workers" (which retrieve PubMed articles) and the storage "workers" (which store PubMed articles in DynamoDB). The PubMed Retrieval Service will still do what it does now: accept PubMed queries and return their results.
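The orchestration could be sketched roughly as two worker pools joined by a queue, shown here with Python threads standing in for EC2 instances. `run_pipeline`, `fetch`, and `store` are hypothetical names; `store` stands in for a DynamoDB write, and error handling is left out to keep the sketch short.

```python
import queue
import threading
from typing import Callable, Iterable

_STOP = object()  # sentinel telling a worker to shut down


def run_pipeline(
    pmids: Iterable[int],
    fetch: Callable[[int], object],
    store: Callable[[object], None],
    n_retrievers: int = 2,
    n_storers: int = 2,
) -> None:
    """Retrieve and store articles concurrently; returns when all are stored."""
    todo: queue.Queue = queue.Queue()      # PMIDs waiting to be retrieved
    fetched: queue.Queue = queue.Queue()   # articles waiting to be stored

    for pmid in pmids:
        todo.put(pmid)
    for _ in range(n_retrievers):
        todo.put(_STOP)  # one stop signal per retrieval worker

    def retrieval_worker() -> None:
        while True:
            pmid = todo.get()
            if pmid is _STOP:
                break
            fetched.put(fetch(pmid))

    def storage_worker() -> None:
        while True:
            article = fetched.get()
            if article is _STOP:
                break
            store(article)

    retrievers = [threading.Thread(target=retrieval_worker) for _ in range(n_retrievers)]
    storers = [threading.Thread(target=storage_worker) for _ in range(n_storers)]
    for t in retrievers + storers:
        t.start()

    # Once every retriever has finished, no more articles will arrive,
    # so the storage workers can be told to stop.
    for t in retrievers:
        t.join()
    for _ in range(n_storers):
        fetched.put(_STOP)
    for t in storers:
        t.join()
```

Because storage runs on its own workers, a slow save no longer holds up retrieval, which is the bottleneck described above. `store` must be safe to call from several threads at once.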

Q: How do you divvy up work among the workers? 100 each?
A: Evenly. Each worker gets (number of PubMed articles) / (number of workers).
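One way the even split could be computed, as a sketch. The issue does not say what happens when the count does not divide evenly; handing the remainder out one article at a time to the first workers is an assumption here, and `split_among_workers` is a hypothetical helper name.

```python
from typing import List, Sequence


def split_among_workers(pmids: Sequence[int], n_workers: int) -> List[List[int]]:
    """Split pmids into n_workers contiguous chunks of near-equal size."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")

    # base = articles per worker; extra = leftover articles to spread out.
    base, extra = divmod(len(pmids), n_workers)

    chunks: List[List[int]] = []
    start = 0
    for i in range(n_workers):
        # The first `extra` workers take one additional article each.
        size = base + (1 if i < extra else 0)
        chunks.append(list(pmids[start:start + size]))
        start += size
    return chunks
```

For example, 10 PMIDs across 3 workers gives chunks of 4, 3, and 3. If there are more workers than articles, the surplus workers receive empty lists.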
