-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EVA-3696 - Processing via a scanner and new brokering method #232
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, also very useful for me because it just makes the orchestration more concrete and easier for me to understand.
For testing, I wouldn't mind including some integration tests against our submission-ws and Biosamples in dev, as long as we clean up the dev dbs and the tests aren't too long (or run selectively, e.g. manually triggered or on tags only) I think it's fine.
We can also just mock the submission-ws and write unit tests. Incidentally I think this is also easier if we use a client object as per my suggestion, means just one thing to mock rather than patching all over the place...
|
||
class SubCliProcessValidation(SubCliProcess): | ||
|
||
all_validation_tasks = ['metadata_check', 'assembly_check', 'aggregation_check', 'vcf_check', 'sample_check', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we use these tasks for sub cli processing?
import os | ||
import shutil | ||
|
||
from eva_submission.ENA_submission.upload_to_ENA import ENAUploader, ENAUploaderAsync |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not all imports are being used
PROCESSING_STATUS = [READY_FOR_PROCESSING, FAILURE, SUCCESS, RUNNING, ON_HOLD] | ||
|
||
|
||
def sub_ws_auth(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's worth extracting the submission WS client into common-pyutils, so it can be used in both eva-sub-cli and eva-submission? It's some extra refactoring, but I think it could be beneficial in the long run to keep python interactions with the submission WS in one place.
# We need to get back to the reader to get all the names that were present in the spreadsheet | ||
return [sample_row.get('Sample Name') or sample_row.get('Sample ID') for sample_row in self.reader.samples] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be updated to refer to the JSON instead of the reader
|
||
def _update_submission_ws(self): | ||
put_to_sub_ws(sub_ws_url_build('admin', 'submission', self.submission_id, 'status', self.submission_status)) | ||
put_to_sub_ws('admin', 'submission-process', self.submission_id, self.processing_step, self.processing_status) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing call to sub_ws_url_build
f'Found {len(sample_name_to_accession)} and expected ' | ||
f'{len(sample_submitter.all_sample_names())}. ' | ||
f'Missing samples are ' | ||
f'{[sample_name for sample_name in sample_submitter.all_sample_names() if sample_name not in sample_name_to_accession]}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This process needs to set the processing status to RUNNING
, SUCCESS
or FAILURE
, correct? Or is that someone else's responsibility?
|
||
def scan(self): | ||
def _scan_per_status(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was initially very confused about why we needed to scan both tables, before I realised that this scan is (I think) only used to add the first processing step. If that's the case, then maybe it could be a bit less generic and used only in that specific situation - even if we used it for other operations (e.g. scanning for cancelled submissions to clean up the db or something), I don't think creating a SubmissionStep
for validation would make sense in those cases.
pretty_print(header, lines) | ||
|
||
|
||
class NewSubmissionScanner(SubmissionScanner): | ||
|
||
statuses = ['UPLOADED'] | ||
step_statuses = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly a matter of taste, but I think I would prefer the different scanning tasks as methods rather than classes. So we would have just one SubmissionScanner
with the generic _scan_per_step_status
helper method, and find_new_submissions
, find_completed_submission_steps
, etc. that call that method with the relevant statuses.
On the other hand, maybe there's more functionality that would go into these subclasses that I'm not thinking of, in which case having the extra classes makes sense.
No description provided.