Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EVA-3696 - Processing via a scanner and new brokering method #232

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

tcezard
Copy link
Member

@tcezard tcezard commented Dec 16, 2024

No description provided.

Copy link
Contributor

@apriltuesday apriltuesday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, also very useful for me because it just makes the orchestration more concrete and easier for me to understand.

For testing, I wouldn't mind including some integration tests against our submission-ws and Biosamples in dev, as long as we clean up the dev dbs and the tests aren't too long (or run selectively, e.g. manually triggered or on tags only) I think it's fine.

We can also just mock the submission-ws and write unit tests. Incidentally I think this is also easier if we use a client object as per my suggestion, means just one thing to mock rather than patching all over the place...


class SubCliProcessValidation(SubCliProcess):

all_validation_tasks = ['metadata_check', 'assembly_check', 'aggregation_check', 'vcf_check', 'sample_check',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we use these tasks for sub cli processing?

import os
import shutil

from eva_submission.ENA_submission.upload_to_ENA import ENAUploader, ENAUploaderAsync
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not all imports are being used

PROCESSING_STATUS = [READY_FOR_PROCESSING, FAILURE, SUCCESS, RUNNING, ON_HOLD]


def sub_ws_auth():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it's worth extracting the submission WS client into common-pyutils, so it can be used in both eva-sub-cli and eva-submission? It's some extra refactoring, but I think it could be beneficial in the long run to keep python interactions with the submission WS in one place.

Comment on lines +410 to +411
# We need to get back to the reader to get all the names that were present in the spreadsheet
return [sample_row.get('Sample Name') or sample_row.get('Sample ID') for sample_row in self.reader.samples]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be updated to refer to the JSON instead of the reader


def _update_submission_ws(self):
put_to_sub_ws(sub_ws_url_build('admin', 'submission', self.submission_id, 'status', self.submission_status))
put_to_sub_ws('admin', 'submission-process', self.submission_id, self.processing_step, self.processing_status)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing call to sub_ws_url_build

f'Found {len(sample_name_to_accession)} and expected '
f'{len(sample_submitter.all_sample_names())}. '
f'Missing samples are '
f'{[sample_name for sample_name in sample_submitter.all_sample_names() if sample_name not in sample_name_to_accession]}')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This process needs to set the processing status to RUNNING, SUCCESS or FAILURE, correct? Or is that someone else's responsibility?


def scan(self):
def _scan_per_status(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was initially very confused about why we needed to scan both tables, before I realised that this scan is (I think) only used to add the first processing step. If that's the case, then maybe it could be a bit less generic and used only in that specific situation - even if we used it for other operations (e.g. scanning for cancelled submissions to clean up the db or something), I don't think creating a SubmissionStep for validation would make sense in those cases.

pretty_print(header, lines)


class NewSubmissionScanner(SubmissionScanner):

statuses = ['UPLOADED']
step_statuses = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly a matter of taste, but I think I would prefer the different scanning tasks as methods rather than classes. So we would have just one SubmissionScanner with the generic _scan_per_step_status helper method, and find_new_submissions, find_completed_submission_steps, etc. that call that method with the relevant statuses.

On the other hand, maybe there's more functionality that would go into these subclasses that I'm not thinking of, in which case having the extra classes makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants