EVA-3696 - Processing via a scanner and new brokering method #232

tcezard · 2024-12-16T14:13:38Z

No description provided.

apriltuesday

Great work, also very useful for me because it just makes the orchestration more concrete and easier for me to understand.

For testing, I wouldn't mind including some integration tests against our submission-ws and Biosamples in dev. As long as we clean up the dev DBs and the tests aren't too long (or are run selectively, e.g. manually triggered or on tags only), I think it's fine.

We can also just mock the submission-ws and write unit tests. Incidentally, I think this is also easier if we use a client object as per my suggestion: it means just one thing to mock rather than patching all over the place.
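To illustrate the "one thing to mock" point, here is a minimal sketch. `SubmissionWSClient`, `ProcessJob`, the method names and the `'COMPLETED'` status are all hypothetical and only stand in for whatever the real classes end up being.

```python
from unittest.mock import Mock


class SubmissionWSClient:
    """Hypothetical wrapper around the submission-ws (not an existing class)."""

    def __init__(self, base_url):
        self.base_url = base_url

    def mark_submission_status(self, submission_id, status):
        # A real client would issue an authenticated HTTP PUT here.
        raise NotImplementedError


class ProcessJob:
    """Hypothetical stand-in for a processing job that reports to the web service."""

    def __init__(self, submission_id, client):
        self.submission_id = submission_id
        self.client = client

    def finish(self):
        # 'COMPLETED' is an assumed status value, used only for illustration.
        self.client.mark_submission_status(self.submission_id, 'COMPLETED')


def test_finish_reports_status():
    # A single mock replaces every interaction with the web service.
    client = Mock(spec=SubmissionWSClient)
    ProcessJob('sub-123', client).finish()
    client.mark_submission_status.assert_called_once_with('sub-123', 'COMPLETED')


test_finish_reports_status()
```

Because the job receives the client as a constructor argument, the test needs no `patch` calls on module-level functions.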

apriltuesday · 2024-12-18T11:17:39Z

eva_sub_cli_processing/sub_cli_validation.py

+
+class SubCliProcessValidation(SubCliProcess):
+
+    all_validation_tasks = ['metadata_check', 'assembly_check', 'aggregation_check', 'vcf_check', 'sample_check',


Will we use these tasks for sub cli processing?

apriltuesday · 2024-12-18T11:23:30Z

eva_sub_cli_processing/sub_cli_brokering.py

+import os
+import shutil
+
+from eva_submission.ENA_submission.upload_to_ENA import ENAUploader, ENAUploaderAsync


Not all imports are being used

apriltuesday · 2024-12-18T11:37:49Z

eva_sub_cli_processing/sub_cli_utils.py

+PROCESSING_STATUS = [READY_FOR_PROCESSING, FAILURE, SUCCESS, RUNNING, ON_HOLD]
+
+
+def sub_ws_auth():


Do you think it's worth extracting the submission WS client into common-pyutils, so it can be used in both eva-sub-cli and eva-submission? It's some extra refactoring, but I think it could be beneficial in the long run to keep python interactions with the submission WS in one place.
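A rough sketch of what such a shared client could look like, assuming it is handed a `requests.Session`-like object that already carries the authentication. The class and method names are hypothetical; the URL segments are the ones used in `_update_submission_ws` in this PR.

```python
class SubmissionWSClient:
    """Hypothetical shared client for the submission WS (name and API are assumptions)."""

    def __init__(self, base_url, session):
        self.base_url = base_url.rstrip('/')
        # Any object with a requests.Session-like put() method, with auth already configured.
        self.session = session

    def _url(self, *parts):
        # Join the base URL and path segments with single slashes.
        return '/'.join([self.base_url] + [str(part) for part in parts])

    def _put(self, *parts):
        response = self.session.put(self._url(*parts))
        response.raise_for_status()
        return response

    def update_submission_status(self, submission_id, status):
        return self._put('admin', 'submission', submission_id, 'status', status)

    def update_processing_status(self, submission_id, step, status):
        return self._put('admin', 'submission-process', submission_id, step, status)
```

Both eva-sub-cli and eva-submission could then construct this object once and pass it around, rather than each keeping its own URL-building and auth helpers.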

apriltuesday · 2024-12-18T14:22:26Z

eva_submission/biosample_submission/biosamples_submitters.py

+        # We need to get back to the reader to get all the names that were present in the spreadsheet
+        return [sample_row.get('Sample Name') or sample_row.get('Sample ID') for sample_row in self.reader.samples]


Should be updated to refer to the JSON instead of the reader
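Something along these lines, as a sketch only. The key names (`sample`, `bioSampleObject`, `name`, `sampleInVCF`) are assumptions about the JSON layout and should be checked against the actual metadata schema before use.

```python
def all_sample_names(metadata_json):
    """Collect sample names from a metadata JSON document.

    The keys used here are assumed for illustration and may not match
    the real schema.
    """
    names = []
    for sample in metadata_json.get('sample', []):
        bio_sample = sample.get('bioSampleObject') or {}
        # Prefer the BioSample name and fall back to the name used in the VCF.
        name = bio_sample.get('name') or sample.get('sampleInVCF')
        if name:
            names.append(name)
    return names
```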

apriltuesday · 2024-12-18T15:37:08Z

eva_sub_cli_processing/process_jobs.py

+
+    def _update_submission_ws(self):
+        put_to_sub_ws(sub_ws_url_build('admin', 'submission', self.submission_id, 'status', self.submission_status))
+        put_to_sub_ws('admin', 'submission-process', self.submission_id, self.processing_step, self.processing_status)


Missing call to sub_ws_url_build
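For clarity, the second call would presumably mirror the first. In the sketch below `sub_ws_url_build` and `put_to_sub_ws` are simplified stand-ins (the real ones live in `sub_cli_utils`), and the base URL and `Job` class are placeholders.

```python
sent_urls = []


def sub_ws_url_build(*parts):
    # Stand-in for the real helper; the base URL here is a placeholder.
    return '/'.join(['https://example.org/ws'] + [str(part) for part in parts])


def put_to_sub_ws(url):
    # Stand-in that records the URL instead of performing an HTTP PUT.
    sent_urls.append(url)


class Job:
    """Hypothetical holder for the attributes used by _update_submission_ws."""

    def __init__(self, submission_id, submission_status, processing_step, processing_status):
        self.submission_id = submission_id
        self.submission_status = submission_status
        self.processing_step = processing_step
        self.processing_status = processing_status

    def _update_submission_ws(self):
        put_to_sub_ws(sub_ws_url_build('admin', 'submission', self.submission_id,
                                       'status', self.submission_status))
        # Build the URL first, as in the call above, instead of passing the parts directly.
        put_to_sub_ws(sub_ws_url_build('admin', 'submission-process', self.submission_id,
                                       self.processing_step, self.processing_status))
```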

apriltuesday · 2024-12-18T16:17:02Z

eva_sub_cli_processing/sub_cli_brokering.py

+                             f'Found {len(sample_name_to_accession)} and expected '
+                             f'{len(sample_submitter.all_sample_names())}. '
+                             f'Missing samples are '
+                             f'{[sample_name for sample_name in sample_submitter.all_sample_names() if sample_name not in sample_name_to_accession]}')


This process needs to set the processing status to RUNNING, SUCCESS or FAILURE, correct? Or is that someone else's responsibility?
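If it does turn out to be this process's responsibility, one possible shape is a small context manager around the step. This is a sketch under that assumption; `track_processing_status` and the `set_status` callback are hypothetical names.

```python
from contextlib import contextmanager

RUNNING, SUCCESS, FAILURE = 'RUNNING', 'SUCCESS', 'FAILURE'


@contextmanager
def track_processing_status(set_status):
    """Report RUNNING on entry, then SUCCESS or FAILURE depending on the outcome.

    set_status is any callable that persists the status, e.g. a call to the submission WS.
    """
    set_status(RUNNING)
    try:
        yield
    except Exception:
        set_status(FAILURE)
        raise  # Let the caller see the original error.
    else:
        set_status(SUCCESS)


# Usage with a list standing in for the web service:
recorded = []
with track_processing_status(recorded.append):
    pass  # The brokering work would go here.
```

The error raised for missing samples would then mark the step as FAILURE without any extra code at the raise site.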

apriltuesday · 2024-12-18T16:24:39Z

eva_sub_cli_processing/process_jobs.py


-    def scan(self):
+    def _scan_per_status(self):


I was initially very confused about why we needed to scan both tables, before I realised that this scan is (I think) only used to add the first processing step. If that's the case, then maybe it could be a bit less generic and used only in that specific situation. Even if we used it for other operations (e.g. scanning for cancelled submissions to clean up the db or something), I don't think creating a SubmissionStep for validation would make sense in those cases.

apriltuesday · 2024-12-18T16:31:22Z

eva_sub_cli_processing/process_jobs.py

        pretty_print(header, lines)


 class NewSubmissionScanner(SubmissionScanner):

    statuses = ['UPLOADED']
+    step_statuses = []


This is mostly a matter of taste, but I think I would prefer the different scanning tasks as methods rather than classes. So we would have just one SubmissionScanner with the generic _scan_per_step_status helper method, and find_new_submissions, find_completed_submission_steps, etc. that call that method with the relevant statuses.

On the other hand, maybe there's more functionality that would go into these subclasses that I'm not thinking of, in which case having the extra classes makes sense.
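Roughly what I have in mind for the method-based version, as a sketch. The `fetch` callable is a hypothetical stand-in for the real query against the submission WS, and the `'PROCESSING'` submission status is an assumption; `'UPLOADED'` and `'SUCCESS'` come from the code in this PR.

```python
class SubmissionScanner:
    """Single scanner class with one method per scanning task (illustrative only)."""

    def __init__(self, fetch):
        # fetch(statuses, step_statuses) -> list of matching records; hypothetical signature.
        self._fetch = fetch

    def _scan_per_step_status(self, statuses, step_statuses=()):
        # Generic helper shared by all the task-specific methods.
        return self._fetch(list(statuses), list(step_statuses))

    def find_new_submissions(self):
        return self._scan_per_step_status(statuses=['UPLOADED'])

    def find_completed_submission_steps(self):
        # 'PROCESSING' is an assumed submission status.
        return self._scan_per_step_status(statuses=['PROCESSING'], step_statuses=['SUCCESS'])
```

Each task keeps its statuses next to its name, and adding a new scan means adding a method rather than a subclass.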

tcezard added 2 commits December 16, 2024 14:12

processing via a scanner and new brokering method

e265b27

processing via a scanner and new brokering method

b61560d

tcezard requested review from apriltuesday and nitin-ebi December 17, 2024 14:55

apriltuesday reviewed Dec 18, 2024
