
Issue 416 - Parallelise evidence string generation #421

Merged: 5 commits merged into EBIvariation:master from parallel-evidence, Mar 7, 2024

Conversation

@apriltuesday (Contributor) commented Mar 1, 2024

Closes #416
Closes #420

@apriltuesday marked this pull request as ready for review March 4, 2024 15:23
@apriltuesday self-assigned this Mar 4, 2024
@apriltuesday requested a review from tcezard March 4, 2024 15:23
@tcezard (Member) left a comment:


looks good.

Comment on lines +76 to +85
def dump_to_file(self, dir_out, filename=COUNTS_FILE_NAME):
with open(os.path.join(dir_out, filename), 'w') as f:
yaml.safe_dump(vars(self), f)

def load_from_file(self, filename):
with open(filename, 'r') as f:
data = yaml.safe_load(f)
self.__dict__.update(**data)
# yaml loads a dict, convert to counter
self.unmapped_trait_names = Counter(self.unmapped_trait_names)
@tcezard (Member) commented:

I was going to say that pickling the object might be safer, but I'm guessing that you want to use the final output YAML elsewhere?

@apriltuesday (Contributor, Author):

Yes, I was thinking that having it in YAML might be more readable and flexible for downstream purposes (e.g. updating the metrics or generating Sankey diagrams), but it's probably overkill if those processes end up implemented in Python anyway... I'd probably keep it as it is for now.
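
For reference, a minimal sketch of how a downstream script could consume the dumped counts file. The filename counts.yml is an assumption (the actual value of COUNTS_FILE_NAME isn't shown in this diff); only the unmapped_trait_names key comes from the snippet above.

import yaml
from collections import Counter

# Hypothetical output path; substitute whatever COUNTS_FILE_NAME resolves to.
with open('counts.yml') as f:
    data = yaml.safe_load(f)

# yaml.safe_load returns a plain dict, so rebuild the Counter
# the same way load_from_file does.
unmapped_trait_names = Counter(data.get('unmapped_trait_names', {}))
print(unmapped_trait_names.most_common(10))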

@apriltuesday (Contributor, Author) commented:

Counts from the test run match the last submission, aside from a very silly error that I just fixed.
Runtime was about 14 hours for the entire pipeline, as opposed to about 26 hours before. About 2 hours of that is consequence prediction, so we've basically halved the run time for evidence generation. I think it's definitely still worth speeding up the JSON validation.
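
For readers skimming the thread, the general shape of chunk-based parallelisation is sketched below. This is an illustration only, not the code merged in this PR; the names process_chunk, run_in_parallel and chunk_size are hypothetical, and the real per-record logic lives in the evidence string generation module.

from multiprocessing import Pool

def process_chunk(records):
    # Hypothetical worker: turn one chunk of records into evidence strings.
    # A stand-in dict is used here in place of the real evidence string builder.
    return [{'id': r} for r in records]

def run_in_parallel(all_records, n_processes=8, chunk_size=1000):
    # Split the input into chunks so each worker process handles a batch,
    # then flatten the per-chunk results back into one list.
    chunks = [all_records[i:i + chunk_size] for i in range(0, len(all_records), chunk_size)]
    with Pool(processes=n_processes) as pool:
        per_chunk = pool.map(process_chunk, chunks)
    return [evidence for chunk in per_chunk for evidence in chunk]

if __name__ == '__main__':
    print(len(run_in_parallel(list(range(10_000)))))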

@apriltuesday merged commit 1e1afdb into EBIvariation:master Mar 7, 2024
1 check passed
@apriltuesday deleted the parallel-evidence branch March 7, 2024 10:03

Successfully merging this pull request may close these issues:

- Properly count and exclude suppressed / flagged RCVs
- Scale up evidence string generation
2 participants