Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EVA-3423 - Fixes from tests #16

Merged
merged 15 commits into from
Nov 9, 2023
Merged

EVA-3423 - Fixes from tests #16

merged 15 commits into from
Nov 9, 2023

Conversation

tcezard
Copy link
Member

@tcezard tcezard commented Oct 16, 2023

This is a relatively large PR that refactor several things but should not change any of the core functionality

  • Put all the scripts that are meant to be executable in a bin directory
  • Rename the main script from eva_sub_cli.py because python confused it with the module eva-sub-cli.py see here
  • Remove some command line argument that should go in a config file
  • Make sample_checker.py take full path rather than relative path to the VCF to make sure they can be found
  • Make sure the config and report are created in the submission directory

There are still issues post validation

@tcezard tcezard marked this pull request as draft October 16, 2023 14:36
@tcezard tcezard marked this pull request as ready for review October 20, 2023 10:59
vcf_check_db_report = resolve_single_file_path(
os.path.join(self.output_dir, 'vcf_format', vcf_name + '.*.db')
)
vcf_check_log = self._vcf_check_log(vcf_file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to pass in vcf_name directly rather than re-compute in each method? Same for the assembly check files below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason in particular except that I didn't think the function should assume the string would only contain the name. I can pass the vcf_name and remove the extra call.

result_files_per_analysis[analysis_alias].append(file_path)
else:
raise FileNotFoundError(f'{file_path} cannot be resolved')
def resolve_vcf_file_location(vcf_files, files_per_analysis):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this method confusing, maybe could use a docstring... I guess it's checking concordance between the VCF files provided in the metadata (files_per_analysis) and the ones passed on the command-line (vcf_files), the latter of which gives the full path... if this is true it feels like the resolving of location is kind of incidental now.

As an aside, I think I understand better the complexity you were talking about, it kind of feels like we should only receive metadata on the command line (including the files/analysis association) and use that as our sole source of truth. Then the metadata is more of a self-contained, complete description of the project, and there's less room for error (both user error and programmer error).

@tcezard tcezard merged commit 0951b12 into main Nov 9, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants