-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EVA-3423 - Fixes from tests #16
Conversation
af85e34
to
5616dd9
Compare
…ed to analysis using the metadata
eva_sub_cli/reporter.py
Outdated
vcf_check_db_report = resolve_single_file_path( | ||
os.path.join(self.output_dir, 'vcf_format', vcf_name + '.*.db') | ||
) | ||
vcf_check_log = self._vcf_check_log(vcf_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason not to pass in vcf_name
directly rather than re-compute in each method? Same for the assembly check files below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No reason in particular except that I didn't think the function should assume the string would only contain the name. I can pass the vcf_name
and remove the extra call.
bin/samples_checker.py
Outdated
result_files_per_analysis[analysis_alias].append(file_path) | ||
else: | ||
raise FileNotFoundError(f'{file_path} cannot be resolved') | ||
def resolve_vcf_file_location(vcf_files, files_per_analysis): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this method confusing, maybe could use a docstring... I guess it's checking concordance between the VCF files provided in the metadata (files_per_analysis
) and the ones passed on the command-line (vcf_files
), the latter of which gives the full path... if this is true it feels like the resolving of location is kind of incidental now.
As an aside, I think I understand better the complexity you were talking about, it kind of feels like we should only receive metadata on the command line (including the files/analysis association) and use that as our sole source of truth. Then the metadata is more of a self-contained, complete description of the project, and there's less room for error (both user error and programmer error).
This is a relatively large PR that refactor several things but should not change any of the core functionality
eva_sub_cli.py
because python confused it with the moduleeva-sub-cli.py
see heresample_checker.py
take full path rather than relative path to the VCF to make sure they can be foundThere are still issues post validation