False Duplication VCF Export #570

Open
wants to merge 9 commits into
base: main

matren395
Contributor

Code to take the False Duplication Hail Table (covering three chr21 genes), convert it to a VCF, verify it, and export it.

03-07-24: still needs verification and header added, but just wanted to get the PR opened

@matren395 matren395 requested a review from klaricch March 7, 2024 21:33
@matren395 matren395 self-assigned this Mar 7, 2024
@matren395 matren395 requested review from klaricch and removed request for klaricch March 12, 2024 21:47
@matren395
Contributor Author

still some to do, but unfurling and the existing header code could use a lookover now

@klaricch klaricch self-assigned this Mar 14, 2024
@@ -51,6 +51,136 @@ def filter_liftover_to_false_dups(
    return ht


def _v4_false_dup_unfurl_annotations(
Contributor

is this an unused function that needs to be deleted?

Comment on lines +660 to +665
    "--overwrite",
    help="Option to overwrite existing custom liftover table.",
    action="store_true",
)
parser.add_argument(
    "--test",
Contributor

neither overwrite nor test is referenced anywhere in the script
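
One way the two flags could be wired into `main` is sketched below. This is only an illustration: `resolve_output`, the `_test` suffix, the base path, and the `store_true` action on `--test` are all assumptions, not code from this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the two flags under discussion."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--overwrite",
        help="Option to overwrite existing custom liftover table.",
        action="store_true",
    )
    parser.add_argument(
        "--test",
        help="Write to a test location instead of the release path.",  # assumed help text
        action="store_true",  # assumed; the action is not visible in the diff excerpt
    )
    return parser


def resolve_output(base_path: str, test: bool) -> str:
    """Hypothetical helper: route test runs to a separate output path."""
    return f"{base_path}_test" if test else base_path


def main(args: argparse.Namespace) -> dict:
    """Consume both flags so neither is declared without being used."""
    out_path = resolve_output("false_dup_genes", args.test)  # placeholder base path
    # In the real script these values would be handed to the write step,
    # for example ht.write(out_path, overwrite=args.overwrite).
    return {"path": out_path, "overwrite": args.overwrite}
```

If the flags are not needed, removing both `add_argument` calls is the simpler fix.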

logger = logging.getLogger("false_dup_genes")
logger.setLevel(logging.INFO)

FALSE_DUP_GENES = ["KCNE1", "CBS", "CRYAA"]
Contributor

This constant is already defined in create_false_dup_liftover.py. Constants that already exist should be imported rather than redefined. However, I feel like all the false dup code can just be combined into one script, with arguments that can be supplied to either create the Table or export the VCF.
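
A rough sketch of what a single combined script might look like, assuming hypothetical flag names (`--create-table`, `--export-vcf`) and placeholder step functions that stand in for the real Hail logic:

```python
import argparse

# In the real script this would be imported from create_false_dup_liftover.py
# rather than redefined; it is inlined here only so the sketch runs standalone.
FALSE_DUP_GENES = ["KCNE1", "CBS", "CRYAA"]


def create_table(genes: list) -> str:
    """Placeholder for the step that builds the false dup Hail Table."""
    return f"created table for {', '.join(genes)}"


def export_vcf(genes: list) -> str:
    """Placeholder for the step that validates and exports the VCF."""
    return f"exported VCF for {', '.join(genes)}"


def build_parser() -> argparse.ArgumentParser:
    """Each step of the combined script gets its own flag (names are assumptions)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--create-table",
        help="Create the false dup liftover Table.",
        action="store_true",
    )
    parser.add_argument(
        "--export-vcf",
        help="Export the false dup Table as a VCF.",
        action="store_true",
    )
    return parser


def main(args: argparse.Namespace) -> list:
    """Run only the requested steps, in a fixed order."""
    steps = []
    if args.create_table:
        steps.append(create_table(FALSE_DUP_GENES))
    if args.export_vcf:
        steps.append(export_vcf(FALSE_DUP_GENES))
    return steps
```

With this layout the constant lives in one place and both steps can be run together or separately.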

Comment on lines +354 to +357
:param ht: Release Hail Table
:param vcf_info_reorder: Order of VCF INFO fields
:return: Hail Table prepared for validity checks and export
"""
Contributor

Suggested change
- :param ht: Release Hail Table
- :param vcf_info_reorder: Order of VCF INFO fields
- :return: Hail Table prepared for validity checks and export
- """
+ :param ht: Release Hail Table of false dup genes.
+ :param vcf_info_reorder: Order of VCF INFO fields.
+ :return: Hail Table prepared for validity checks and export.
+ """

    :return: Hail Table prepared for validity checks and export
    """
    logger.info(
        "Unfurling nested gnomAD frequency annotations and add to INFO field..."
Contributor

Suggested change
- "Unfurling nested gnomAD frequency annotations and add to INFO field..."
+ "Unfurling nested gnomAD frequency annotations and adding to INFO field..."

    return vcf_info_dict


def _joint_filters(ht: hl.Table) -> hl.Table:
Contributor

_joint_filters -> prepare_joint_filters

variant_qc_filter="RF",
)

custom_filter_dict = {
Contributor

see previous note about considering missing to be PASS

}


def populate_subset_info_dict(
Contributor

unused function?

    return vcf_info_dict


def populate_info_dict(
Contributor

unused function?

def main(args):
    ht = hl.read_table(get_false_dup_genes_path(release_version="4.0"))
    ht = prepare_false_dup_ht_for_validation(ht)
    header_dict = {
Contributor

where does header_dict get used? why call prepare_vcf_filter_header twice?


2 participants