The CHISEL command chisel, as well as the command chisel_nonormal, runs the entire CHISEL pipeline starting from the required inputs (e.g. BAM files).
During execution, the command creates six folders that contain the temporary and final results produced by the 5 distinct steps of CHISEL.
This step aims to estimate the RDR for every genomic bin in each cell.
Moreover, it selects the barcodes that correspond to cells using a specified threshold on the minimum number of reads.
This step creates a folder rdr with three files:
- total.tsv: a TSV dataframe containing the number of sequencing reads observed for every selected cell. More specifically, the fields are:
  - CELL: the name of a cell, or the name normal indicating the matched-normal sample;
  - TOTAL: the total number of sequencing reads observed for the cell.
- rdr.tsv: a TSV dataframe containing the estimated RDRs with the following fields:
  - CHROMOSOME: the name of a chromosome;
  - START: the starting coordinate of a genomic bin;
  - END: the ending coordinate of the genomic bin;
  - CELL: the name of a cell;
  - NORMAL: the number of sequencing reads from the matched-normal sample for the bin;
  - COUNT: the number of sequencing reads from the cell CELL in the bin;
  - RDR: the estimated RDR.
- log: a logging file of the execution of this step (optional).
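The barcode selection described for this step can be illustrated with a minimal sketch. It assumes the input is simply one barcode per sequencing read; the function name select_cells, the threshold value, and the toy barcodes are hypothetical and are not taken from CHISEL itself.

```python
from collections import Counter

def select_cells(read_barcodes, min_reads):
    """Keep only the barcodes observed in at least `min_reads` reads."""
    totals = Counter(read_barcodes)
    return {bc: n for bc, n in totals.items() if n >= min_reads}

# Toy input: one barcode per sequencing read (hypothetical values).
reads = ["AAAC"] * 5 + ["GGTT"] * 2 + ["CCGA"] * 4
selected = select_cells(reads, min_reads=3)

# Emit rows in the CELL / TOTAL layout described for total.tsv.
rows = ["{}\t{}".format(cell, total) for cell, total in sorted(selected.items())]
print("\n".join(rows))
```

Barcodes below the threshold (GGTT in this toy input) are dropped and would not appear in any of the downstream files.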
This step aims to estimate the BAF for phased germline heterozygous SNPs in the selected cells.
This step creates a folder baf with two files:
- baf.tsv: a TSV dataframe with the following fields:
  - CHROMOSOME: the name of a chromosome;
  - POS: a genomic position in the chromosome CHROMOSOME for a germline heterozygous SNP;
  - CELL: the name of a cell;
  - A-COUNT: the number of observed sequencing reads from the haplotype A of the SNP;
  - B-COUNT: the number of observed sequencing reads from the haplotype B of the SNP.
- log: a logging file of the execution of this step (optional).
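As a rough sketch of how the fields of baf.tsv can be consumed, the snippet below parses tab-separated records and computes a per-SNP B-allele frequency as B-COUNT / (A-COUNT + B-COUNT). It assumes the columns appear in the order listed above and that any header or comment line starts with #; both points are assumptions, so check them against an actual file. The helper names read_baf and baf are hypothetical.

```python
import csv
import io

# Assumed column order, following the field list above.
FIELDS = ["CHROMOSOME", "POS", "CELL", "A-COUNT", "B-COUNT"]

def read_baf(handle):
    """Yield one dict per SNP/cell record, skipping '#' comment lines."""
    lines = (line for line in handle if not line.startswith("#"))
    for rec in csv.DictReader(lines, fieldnames=FIELDS, delimiter="\t"):
        rec["POS"] = int(rec["POS"])
        rec["A-COUNT"] = int(rec["A-COUNT"])
        rec["B-COUNT"] = int(rec["B-COUNT"])
        yield rec

def baf(a_count, b_count):
    """B-allele frequency, or None when no read covers the SNP."""
    total = a_count + b_count
    return b_count / total if total > 0 else None

# Toy records (hypothetical values) standing in for a real baf.tsv.
toy = "chr1\t1000\tcellA\t3\t1\nchr1\t2500\tcellA\t0\t0\n"
records = list(read_baf(io.StringIO(toy)))
bafs = [baf(r["A-COUNT"], r["B-COUNT"]) for r in records]
print(bafs)
```

With a real file, io.StringIO(toy) would be replaced by an open file handle.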
This step aims to combine the RDRs and BAFs for the selected bins in the selected cells.
This step creates a folder combo with two files:
- combo.tsv: a TSV dataframe with the following fields:
  - CHROMOSOME: the name of a chromosome;
  - START: the starting coordinate of a genomic bin;
  - END: the ending coordinate of the genomic bin;
  - CELL: the name of a cell;
  - NORMAL: the number of sequencing reads from the matched-normal sample for the bin;
  - COUNT: the number of sequencing reads from the cell CELL in the bin;
  - RDR: the estimated RDR for the bin in the cell CELL;
  - A-COUNT: the number of observed sequencing reads from the haplotype A of the SNP;
  - B-COUNT: the number of observed sequencing reads from the haplotype B of the SNP;
  - BAF: the B-allele frequency estimated for the bin in the cell CELL.
- log: a logging file of the execution of this step (optional).
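One way to picture the combination of per-SNP counts into per-bin values is the sketch below. It sums the A and B counts of all SNPs that fall in the same bin of the same cell and derives a bin-level BAF from the sums. The fixed-width, 0-based bin convention, the bin size, and the function name bin_counts are assumptions made for illustration; the actual procedure used by CHISEL (for example, how haplotype phasing is handled across SNPs) is not described here and may differ.

```python
from collections import defaultdict

def bin_counts(snps, bin_size):
    """Sum A/B counts of SNPs falling in the same (chromosome, bin, cell)."""
    acc = defaultdict(lambda: [0, 0])
    for chrom, pos, cell, a, b in snps:
        start = (pos // bin_size) * bin_size  # assumed fixed-width bins
        key = (chrom, start, start + bin_size, cell)
        acc[key][0] += a
        acc[key][1] += b
    return acc

# Toy SNP records: (CHROMOSOME, POS, CELL, A-COUNT, B-COUNT), hypothetical values.
snps = [
    ("chr1", 1000, "cellA", 3, 1),
    ("chr1", 4000, "cellA", 1, 3),
    ("chr1", 12000, "cellA", 2, 2),
]
binned = bin_counts(snps, bin_size=10000)

for (chrom, start, end, cell), (a, b) in sorted(binned.items()):
    bin_baf = b / (a + b) if (a + b) > 0 else None
    print(chrom, start, end, cell, a, b, bin_baf)
```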
This step aims to infer the ploidy of each cell and, after global clustering of RDRs and BAFs, to infer the allele- and haplotype-specific copy numbers for every bin in every cell.
This step creates a folder calls with two files:
- calls.tsv: a TSV dataframe with the following fields:
  - CHROMOSOME: the name of a chromosome;
  - START: the starting coordinate of a genomic bin;
  - END: the ending coordinate of the genomic bin;
  - CELL: the name of a cell;
  - NORMAL: the number of sequencing reads from the matched-normal sample for the bin;
  - COUNT: the number of sequencing reads from the cell CELL in the bin;
  - RDR: the estimated RDR for the bin in the cell CELL;
  - A-COUNT: the number of observed sequencing reads from the haplotype A of the SNP;
  - B-COUNT: the number of observed sequencing reads from the haplotype B of the SNP;
  - BAF: the B-allele frequency estimated for the bin in the cell CELL;
  - ALLELECN: dash-separated ordered pair of the inferred haplotype-specific copy numbers for the bin in the cell CELL.
- log: a logging file of the execution of this step (optional).
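For downstream analysis, the ALLELECN field usually needs to be split into its two integers. The sketch below assumes the dash-separated form described above; the example value 2-1 and the function name parse_allelecn are hypothetical, and the separator should be verified against an actual calls.tsv.

```python
def parse_allelecn(value):
    """Split a dash-separated pair such as '2-1' into a tuple of ints."""
    a, b = value.split("-")
    return int(a), int(b)

pair = parse_allelecn("2-1")   # hypothetical ALLELECN value
total_copy_number = sum(pair)  # total copies of the bin in this cell
print(pair, total_copy_number)
```

Because the pair is ordered, (2, 1) and (1, 2) denote different haplotype-specific states even though both have a total copy number of 3.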
This step aims to infer the clones by clustering cells based on the inferred haplotype-specific copy numbers and selecting the clusters that correspond to actual clones.
This step creates a folder clones with two files:
- mapping.tsv: a TSV dataframe with the following fields:
  - CELL: the name of a selected cell;
  - CLUSTER: the cluster to which the cell CELL has been assigned;
  - CLONE: the clone of the cell CELL, or None if the cell is classified as noisy.
- log: a logging file of the execution of this step (optional).
Moreover, this step introduces a new field (right-most field) in the file calls.tsv, which is CORRECTED_CNS and corresponds to the final haplotype-specific copy numbers estimated after consensus of cells in the same clone.
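To make the idea of a per-clone consensus concrete, the sketch below takes a simple majority vote over the copy-number states of the cells in each clone, bin by bin, and ignores cells whose clone is None. The majority-vote rule is an assumption chosen for illustration; the text above does not specify how CHISEL computes the consensus. The function name clone_consensus and all toy values are hypothetical.

```python
from collections import Counter, defaultdict

def clone_consensus(mapping, calls):
    """Majority-vote copy-number state per (clone, bin).

    mapping: cell -> clone label, with "None" marking noisy cells.
    calls:   (cell, bin) -> copy-number string such as "1-1".
    """
    votes = defaultdict(Counter)
    for (cell, bin_key), cn in calls.items():
        clone = mapping.get(cell, "None")
        if clone == "None":
            continue  # noisy cells do not contribute to any clone
        votes[(clone, bin_key)][cn] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}

# Toy data (hypothetical): three cells in clone "0" and one noisy cell.
mapping = {"c1": "0", "c2": "0", "c3": "0", "c4": "None"}
bin_key = ("chr1", 0, 10000)
calls = {
    ("c1", bin_key): "1-1",
    ("c2", bin_key): "1-1",
    ("c3", bin_key): "2-1",
    ("c4", bin_key): "3-0",
}
consensus = clone_consensus(mapping, calls)
print(consensus)
```

In this toy case the outlier state of c3 is overruled by the other two cells of the clone, which is the kind of correction a consensus step is meant to provide.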
This step generates several useful plots of the results, which are fully described here.