CHANGELOG

################################################################################
#####                                                                      #####
#####                              CHANGELOG                               #####
#####                                                                      #####
################################################################################

0.20.0: Stable release on 20.03.2024
--------------------------------------------------------------------------------

0.19.* UPDATES
--------------------------------------------------------------------------------
27: Bugfix, minimap uses now different tmp suffixes for different jobs
26: Script getFastaFromVCF could create negative coordinatethat leaad to an error, fixed.
25: transanno added to minimap2 docker image and updated to version 2.26
24: Improvement for Bugfix 0.19.18, there was a copy+paste error that prevented the bugfix to work properly
23: CRITICAL Bugfix: There was a bug in the in-silico size-selection script, leading to a wrong in-silico reference genome 
22: The overhang of the restriction enzymes can now be provided
21: Updated the in-silico digestion script, as it contained apparently a little bug in the SimRAD package
20: Added a heatmap function to summarize the quantification locations on the reference genome.
19: Reintroduced the image visualisation of te VCF image
18: Bugfix, the pipeline crashed when no sampleinfo.tsv file was given
17: Bugfix, sampleinfo colnames caused an error in the casecontrol part, that is fixed now. 
16: Dependency adjusted for mock alignment against reference genome in finalReport
15: Bugfix, when only one group information was provided in samplinfo.tsv the finalReport crashed
14: Added the liftOver rules to liftover the finalMock based VCF to an existing reference genome
13: The mock references are now also fa-indexed
12: Added --eqx option to minimap steps, as the "assert(has_cigar + no_cigar == 1);" test failed sometimes
11: Now I write down the mock to insilico mapping rates
10: Changed the index for Mock vs insilico to bai from csi, as csi threw an error for the mock reference
9: Adjusted the file paths to the log file
8: Mock reference is now also aligned against the different in-silico predictions
7: Removed the previous additions and started an own similarity metric script instead.
6: Added the mock to reference similarity step, using MUMmer
5: Enabled the SE option, so the pipeline should now be ready to handle also single-end input (not in the reports, yet)
4: Handled the case that no reads were disgarded from the mock reference building step
3: Removed minimum lengths criteris in pear command
2: Use N for stitching instead of T
1: New dev branch

0.18.*: Stable release on 18.01.2023
--------------------------------------------------------------------------------
2: Another critial bugfix for variant calling report
1: VariantCalling report dependency bugfix

0.17.* UPDATES
--------------------------------------------------------------------------------
69: Adjusted the remaining input and report files to work with and without provided reference genome
68: Adjusted the final report so that it runs also without provided reference genome
67: Input files changed for final report so that pipeline dependencies work without provided reference genome
66: Exception handling if the first sample does not have reads aligned to reference genome. Now the sample with the most alignemtns is used for visualisation (instead of the first one)
65: Missing variables added for variatn calling report
64: Fixed dependencies from the report
63: Server resource allocation for the Insilico standalo report added to the server default file
62: Pairwise comparisons added to the report
61: More chapter explanations added to the QC-R-Module
60: Minimum and Maximum fragment lengths added to the basic stats report part
59: Location information added for the most covered alignment loci
58: Location names added for the most covered alignment loci
57: Small additions to the alignments-R-module (language, explanations, etc)
56: Small additions to the Cluster and Reference-R-module (language, explanations, etc)
55: Small additions to the QC-R-module (language, explanations, etc)
54: Default paths adjusted
53: Function "getConfigField" reports now only field names a row starts with
52: Added the --cluster-cancel option to the start script
51: Changed the fasta-import function in the in-silico prediction, as it might generate errors for larger genomes
50: Prepared a stand-alone In-silico report script
49: Added the In-Silico section to the final report
48: Alignment stats are determined for the size selected in-silico prediction alignments
47: Alignment stats are determined for the full in-silico prediction alignments
46: Reads are now aligned agains the size selected in-silico prediction loci
45: Reads are now aligned agains the full in-silico prediction loci
44: More details on the welcoming screen
43: In situ predicted locations for ddRad digestion are now prepared and written to References/
42: Table design in Markdown adjusted
41: Cosmetics for finalReport, status messages printed into the final report were removed.
40: Plenty of scatterplots added to related metrics to sequencing depth
39: More exportable tables added
38: Harmonised the plot colours in the final report
37: Added more descriptions to the final report
36: Deduplication vs Coverage plot added
35: QC tables display now the full dataset, without any threshold
34: QC tables can now be exported
33: Server config information added to the reports
32: QC plots now show ommit the missing R2 output (and do not duplicate R1 to pretend it to be R2)
31: Expection added in finalReport in case FastQC/MultiQC report different binnings in their reports for R1 and R2
30: Internal improvement: Duplicated Chunk names in report-knitr files are allowed
29: Paths in the configuration file can now be relative
28: Special case for flanking site determination caught (in case alignment starts at base 0, bed could have been negativ)
27: FinalReport polished and more statistics added
26: Chapter "Individual analysis" added to the finalReport
25: FinalReport: The ACGT-plot was improved in readability
24: FinalReport: Total sequences stats report now table with samples with less than 100k reads 
23: Bugfix, flanking sequence extraction ignored the first alignment
22: flanking bed files are now tab-delimted, as reqired by bedtools
21: Header lines removed from Stringtie output to ensure strict gtf compatability
20: Variant calling missingnes in percentages are now displayed on a fixed 0-100 scale
19: Added the Gini coefficient to represent the concentration of reads against the different alignment loci
18: Added the correlation plot for th loci aligned to reference genome
17: Critical bugfix in 'Mockrefloci_vs_samplealignments' that crashed the pipeline byt wrong output-filename
16: QC diverging samples are now printed for upper and lower throeshols in finalReport
15: Missing percentages are now always displayed on a scale between 0 and 100%
14: summaryTable cluster to reference genome added to the final report
13: Caught the special case, if the "genome" is also part of the filename of "genome" that produced an error
12: Dependency of preliminary mock vcf added to the report to avoid errors
11: QC module cleaned up (needs still a little)
10: Added a fasta-index dependency to avoid crashes in case fasta index was not created before stats are calculated
9: Fixed hard-coded link to docker container in Module 3 (r-gbs)
8: Special case caught, in case a reference aligned bam is empty
7: Flanking sites from individual alignments are now extracted
6: Input bug in the preparesamplesheet.R script fixed (tab-delimited input enforced)
5: Removed for the time-being the R2 reads from raw-read FastQC, due to some bug in function.
4: Created functions to handle individual fastq files in FastQC report
3: Variant Calling report added
2: Added a server-small config to downscale the server requirements for smaller projects
1: Define a TMPDIR for the create vertical ref rules, as it seems to crash for large datasets when the system default tmp folder is small

Previous updates
--------------------------------------------------------------------------------
0.16.1: * Small path adjustments
0.16  : * New release with previous changes
0.15.9: * Added the callvariants rule
0.15.8: * Added the readalignment rule
0.15.7: * Exception catching added for QC-Report block "qualityValueDistribution {Concatenated,Trimmed} data"
0.15.6: * Bugfixes and restructing of the script-files
REVERSED!!! 0.15.5: * config and server config files are now taken from bash call and do not need to be specified again in the config file itself 
0.15.4: * Changed final Report also to modules that can be reused from other pipelines
0.15.3: * Dedicated QC report added for the QC rule
0.15.2: * Added the barcodes file as a dependency for rules that need it, otherwise the pipeline ran into a missing file error.
0.15.1: * Changed the r-gbs standard container to version 4.2.1-0.2
0.14  : * Major change, switching from unhandy rawsamples, samples, barcodes.txt to single, universal sample sheet file
0.12  : * Stable release after bugfixes and added features
0.11.5: * Mock reference contig alignemtns against reference genome coverage plot added
0.11.4: * StringMerge stats removed from the FinalReport and Cleaning scripts adjusted
0.11.3: * Bugfix in CountFileList.txt - it assumed same order as in barcodesID.txt
0.11.2: * Quality value distribution added to the final report.
0.11.1: * Minor bugfixes
0.10  : * Stable release version that sumarises provious fixes in last stable release
0.8.16: * Added a docker samtools container for the variant calling for existing mock and variant loci
0.8.15: * Case handling when some samples hardly have any variants called
0.8.14: * Error handling for the case that some mock clusters have no reads aligned to
0.8.13: * DOS encoding does not affect the GBS-perl script 4 anymore
0.8.12: * The minimum sample size can now be smaller than 6 for the final report
0.8.11: * The visualisation of the vcf work now smoothly, also if not all alleles for a genotypes are present
0.8.10: * Critical bugfix: vcf-barplot in finalReport.Rmd threw an error for larger datasets
0.8.9 : * ICS visualisation uses now the Scores S4 object to plot.
0.8.8 : * "final reads per coverage group"-rule handles now also empty groups
0.8.7 : * Increased the memory for minimap2 from 4G (default) to 8G and added the --split-prefix=tmp option. TODO: This could be also an option in the configuration file
0.8.6 : * Log files for several rules upgraded
0.8.5 : * Reaining bai indices changed to csi indices
0.8.4 : * Bugfix: If flanking site is in the first six bases of a cluster the script threw an error
0.8.3 : * Missingness vs Seq depth added (NOT AS GENERAL CODE!!!)
0.8.2 : * Improved visualisation of vcf
0.8.1 : * Bugfix: Flanking site frequencies are now properly merged and counted
0.8   : * Latest release version
0.7.16: * Bugfix: MockReference refinement filtered as character, not numerical
0.7.15: * Removed the NVME disc areas for the default run configuration
0.7.14: * Email notification for slurm jobs added
0.7.13: * Mock reference evaluation target rule added (MockEval)
0.7.12: * New output target rule MockEval added
0.7.11: * New output target rule QC added
0.7.10: * Final report: Average mapping rate for reference genome added
0.7.9 : * Final Report: Length of final mock reference in bases added to the report
0.7.8 : * Bugfix: Wrong vcf output path for calling variants with reference genome
0.7.7 : * Bugfix: Wrong vcf output path for calling variants with existing mockref
0.7.6 : * Bugfix: Wrong path given for filtering existing mock ref variants
0.7.5 : * Distribution of the bp postion of the variants added
0.7.4 : * Bugfix: Wrong vcf was analysed in final report
0.7.3 : * Conditional inputs for existing mock added (pipeline recognises now if it should perform that branch of the analysis or not)
0.7.2 : * Variant calling for existing mock reference added
0.7.1 : * Structure of final report adjusted
0.6   : * Release version for previous additions
0.5.15: * Report adjusted
0.5.14: * Bugfix: Rule FinalMockVsRef_alignment didn't have resource allocations
0.5.13: * Statistics on final mock reference added
0.5.12: * Coverage for finalMock alignments are now also calculated
0.5.11: * Input rule all cleaned up a bit to account for temporary output
0.5.10: * Mapping stats for final mock vs reference genome added
0.5.9 : * Bugfix - Final report crashed, in case that no casecontrol is given in samplelist
0.5.8 : * Final mock reference was not declared in output
0.5.7 : * Dependencies between the rules improved
0.5.6 : * Bugfix - wrong path for reference based vcf fixed
0.5.5 : * Bugfix - wrong reference vcf address in module 6
0.5.4 : * Variant calling for final mock added
0.5.3 : * Rule all rule simplified
0.5.2 : * Script to refine the mock reference added
0.5.1 : * samtools index set to csi instead of bai
0.4.1 : * Bugfix - Missing outputfile script added
0.4   : * Release version
0.3.16: * Several other small path typos
0.3.15: * Mixup in samflags anf flagsts solev for final report
0.3.14: * Path typos in final report fixed
0.3.13: * Increased default NVME space for mock reference building
0.3.12: * Singularity binding added and makeClean script extended
0.3.11: * Typo in rule MockVsRef_samtools_SamToSortedBam output fixed
0.3.10: * Module 6 and 7 finalized
0.3.9 : * Step 6 and 7 moved to Module 5 and reorganised
0.3.8 : * Step 5 moved to Module4
0.3.7 : * Step3 moved to Module3 and reorganised
0.3.6 : * Welcoming screen added
        * Config file simplified
        * Added a makeClean.sh script to delete all files created from the pipeline (=use it carefully!!!)
0.3.5 : * Cutadapt container 0.3 used now
0.3.4 : * Step2 moved to Module2 and organized 
0.3.3 : * Step4 merged into Module0
0.3.2 : * Step0 renamed to Module0 and organised
0.3.1 : * Step1 renamed to Module1 and organised
        * Changelog introduced