diff --git a/00-introduction.md b/00-introduction.md new file mode 100644 index 000000000..e3bea9ba4 --- /dev/null +++ b/00-introduction.md @@ -0,0 +1,549 @@ +--- +title: Introducing R and RStudio IDE +teaching: 30 +exercises: 15 +source: Rmd +--- + +::::::::::::::::::::::::::::::::::::::: objectives + +- Know advantages of analyzing data in R +- Know advantages of using RStudio +- Create an RStudio project, and know the benefits of working within a project +- Be able to customize the RStudio layout +- Be able to locate and change the current working directory with `getwd()` and `setwd()` +- Compose an R script file containing comments and commands +- Understand what an R function is +- Locate help for an R function using `?`, `??`, and `args()` + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: questions + +- Why use R? +- Why use RStudio and how does it differ from R? + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + + +## Getting ready to use R for the first time + +In this lesson we will take you through the very first things you need to get +R working. + +::::::::::::::::::::::::::::::::::::::::: callout + +## Tip: This lesson works best on the cloud + +Remember, these lessons assume we are using the pre-configured virtual machine +instances provided to you at a genomics workshop. Much of this work could be +done on your laptop, but we use instances to simplify workshop setup +requirements, and to get you familiar with using the cloud (a common +requirement for working with big data). +Visit the [Genomics Workshop setup page](https://www.datacarpentry.org/genomics-workshop/setup.html) +for details on getting this instance running on your own, or for the info you +need to do this on your own computer. 
+
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+## A Brief History of R
+
+[The programming language R](https://en.wikipedia.org/wiki/R_\(programming_language\)) has been around
+since 1995, and was created by Ross Ihaka and Robert Gentleman at the University
+of Auckland, New Zealand. R is based on the S programming language developed
+at Bell Labs and was developed to teach intro statistics. See this [slide deck](https://www.stat.auckland.ac.nz/~ihaka/downloads/Massey.pdf)
+by Ross Ihaka for more info on the subject.
+
+## Advantages of using R
+
+At more than 20 years old, R is fairly mature and [growing in popularity](https://www.tiobe.com/tiobe-index/r/). However, programming isn't a popularity contest. Here are key advantages of
+analyzing data in R:
+
+- **R is [open source](https://en.wikipedia.org/wiki/Open-source_software)**.
+  This means R is free - an advantage if you are at an institution where you
+  have to pay for your own MATLAB or SAS license. Open source is important to
+  your colleagues in parts of the world where expensive software is
+  inaccessible. It also means that R is actively developed by a community (see
+  [r-project.org](https://www.r-project.org/)),
+  and there are regular updates.
+- **R is widely used**. Ok, maybe programming is a popularity contest. Because
+  R is used in many areas (not just bioinformatics), you are more likely to
+  find help online when you need it. Chances are, almost any error message you
+  run into, someone else has already experienced.
+- **R is powerful**. R runs on multiple platforms (Windows/MacOS/Linux). It can
+  work with much larger datasets than popular spreadsheet programs like
+  Microsoft Excel, and because of its scripting capabilities is far more
+  reproducible. Also, there are thousands of available software packages for
+  science, including genomics and other areas of life science.
+
+:::::::::::::::::::::::::::::::::::::: discussion
+
+## Discussion: Your experience
+
+What has motivated you to learn R? Have you had a research question for which
+spreadsheet programs such as Excel have proven difficult to use, or where the
+size of the data set created issues?
+
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+## Introducing RStudio Server
+
+In these lessons, we will be making use of software called [RStudio](https://www.rstudio.com/products/RStudio/),
+an [Integrated Development Environment (IDE)](https://en.wikipedia.org/wiki/Integrated_development_environment).
+RStudio, like most IDEs, provides a graphical interface to R, making it more
+user-friendly, and providing dozens of useful features. We will introduce
+additional benefits of using RStudio as you cover the lessons. In this case,
+we are specifically using [RStudio Server](https://www.rstudio.com/products/RStudio/#Server),
+a version of RStudio that can be accessed in your web browser. RStudio Server
+has the same features as the Desktop version of RStudio you could download as
+standalone software.
+
+## Log on to RStudio Server
+
+Open a web browser and enter the URL you used to log in at the terminal
+(provided by your instructors), followed by
+`:8787`. For example, if your URL was ec2.12.2.45.678.compute-1.amazonaws.com, you should enter:
+
+```
+http://ec2.12.2.45.678.compute-1.amazonaws.com:8787
+```
+
+::::::::::::::::::::::::::::::::::::::::: callout
+
+## Tip: Make sure there are no spaces before or after your URL.
+
+If your URL has spaces, your web browser may interpret it as a search query.
+
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+You should now be looking at a page that will allow you to log in to the RStudio
+server:
+
+rstudio default session
+
+Enter your user credentials and click Sign In. The credentials for
+the genomics Data Carpentry instances will be provided by your instructors.
+ +You should now see the RStudio interface: + +rstudio default session + +## Create an RStudio project + +One of the first benefits we will take advantage of in RStudio is something +called an **RStudio Project**. An RStudio project allows you to more easily: + +- Save data, files, variables, packages, etc. related to a specific + analysis project +- Restart work where you left off +- Collaborate, especially if you are using version control such as [git](https://swcarpentry.github.io/git-novice/). + +1. To create a project, go to the File menu, and click New Project.... + +rstudio default session + +2. In the window that opens select **New Directory**, then **New Project**. For + "Directory name:" enter **dc\_genomics\_r**. For "Create project as subdirectory of", click Browse... and then click Choose which will select your home directory "~". + +3. Finally click Create Project. In the "Files" tab of your output + pane (more about the RStudio layout in a moment), you should see an RStudio + project file, **dc\_genomics\_r.Rproj**. All RStudio projects end with the + "**.Rproj**" file extension. + +::::::::::::::::::::::::::::::::::::::::: callout + +## Tip: Make your project more reproducible with renv + +One of the most wonderful and also frustrating aspects of working with R is +managing packages. We will talk more about them, but packages (e.g. ggplot2) +are add-ons that extend what you can do with R. Unfortunately it is very +common that you may run into versions of R and/or R packages that are not +compatible. This may make it difficult for someone to run your R script using +their version of R or a given R package, and/or make it more difficult to run +their scripts on your machine. [renv](https://rstudio.github.io/renv/) +is an RStudio add-on that will associate your packages and project so that +your work is more portable and reproducible. To turn on renv click on +the Tools menu and select Project Options. 
Under
+**Environments** check off "**Use renv with this project**" and follow any
+installation instructions.
+
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+## Creating your first R script
+
+Now that we are ready to start exploring R, we will want to keep a record of the
+commands we are using. To do this we can create an R script:
+
+Click the File menu and select New File and then
+R Script. Before we go any further, save your script by clicking the
+save/disk icon that is in the bar above the first line in the script editor, or
+click the File menu and select Save. In the "Save File"
+window that opens, name your file **"genomics\_r\_basics"**. The new script
+**genomics\_r\_basics.R** should appear under "Files" in the output pane. By
+convention, R scripts end with the file extension **.R**.
+
+## Overview and customization of the RStudio layout
+
+Here are the major windows (or panes) of the RStudio
+environment:
+
+rstudio default session
+
+- **Source**: This pane is where you will write/view R scripts. Some outputs
+  (such as if you view a dataset using `View()`) will appear as a tab here.
+- **Console/Terminal/Jobs**: This is actually where you see the execution of
+  commands. This is the same display you would see if you were using R at the
+  command line without RStudio. You can work interactively (i.e. enter R
+  commands here), but for the most part we will run a script (or lines in a
+  script) in the source pane and watch their execution and output here. The
+  "Terminal" tab gives you access to the Bash terminal (the command line of
+  the Linux operating system, unrelated to R). RStudio also allows you to run jobs (analyses) in the background. This is useful if some analysis will take a while to run. You can see the status of those jobs in the background.
+- **Environment/History**: Here, RStudio will show you what datasets and
+  objects (variables) you have created and which are defined in memory.
+  You can also see some properties of objects/datasets such as their type
+  and dimensions. The "History" tab contains a history of the R commands you've executed.
+- **Files/Plots/Packages/Help/Viewer**: This multipurpose pane will show you the
+  contents of directories on your computer. You can also use the "Files" tab to
+  navigate and set the working directory. The "Plots" tab will show the output
+  of any plots generated. In "Packages" you will see what packages are actively
+  loaded, or you can attach installed packages. "Help" will display help files
+  for R functions and packages. "Viewer" will allow you to view local web
+  content (e.g. HTML outputs).
+
+::::::::::::::::::::::::::::::::::::::::: callout
+
+## Tip: Uploads and downloads in the cloud
+
+In the "Files" tab you can select a file and download it from your cloud
+instance (click the "more" button) to your local computer.
+Uploads are also possible.
+
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+All of the panes in RStudio have configuration options. For example, you can
+minimize/maximize a pane, or resize panes by dragging the space between
+them with your mouse. The most important customization options for
+pane layout are in the View menu. Other options such as font sizes,
+colors/themes, and more are in the Tools menu under
+Global Options.
+
+::::::::::::::::::::::::::::::::::::::::: callout
+
+## You are working with R
+
+Although we won't be working with R at the terminal, there are lots of reasons
+to. For example, once you have written an R script, you can run it at any Linux
+or Windows terminal without the need to start up RStudio. We don't want
+you to get confused - RStudio runs R, but R is not RStudio. For more on
+running an R script at the terminal see this [Software Carpentry lesson](https://swcarpentry.github.io/r-novice-inflammation/05-cmdline/).
+
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+## Getting to work with R: navigating directories
+
+Now that we have covered the more aesthetic aspects of RStudio, we can get to
+work using some commands. We will write, execute, and save the commands we
+learn in our **genomics\_r\_basics.R** script that is loaded in the Source pane.
+First, let's see what directory we are in. To do so, type the following command
+into the script:
+
+
+```r
+getwd()
+```
+
+To execute this command, make sure your cursor is on the same line as the
+command. Then click the Run button that is just above the first
+line of your script in the header of the Source pane.
+
+In the console, we expect to see the following output\*:
+
+```output
+[1] "/home/dcuser/dc_genomics_r"
+```
+
+\* Notice, at the Console, you will also see the instruction you executed
+above the output in blue.
+
+Since we will be learning several commands, we may already want to keep some
+short notes in our script to explain the purpose of the command. Entering a `#`
+at the start of any line in an R script turns that line into a comment, which R will
+not try to interpret as code. Edit your script to include a comment on the
+purpose of commands you are learning, e.g.:
+
+
+```r
+# this command shows the current working directory
+getwd()
+```
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Exercise: Work interactively in R
+
+What happens when you try to enter the `getwd()` command in the Console pane?
+
+::::::::::::::: solution
+
+## Solution
+
+You will get the same output as when you ran `getwd()` from the
+source. You can run any command in the Console; however, executing it from
+the source script will make it easier for us to record what we have done,
+and ultimately run an entire script, instead of entering commands one-by-one.
+
+
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+For the purposes of this exercise we want you to be in the directory `"/home/dcuser/dc_genomics_r"`.
+What if you weren't? You can set your working directory using the `setwd()`
+command. Enter this command in your script, but *don't run* this yet.
+
+
+```r
+# This sets the working directory
+setwd()
+```
+
+As you may have guessed, you need to tell the `setwd()` command
+what directory you want to set as your working directory. To do so, inside of
+the parentheses, open a set of quotes. Inside the quotes enter a `/` which is
+the root directory for Linux. Next, use the Tab key, to take
+advantage of RStudio's Tab-autocompletion method, to select the `home`, `dcuser`,
+and `dc_genomics_r` directories. The path in your script should look like this:
+
+
+```r
+# This sets the working directory
+setwd("/home/dcuser/dc_genomics_r")
+```
+
+When you run this command, the console repeats the command, but gives you no
+output. Instead, you see the blank R prompt: `>`. Congratulations! Although it
+seems small, knowing what your working directory is and being able to set your
+working directory is the first step to analyzing your data.
+
+::::::::::::::::::::::::::::::::::::::::: callout
+
+## Tip: Never use `setwd()`
+
+Wait, what were the last 2 minutes about? Well, although setting your working directory
+is something you need to do, you need to be very careful about using this as
+a step in your script. For example, what if your script is being run on a computer
+that has a different directory structure? The top-level path in a Unix file
+system is root `/`, but on Windows it is likely `C:\`. This is one of several
+ways you might cause a script to break because a file path is configured
+differently than your script anticipates.
The ['here'](https://cran.r-project.org/package=here) package
+and the base R function ['file.path'](https://www.rdocumentation.org/packages/base/versions/3.4.3/topics/file.path)
+allow you to specify file paths in a way that is more operating system
+independent. See Jenny Bryan's [blog post](https://www.tidyverse.org/articles/2017/12/workflow-vs-script/) for this
+and other R tips.
+
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+## Using functions in R, without needing to master them
+
+A function in R (or any computing language) is a short
+program that takes some input and returns some output. Functions may seem like an advanced topic (and they are), but you have already
+used at least one function in R. `getwd()` is a function! The next sections will help you understand what is happening in
+any R script.
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Exercise: What do these functions do?
+
+Try the following functions by writing them in your script. See if you can
+guess what they do, and make sure to add comments to your script about your
+assumed purpose.
+
+- `dir()`
+- `sessionInfo()`
+- `date()`
+- `Sys.time()`
+
+::::::::::::::: solution
+
+## Solution
+
+- `dir()` # Lists files in the working directory
+- `sessionInfo()` # Gives the version of R and additional info including
+  on attached packages
+- `date()` # Gives the current date and time as a text string
+- `Sys.time()` # Gives the current date and time as a date-time value
+
+*Notice*: Commands are case sensitive!
+
+
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+You have hopefully noticed a pattern - an R
+function has three key properties:
+
+- Functions have a name (e.g. `dir`, `getwd`); note that function names are case
+  sensitive!
+- Following the name, functions have a pair of `()`
+- Inside the parentheses, a function may take 0 or more arguments
+
+An argument may be a specific input for your function and/or may modify the
+function's behavior.
For example, the function `round()` will round a number
+with a decimal:
+
+
+```r
+# This will round a number to the nearest integer
+round(3.14)
+```
+
+```{.output}
+[1] 3
+```
+
+## Getting help with function arguments
+
+What if you wanted to round to a specific number of decimal places? `round()` can
+do this, but you may first need to read the help to find out how. To see the help
+(some packages also provide longer guides called "vignettes") enter a `?` in front of the function
+name:
+
+
+```r
+?round()
+```
+
+The "Help" tab will show you information (often, too much information). You
+will slowly learn how to read and make sense of help files. Checking the "Usage" or "Examples"
+headings is often a good place to look first. If you look under "Arguments," we
+also see what arguments we can pass to this function to modify its behavior.
+You can also see a function's arguments using the `args()` function:
+
+
+```r
+args(round)
+```
+
+```{.output}
+function (x, digits = 0)
+NULL
+```
+
+`round()` takes two arguments, `x`, which is the number to be
+rounded, and a
+`digits` argument. The `=` sign indicates that a default (in this case 0) is
+already set. Since `x` is not set, `round()` requires we provide it, in contrast
+to `digits` where R will use the default value 0 unless you explicitly provide
+a different value. We can explicitly set the digits parameter when we call the
+function:
+
+
+```r
+round(3.14159, digits = 2)
+```
+
+```{.output}
+[1] 3.14
+```
+
+Alternatively, R accepts what we call "positional arguments": if you pass a function
+arguments separated by commas, R assumes that they are in the order you saw
+when we used `args()`. In the case below that means that `x` is 3.14159 and
+`digits` is 2.
+
+
+```r
+round(3.14159, 2)
+```
+
+```{.output}
+[1] 3.14
+```
+
+Finally, what if you are using `?` to get help for a function in a package that is not loaded in your R session, such as when you are reading a script which has dependencies? For example:
+
+
+```r
+?geom_point()
+```
+
+will return an error:
+
+```error
+Error in .helpForCall(topicExpr, parent.frame()) :
+  no methods for ‘geom_point’ and no documentation for it as a function
+```
+
+Use two question marks (i.e. `??geom_point()`) and R will return
+results from a search of the documentation for packages you have installed on your computer
+in the "Help" tab. Finally, if you think there
+should be a function, for example a statistical test, but you aren't
+sure what it is called in R, or what functions may be available, use
+the `help.search()` function.
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Exercise: Searching for R functions
+
+Use `help.search()` to find R functions for the following statistical
+functions. Remember to put your search query in quotes inside the function's
+parentheses.
+
+- Chi-Squared test
+- Student t-test
+- mixed linear model
+
+::::::::::::::: solution
+
+## Solution
+
+While your search results may return several tests, we list a few you might
+find:
+
+- Chi-Squared test: `stats::Chisquare`
+- Student t-test: `stats::t.test`
+- mixed linear model: `stats::lm.glm`
+
+
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+We will discuss more about where to look for the libraries and packages that contain functions you want to use. For now, be aware that two important ones are [CRAN](https://cran.r-project.org/) - the main repository for R, and [Bioconductor](https://bioconductor.org/) - a popular repository for bioinformatics-related R packages.
+
+## RStudio contextual help
+
+Here is one last bonus we will mention about RStudio. It's difficult to
+remember all of the arguments and definitions associated with a given function.
+When you start typing the name of a function and hit the Tab key, +RStudio will display functions and associated help: + +rstudio default session + +Once you type a function, hitting the Tab inside the parentheses +will show you the function's arguments and provide additional help +for each of these arguments. + +rstudio default session + +:::::::::::::::::::::::::::::::::::::::: keypoints + +- R is a powerful, popular open-source scripting language +- You can customize the layout of RStudio, and use the project feature to manage the files and packages used in your analysis +- RStudio allows you to run R in an easy-to-use interface and makes it easy to find help + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/01-r-basics.md b/01-r-basics.md new file mode 100644 index 000000000..7c3495ad4 --- /dev/null +++ b/01-r-basics.md @@ -0,0 +1,1054 @@ +--- +title: R Basics +teaching: 60 +exercises: 20 +source: Rmd +--- + +::::::::::::::::::::::::::::::::::::::: objectives + +- Be able to create the most common R objects including vectors +- Understand that vectors have modes, which correspond to the type of data they contain +- Be able to use arithmetic operators on R objects +- Be able to retrieve (subset), name, or replace, values from a vector +- Be able to use logical operators in a subsetting operation + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: questions + +- What will these lessons not cover? +- What are the basic features of the R language? +- What are the most common objects in R? + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + + +## "The fantastic world of R awaits you" OR "Nobody wants to learn how to use R" + +Before we begin this lesson, we want you to be clear on the goal of the workshop +and these lessons. We believe that every learner can **achieve competency +with R**. 
You have reached competency when you find that you are able to +**use R to handle common analysis challenges in a reasonable amount of time** +(which includes time needed to look at learning materials, search for answers +online, and ask colleagues for help). As you spend more time using R (there is +no substitute for regular use and practice) you will find yourself gaining +competency and even expertise. The more familiar you get, the more +complex the analyses you will be able to carry out, with less frustration, and +in less time - the fantastic world of R awaits you! + +## What these lessons will not teach you + +Nobody wants to learn how to use R. People want to learn how to use R to analyze +their own research questions! Ok, maybe some folks learn R for R's sake, but +these lessons assume that you want to start analyzing genomic data as soon as +possible. Given this, there are many valuable pieces of information about R +that we simply won't have time to cover. Hopefully, we will clear the hurdle of +giving you just enough knowledge to be dangerous, which can be a high bar +in R! We suggest you look into the additional learning materials in the tip box +below. + +**Here are some R skills we will *not* cover in these lessons** + +- How to create and work with R matrices +- How to create and work with loops and conditional statements, and the "apply" + family of functions (which are super useful, read [this blog post to learn more about these functions](https://www.r-bloggers.com/r-tutorial-on-the-apply-family-of-functions/)) +- How to do basic string manipulations (e.g. finding patterns in text using grep, replacing text) +- How to plot using the default R graphic tools (we *will* cover plot creation, but will do so using the popular plotting package `ggplot2`) +- How to use advanced R statistical functions + +::::::::::::::::::::::::::::::::::::::::: callout + +## Tip: Where to learn more + +The following are good resources for learning more about R. 
Some of them +can be quite technical, but if you are a regular R user you may ultimately +need this technical knowledge. + +- [R for Beginners](https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf): + By Emmanuel Paradis and a great starting point +- [The R Manuals](https://cran.r-project.org/manuals.html): Maintained by the + R project +- [R contributed documentation](https://cran.r-project.org/other-docs.html): + Also linked to the R project; importantly there are materials available in + several languages +- [R for Data Science](https://r4ds.had.co.nz/): A wonderful collection by + noted R educators and developers Garrett Grolemund and Hadley Wickham +- [Practical Data Science for Stats](https://peerj.com/collections/50-practicaldatascistats/): + Not exclusively about R usage, but a nice collection of pre-prints on data science + and applications for R +- [Programming in R Software Carpentry lesson](https://software-carpentry.org/lessons/): + There are several Software Carpentry lessons in R to choose from + + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Creating objects in R + +:::::::::::::::::::::::::::::::::::::::::: prereq + +## Reminder + +At this point you should be coding along in the "**genomics\_r\_basics.R**" +script we created in the last episode. Writing your commands in the script +(and commenting it) will make it easier to record what you did and why. + + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +What might be called a variable in many languages is called an **object** +in R. + +**To create an object you need:** + +- a name (e.g. 'first_value') +- a value (e.g. '1') +- the assignment operator ('\<-') + +In your script, "**genomics\_r\_basics.R**", using the R assignment operator '\<-', +assign '1' to the object 'first_value' as shown. 
Remember to leave a comment in the line
+above (using the '#') to explain what you are doing:
+
+
+```r
+# this line creates the object 'first_value' and assigns it the value '1'
+
+first_value <- 1
+```
+
+Next, run this line of code in your script. You can run a line of code
+by hitting the Run button that is just above the first line of your
+script in the header of the Source pane or you can use the appropriate shortcut:
+
+- Windows execution shortcut: Ctrl\+Enter
+- Mac execution shortcut: Cmd(⌘)\+Enter
+
+To run multiple lines of code, you can highlight all the lines you wish to run
+and then hit Run or use the shortcut key combo listed above.
+
+In the RStudio 'Console' you should see:
+
+```output
+> first_value <- 1
+>
+```
+
+The 'Console' will display lines of code run from a script and any outputs or
+status/warning/error messages (usually in red).
+
+In the 'Environment' window you will also get a table:
+
+| Values      |     |
+| ----------- | --- |
+| first_value | 1   |
+
+The 'Environment' window allows you to keep track of the objects you have
+created in R.
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Exercise: Create some objects in R
+
+Create the following objects; give each object an appropriate name
+(your best guess at what name to use is fine):
+
+1. Create an object that has the value of the number of pairs of human chromosomes
+2. Create an object that has a value of your favorite gene name
+3. Create an object that has this URL as its value: "[ftp://ftp.ensemblgenomes.org/pub/bacteria/release-39/fasta/bacteria\_5\_collection/escherichia\_coli\_b\_str\_rel606/](ftp://ftp.ensemblgenomes.org/pub/bacteria/release-39/fasta/bacteria_5_collection/escherichia_coli_b_str_rel606/)"
+4. 
Create an object that has the value of the number of chromosomes in a
+   diploid human cell
+
+::::::::::::::: solution
+
+## Solution
+
+Here are some possible answers to the challenge:
+
+
+```r
+human_chr_number <- 23
+gene_name <- 'pten'
+ensemble_url <- 'ftp://ftp.ensemblgenomes.org/pub/bacteria/release-39/fasta/bacteria_5_collection/escherichia_coli_b_str_rel606/'
+human_diploid_chr_num <- 2 * human_chr_number
+```
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+## Naming objects in R
+
+Here are some important details about naming objects in R.
+
+- **Avoid spaces and special characters**: Object names cannot contain spaces or the minus sign (`-`). You can use '\_' to make names more readable. You should avoid
+  using special characters in your object name (e.g. ! @ # . , etc.). Also,
+  object names cannot begin with a number.
+- **Use short, easy-to-understand names**: You should avoid naming your objects
+  using single letters (e.g. 'n', 'p', etc.). This is mostly to encourage you
+  to use names that would make sense to anyone reading your code (a colleague,
+  or even yourself a year from now). Also, avoiding excessively long names will
+  make your code more readable.
+- **Avoid commonly used names**: There are several names that may already have a
+  definition in the R language (e.g. 'mean', 'min', 'max'). One clue that a name
+  already has a meaning: if you start typing it in RStudio and it gets
+  a colored highlight, or RStudio suggests an autocompletion, the name is
+  probably already in use.
+- **Use the recommended assignment operator**: In R, we use '\<-' as the
+  preferred assignment operator. '=' works too, but is most commonly used in
+  passing arguments to functions (more on functions later).
There is a shortcut
+  for the R assignment operator:
+  - Windows shortcut: Alt\+\-
+  - Mac shortcut: Option\+\-
+
+There are a few more suggestions about naming and style you may want to learn
+more about as you write more R code. There are several "style guides" that
+have advice. One of the more widely used is the [tidyverse R style guide](https://style.tidyverse.org/index.html),
+but there is also a [Google R style guide](https://google.github.io/styleguide/Rguide.html), and
+[Jean Fan's R style guide](https://jef.works/R-style-guide/), among others.
+
+::::::::::::::::::::::::::::::::::::::::: callout
+
+## Tip: Pay attention to warnings in the script console
+
+If you enter a line of code in your script that contains an error, RStudio
+may give you an error message and underline this mistake. Sometimes these
+messages are easy to understand, but often the messages may need some figuring
+out. Paying attention to these warnings will help you avoid mistakes. In the example below, our object name has a space, which
+is not allowed in R. The error message does not say this directly,
+but R is "not sure"
+about how to assign the name to "human\_ chr\_number" when the object name we
+want is "human\_chr\_number".
+
+rstudio script warning
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+## Reassigning object names or deleting objects
+
+Once an object has a value, you can change that value by overwriting it. R will
+not give you a warning or error if you overwrite an object, which
+may or may not be a good thing
+depending on how you look at it.
+
+
+```r
+# gene_name has the value 'pten' or whatever value you used in the challenge.
+# We will now assign the new value 'tp53'
+gene_name <- 'tp53'
+```
+
+You can also remove an object from R's memory entirely. The `rm()` function
+will delete the object.
+ + +```r +# delete the object 'gene_name' +rm(gene_name) +``` + +If you run a line of code that has only an object name, R will normally display +the contents of that object. In this case, we are told the object no +longer exists. + +```error +Error: object 'gene_name' not found +``` + +## Understanding object data types (modes) + +In R, **every object has two properties**: + +- **Length**: How many distinct values are held in that object +- **Mode**: What is the classification (type) of that object. + +We will get to the "length" property later in the lesson. The **"mode" property** +**corresponds to the type of data an object represents**. The most common modes +you will encounter in R are: + +| Mode (abbreviation) | Type of data | +| ------------------- | ------------ | +| Numeric (num) | Numbers such as floating point/decimals (1.0, 0.5, 3.14). There are also more specific numeric types (dbl - Double, int - Integer); these differences are not relevant for most beginners and pertain to how the values are stored in memory | +| Character (chr) | A sequence of letters/numbers in single '' or double " " quotes | +| Logical | Boolean values - TRUE or FALSE | + +There are a few other modes (e.g. "complex", "raw") but these +are the three we will work with in this lesson. + +Data types are familiar in many programming languages, but also in natural +language where we refer to them as the parts of speech, e.g. nouns, verbs, +adverbs, etc. Once you know if a word - perhaps an unfamiliar one - is a noun, +you can probably guess you can count it and make it plural if there is more than +one (e.g. 1 [Tuatara](https://en.wikipedia.org/wiki/Tuatara), or 2 Tuataras). 
If +something is an adjective, you can usually change it into an adverb by adding +"-ly" (e.g. [jejune](https://www.merriam-webster.com/dictionary/jejune) vs. +jejunely). Depending on the context, you may need to decide if a word is in one +category or another (e.g. "cut" may be a noun when it's on your finger, or a verb +when you are preparing vegetables). These concepts have important analogies when +working with R objects. + +::::::::::::::::::::::::::::::::::::::: challenge + +## Exercise: Create objects and check their modes + +Create the following objects in R, then use the `mode()` function to verify +their modes. Try to guess what the mode will be before you look at the solution. + +1. `chromosome_name <- 'chr02'` +2. `od_600_value <- 0.47` +3. `chr_position <- '1001701'` +4. `spock <- TRUE` +5. `pilot <- Earhart` + +::::::::::::::: solution + +## Solution + + +```{.error} +Error in eval(expr, envir, enclos): object 'Earhart' not found +``` + + +```r +mode(chromosome_name) +``` + +```{.output} +[1] "character" +``` + +```r +mode(od_600_value) +``` + +```{.output} +[1] "numeric" +``` + +```r +mode(chr_position) +``` + +```{.output} +[1] "character" +``` + +```r +mode(spock) +``` + +```{.output} +[1] "logical" +``` + +```r +mode(pilot) +``` + +```{.error} +Error in eval(expr, envir, enclos): object 'pilot' not found +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +Notice from the solution that even if a series of numbers is given as a value +R will consider them to be in the "character" mode if they are enclosed in +single or double quotes. Also, notice that you cannot take a string of alphanumeric +characters (e.g. Earhart) and assign it as a value for an object. In this case, +R looks for an object named `Earhart` but since there is no such object, no assignment can +be made. If `Earhart` did exist, then the mode of `pilot` would be whatever +the mode of `Earhart` was originally. 
If we want to create an object +called `pilot` that was the **name** "Earhart", we need to enclose +`Earhart` in quotation marks. + + +```r +pilot <- "Earhart" +mode(pilot) +``` + +```{.output} +[1] "character" +``` + +## Mathematical and functional operations on objects + +Once an object exists (which by definition also means it has a mode), R can +appropriately manipulate that object. For example, objects of the numeric modes +can be added, multiplied, divided, etc. R provides several mathematical +(arithmetic) operators including: + +| Operator | Description | +| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| \+ | addition | +| \- | subtraction | +| \* | multiplication | +| / | division | +| ^ or \*\* | exponentiation | +| a%/%b | integer division (division where the remainder is discarded) | +| a%%b | modulus (returns the remainder after division) | + +These can be used with literal numbers: + + +```r +(1 + (5 ** 0.5))/2 +``` + +```{.output} +[1] 1.618034 +``` + +and importantly, can be used on any object that evaluates to (i.e. interpreted +by R) a numeric object: + + + + +```r +# multiply the object 'human_chr_number' by 2 + +human_chr_number * 2 +``` + +```{.output} +[1] 46 +``` + +::::::::::::::::::::::::::::::::::::::: challenge + +## Exercise: Compute the golden ratio + +One approximation of the golden ratio (φ) can be found by taking the sum of 1 +and the square root of 5, and dividing by 2 as in the example above. Compute +the golden ratio to 3 digits of precision using the `sqrt()` and `round()` +functions. Hint: remember the `round()` function can take 2 arguments. 
+ +::::::::::::::: solution + +## Solution + + +```r +round((1 + sqrt(5))/2, digits = 3) +``` + +```{.output} +[1] 1.618 +``` + +Notice that you can place one function inside of another. + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Vectors + +Vectors are probably the +most commonly used object type in R. +**A vector is a collection of values that are all of the same type (numbers, characters, etc.)**. +One of the most common +ways to create a vector is to use the `c()` function - the "concatenate" or +"combine" function. Inside the function you may enter one or more values; for +multiple values, separate each value with a comma: + + +```r +# Create the SNP gene name vector + +snp_genes <- c("OXTR", "ACTN3", "AR", "OPRM1") +``` + +Vectors always have a **mode** and a **length**. +You can check these with the `mode()` and `length()` functions respectively. +Another useful function that gives both of these pieces of information is the +`str()` (structure) function. + + +```r +# Check the mode, length, and structure of 'snp_genes' +mode(snp_genes) +``` + +```{.output} +[1] "character" +``` + +```r +length(snp_genes) +``` + +```{.output} +[1] 4 +``` + +```r +str(snp_genes) +``` + +```{.output} + chr [1:4] "OXTR" "ACTN3" "AR" "OPRM1" +``` + +Vectors are quite important in R. Data frames, another data type that we will +work with later in this lesson, are collections of +vectors. What we learn here about vectors will pay off even more +when we start working with data frames. 
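As a quick illustration of the mode and length properties described above, here is a minimal sketch. The object names `read_depths` and `mixed`, and their values, are made up for this example and are not part of the lesson data.

```r
# Hypothetical example values, invented for illustration only
read_depths <- c(9, 13, 14, 10)

mode(read_depths)    # "numeric"
length(read_depths)  # 4

# A vector holds a single mode, so combining text and a number with c()
# stores both values as character strings
mixed <- c("chr3", 11)
mode(mixed)          # "character"
```

Note that R does not raise an error when types are combined in `c()`; it quietly converts the values to one common mode, so checking with `mode()` or `str()` is a good habit.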
+ +## Creating and subsetting vectors + +Let's create a few more vectors to play around with: + + +```r +# Some interesting human SNPs +# while accuracy is important, typos in the data won't hurt you here + +snps <- c("rs53576", "rs1815739", "rs6152", "rs1799971") +snp_chromosomes <- c("3", "11", "X", "6") +snp_positions <- c(8762685, 66560624, 67545785, 154039662) +``` + +Once we have vectors, one thing we may want to do is specifically retrieve one +or more values from our vector. To do so, we use **bracket notation**. We type +the name of the vector followed by square brackets. Inside the square brackets +we place the index (e.g. a number) as follows: + + +```r +# get the 3rd value in the snp vector +snps[3] +``` + +```{.output} +[1] "rs6152" +``` + +In R, every item in your vector is indexed, starting from the first item (1) +through to the final number of items in your vector. You can also retrieve a +range of items: + + +```r +# get the 1st through 3rd value in the snp vector + +snps[1:3] +``` + +```{.output} +[1] "rs53576" "rs1815739" "rs6152" +``` + +If you want to retrieve several (but not necessarily sequential) items from +a vector, you pass a **vector of indices**: a vector that has the numbered +positions you wish to retrieve. + + +```r +# get the 1st, 3rd, and 4th value in the snp vector + +snps[c(1, 3, 4)] +``` + +```{.output} +[1] "rs53576" "rs6152" "rs1799971" +``` + +There are additional (and perhaps less commonly used) ways of subsetting a +vector (see [these +examples](https://thomasleeper.com/Rcourse/Tutorials/vectorindexing.html)). +Also, several of these subsetting expressions can be combined: + + +```r +# get the 1st through the 3rd value, and 4th value in the snp vector +# yes, this is a little silly in a vector of only 4 values. 
+snps[c(1:3,4)] +``` + +```{.output} +[1] "rs53576" "rs1815739" "rs6152" "rs1799971" +``` + +## Adding to, removing, or replacing values in existing vectors + +Once you have an existing vector, you may want to add a new item to it. To do +so, you can use the `c()` function again to add your new value: + + +```r +# add the gene "CYP1A1" and "APOA5" to our list of snp genes +# this overwrites our existing vector +snp_genes <- c(snp_genes, "CYP1A1", "APOA5") +``` + +We can verify that "snp\_genes" contains the new gene entries: + + +```r +snp_genes +``` + +```{.output} +[1] "OXTR" "ACTN3" "AR" "OPRM1" "CYP1A1" "APOA5" +``` + +Using a negative index will return a version of a vector with that index's +value removed: + + +```r +snp_genes[-6] +``` + +```{.output} +[1] "OXTR" "ACTN3" "AR" "OPRM1" "CYP1A1" +``` + +We can remove that value from our vector by overwriting it with this expression: + + +```r +snp_genes <- snp_genes[-6] +snp_genes +``` + +```{.output} +[1] "OXTR" "ACTN3" "AR" "OPRM1" "CYP1A1" +``` + +We can also explicitly replace or add a value at a given index using bracket notation: + + +```r +snp_genes[6] <- "APOA5" +snp_genes +``` + +```{.output} +[1] "OXTR" "ACTN3" "AR" "OPRM1" "CYP1A1" "APOA5" +``` + +::::::::::::::::::::::::::::::::::::::: challenge + +## Exercise: Examining and subsetting vectors + +Answer the following questions to test your knowledge of vectors. + +Which of the following are true of vectors in R? +A) All vectors have a mode **or** a length +B) All vectors have a mode **and** a length +C) Vectors may have different lengths +D) Items within a vector may be of different modes +E) You can use `c()` to add one or more items to an existing vector +F) You can use `c()` to add a vector to an existing vector + +::::::::::::::: solution + +## Solution + +A) False - Vectors have both of these properties +B) True +C) True +D) False - Vectors have only one mode (e.g. numeric, character); all items in +a vector must be of this mode. 
+E) True +F) True + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Logical Subsetting + +There is one last set of cool subsetting capabilities we want to introduce. It is possible within R to retrieve items in a vector based on a logical evaluation or numerical comparison. For example, let's say we wanted to get all of the values in our vector of SNP positions that were greater than 100,000,000. We could index using the '>' (greater than) logical operator: + + +```r +snp_positions[snp_positions > 100000000] +``` + +```{.output} +[1] 154039662 +``` + +In the square brackets you place the name of the vector followed by the comparison operator and (in this case) a numeric value. Some of the most common logical operators you will use in R are: + +| Operator | Description | +| ------------------- | ------------ | +| \< | less than | +| \<= | less than or equal to | +| \> | greater than | +| \>= | greater than or equal to | +| \== | exactly equal to | +| != | not equal to | +| !x | not x | +| a \| b | a or b | +| a \& b | a and b | + +::::::::::::::::::::::::::::::::::::::::: callout + +## The magic of programming + +The reason why the expression `snp_positions[snp_positions > 100000000]` works +can be better understood if you examine what the expression "snp\_positions > 100000000" +evaluates to: + + +```r +snp_positions > 100000000 +``` + +```{.output} +[1] FALSE FALSE FALSE TRUE +``` + +The output above is a logical vector, the 4th element of which is TRUE. 
When +you pass a logical vector as an index, R will return the values at the positions that are TRUE: + + +```r +snp_positions[c(FALSE, FALSE, FALSE, TRUE)] +``` + +```{.output} +[1] 154039662 +``` + +If you have never coded before, this type of situation starts to expose the +"magic" of programming. We mentioned before that in the bracket notation you +take your named vector followed by brackets which contain an index: +**named\_vector[index]**. The "magic" is that the index needs to *evaluate to* +a number. So, even if it does not appear to be an integer (e.g. 1, 2, 3), as +long as R can evaluate it, we will get a result. That our expression +`snp_positions[snp_positions > 100000000]` evaluates to a number can be seen +in the following situation. Suppose you wanted to know which **index** (1, 2, 3, or +4\) in our vector of SNP positions held the value that was greater than +100,000,000. + +We can use the `which()` function to return the indices of any item that +evaluates as TRUE in our comparison: + + +```r +which(snp_positions > 100000000) +``` + +```{.output} +[1] 4 +``` + +**Why this is important** + +Often in programming we will not know what inputs +and values will be used when our code is executed. Rather than put in a +pre-determined value (e.g. 100000000) we can use an object that can take on +whatever value we need. So for example: + + +```r +snp_marker_cutoff <- 100000000 +snp_positions[snp_positions > snp_marker_cutoff] +``` + +```{.output} +[1] 154039662 +``` + +Ultimately, it's putting together flexible, reusable code like this that gets +at the "magic" of programming! + + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## A few final vector tricks + +Finally, there are a few other common retrieve or replace operations you may +want to know about. First, you can check to see if any of the values of your +vector are missing (i.e. are `NA`, which stands for "not available"). 
Missing data will get a more detailed treatment later, +but the `is.na()` function will return a logical vector, with TRUE for any NA +value: + + +```r +# current value of 'snp_genes': +# chr [1:6] "OXTR" "ACTN3" "AR" "OPRM1" "CYP1A1" "APOA5" + +is.na(snp_genes) +``` + +```{.output} +[1] FALSE FALSE FALSE FALSE FALSE FALSE +``` + +Sometimes, you may wish to find out if a specific value (or several values) is +present in a vector. You can do this using the comparison operator `%in%`, which +will return TRUE for any value in your collection that is in +the vector you are searching: + + +```r +# current value of 'snp_genes': +# chr [1:6] "OXTR" "ACTN3" "AR" "OPRM1" "CYP1A1" "APOA5" + +# test to see if "ACTN3" or "APOA5" is in the snp_genes vector +# if you are looking for more than one value, you must pass this as a vector + +c("ACTN3","APOA5") %in% snp_genes +``` + +```{.output} +[1] TRUE TRUE +``` + +::::::::::::::::::::::::::::::::::::::: challenge + +## Review Exercise 1 + +What data modes are the following vectors? +a. `snps` +b. `snp_chromosomes` +c. `snp_positions` + +::::::::::::::: solution + +## Solution + + +```r +mode(snps) +``` + +```{.output} +[1] "character" +``` + +```r +mode(snp_chromosomes) +``` + +```{.output} +[1] "character" +``` + +```r +mode(snp_positions) +``` + +```{.output} +[1] "numeric" +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Review Exercise 2 + +Add the following values to the specified vectors: +a. To the `snps` vector add: "rs662799" +b. To the `snp_chromosomes` vector add: 11 +c. To the `snp_positions` vector add: 116792991 + +::::::::::::::: solution + +## Solution + + +```r +snps <- c(snps, "rs662799") +snps +``` + +```{.output} +[1] "rs53576" "rs1815739" "rs6152" "rs1799971" "rs662799" +``` + +```r +snp_chromosomes <- c(snp_chromosomes, "11") # did you use quotes? 
+snp_chromosomes +``` + +```{.output} +[1] "3" "11" "X" "6" "11" +``` + +```r +snp_positions <- c(snp_positions, 116792991) +snp_positions +``` + +```{.output} +[1] 8762685 66560624 67545785 154039662 116792991 +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Review Exercise 3 + +Make the following changes to the `snp_genes` vector: + +Hint: Your vector should look like this in 'Environment': +`chr [1:7] "OXTR" "ACTN3" "AR" "OPRM1" "CYP1A1" NA "APOA5"`. If not, +recreate the vector by running this expression: +`snp_genes <- c("OXTR", "ACTN3", "AR", "OPRM1", "CYP1A1", NA, "APOA5")` + +a. Create a new version of `snp_genes` that does not contain CYP1A1 and then +b. Add 2 NA values to the end of `snp_genes` + +::::::::::::::: solution + +## Solution + + +```r +snp_genes <- snp_genes[-5] +snp_genes <- c(snp_genes, NA, NA) +snp_genes +``` + +```{.output} +[1] "OXTR" "ACTN3" "AR" "OPRM1" "APOA5" NA NA +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Review Exercise 4 + +Using indexing, create a new vector named `combined` that contains: + +- The 1st value in `snp_genes` +- The 1st value in `snps` +- The 1st value in `snp_chromosomes` +- The 1st value in `snp_positions` + +::::::::::::::: solution + +## Solution + + +```r +combined <- c(snp_genes[1], snps[1], snp_chromosomes[1], snp_positions[1]) +combined +``` + +```{.output} +[1] "OXTR" "rs53576" "3" "8762685" +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Review Exercise 5 + +What type of data is `combined`? 
+ +::::::::::::::: solution + +## Solution + + +```r +typeof(combined) +``` + +```{.output} +[1] "character" +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::::: callout + +## Lists + +Lists are quite useful in R, but we won't be using them in the genomics lessons. +That said, you may come across lists in the way that some bioinformatics +programs may store and/or return data to you. One of the key attributes of a +list is that, unlike a vector, a list may contain data of more than one mode. +Learn more about creating and using lists using this [nice +tutorial](https://r4ds.had.co.nz/vectors.html#lists). In this one example, we will create +a named list and show you how to retrieve items from the list. + + +```r +# Create a named list using the 'list' function and our SNP examples +# Note, for easy reading we have placed each item in the list on a separate line +# Nothing special about this, you can do this for any multiline commands +# To run this command, make sure the entire command (all 4 lines) are highlighted +# before running +# Note also, as we are doing all this inside the list() function use of the +# '=' sign is good style +snp_data <- list(genes = snp_genes, + refference_snp = snps, + chromosome = snp_chromosomes, + position = snp_positions) +# Examine the structure of the list +str(snp_data) +``` + +```{.output} +List of 4 + $ genes : chr [1:7] "OXTR" "ACTN3" "AR" "OPRM1" ... + $ refference_snp: chr [1:5] "rs53576" "rs1815739" "rs6152" "rs1799971" ... + $ chromosome : chr [1:5] "3" "11" "X" "6" ... 
+ $ position : num [1:5] 8.76e+06 6.66e+07 6.75e+07 1.54e+08 1.17e+08 +``` + +To get all the values for the `position` object in the list, we use the `$` notation: + + +```r +# return all the values of position object + +snp_data$position +``` + +```{.output} +[1] 8762685 66560624 67545785 154039662 116792991 +``` + +To get the first value in the `position` object, use the `[]` notation to index: + + +```r +# return first value of the position object + +snp_data$position[1] +``` + +```{.output} +[1] 8762685 +``` +::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: keypoints + +- Effectively using R is a journey of months or years. Still, you don't have to be an expert to use R, and you can start analyzing your data with about a day's worth of training. +- It is important to understand how data are organized by R in a given object type and how the mode of that type (e.g. numeric, character, logical, etc.) will determine how R will operate on that data. +- Working with vectors effectively prepares you for understanding how data are organized in R. + +:::::::::::::::::::::::::::::::::::::::::::::::::: diff --git a/02-data-prelude.md b/02-data-prelude.md new file mode 100644 index 000000000..6f4abb79d --- /dev/null +++ b/02-data-prelude.md @@ -0,0 +1,173 @@ +--- +title: Introduction to the example dataset and file type +teaching: 15 +exercises: 0 +source: Rmd +--- + +::::::::::::::::::::::::::::::::::::::: objectives + +- Know what the example dataset represents +- Know the concepts of how VCF files are generated + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: questions + +- What data are we using in the lesson? +- What are VCF files? + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + + +## Preface + +The Intro to R and RStudio for Genomics is a part of the Genomics Data Carpentry lessons. 
In this lesson we will learn the necessary skill sets for R and RStudio and apply them directly to real next-generation sequencing (NGS) data in the variant call format (VCF) file type. Previous Genomics Data Carpentry lessons teach learners how to generate a VCF file from FASTQ files downloaded from NCBI Sequence Read Archive (SRA), so we won't cover that here. Instead, in this episode we will give a brief overview of the data and of what VCF files are for those who wish to teach the Intro to R and RStudio for Genomics lesson independently of the Genomics Data Carpentry lessons. + +This dataset was selected for several reasons, including: + +- Simple, but iconic NGS-problem: Examine a population where we want to characterize changes in sequence *a priori* +- Dataset publicly available - in this case through the NCBI SRA ([http://www.ncbi.nlm.nih.gov/sra](https://www.ncbi.nlm.nih.gov/sra)) + +## Introduction to the dataset + +Microbes are ideal organisms for exploring 'Long-term Evolution Experiments' (LTEEs) - thousands of generations can be generated and stored in a way that would be virtually impossible for more complex eukaryotic systems. In [Tenaillon et al 2016](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4988878/), 12 populations of *Escherichia coli* were propagated for more than 50,000 generations in a glucose-limited minimal medium. This medium was supplemented with citrate, which *E. coli* cannot metabolize in the aerobic conditions of the experiment. Sequencing of the populations at regular time points reveals that spontaneous citrate-using mutants (Cit+) appeared in a population of *E. coli* (designated Ara-3) at around 31,000 generations. It should be noted that spontaneous Cit+ mutants are extraordinarily rare - inability to metabolize citrate is one of the defining characters of the *E. coli* species. 
Eventually, Cit+ mutants became the dominant population as the experimental growth medium contained a high concentration of citrate relative to glucose. Around the same time that this mutation emerged, another phenotype became prominent in the Ara-3 population. Many *E. coli* began to develop excessive numbers of mutations, meaning they became hypermutable. + +Strains from generation 0 to generation 50,000 were sequenced, including ones that were both Cit+ and Cit- and hypermutable in later generations. + +For the purposes of this workshop we're going to be working with 3 of the sequence reads from this experiment. + +| SRA Run Number | Clone | Generation | Cit | Hypermutable | Read Length | Sequencing Depth | +| -------------- | -------- | ---------- | ------- | ------------ | ----------- | ---------------- | +| SRR2589044 | REL2181A | 5,000 | Unknown | None | 150 | 60\.2 | +| SRR2584863 | REL7179B | 15,000 | Unknown | None | 150 | 88 | +| SRR2584866 | REL11365 | 50,000 | Cit+ | plus | 150 | 138\.3 | + +We want to be able to look at differences in mutation rates between hypermutable and non-hypermutable strains. We also want to analyze the sequences to figure out what changes occurred in genomes to make the strains Cit+. Ultimately, we will use R to answer the questions: + +- How many base pair changes are there between the Cit+ and Cit- strains? +- What are the base pair changes between strains? + +## How VCF files are generated + +Publicly accessible sequencing files in FASTQ format can be downloaded from NCBI SRA. 
However, FASTQ files contain unaligned sequences of varying quality, and require cleanup and alignment steps before variants can be called against the reference genome. + +Five steps are taken to transform FASTQ files into variant calls contained in VCF files, and at each step specialized non-R based bioinformatics tools are used: + +
+variant calling workflow. Sequence reads (FASTQ files), Quality control (FASTQ files), Alignment to Genome (SAM/BAM files), Alignment cleanup (BAM file ready for variant calling), Variant Calling (VCF file) +
+ +## How variant calls are stored in VCF files + +VCF files contain variants that were called against a reference genome. These files are slightly more complicated than regular tables you can open using programs like Excel and contain two sections: header and records. + +Below you will see the header (which describes the format), the time and date the file was created, the version of bcftools that was used, the command line parameters used, and some additional information: + +``` +##fileformat=VCFv4.2 +##FILTER= +##bcftoolsVersion=1.8+htslib-1.8 +##bcftoolsCommand=mpileup -O b -o results/bcf/SRR2584866_raw.bcf -f data/ref_genome/ecoli_rel606.fasta results/bam/SRR2584866.aligned.sorted.bam +##reference=file://data/ref_genome/ecoli_rel606.fasta +##contig= +##ALT= +##INFO= +##INFO= +##INFO= +##INFO= +##INFO= +##INFO= +##INFO= +##INFO= +##INFO= +##INFO= +##FORMAT= +##FORMAT= +##INFO= +##INFO= +##INFO= +##INFO= +##INFO= +##INFO= +##bcftools_callVersion=1.8+htslib-1.8 +##bcftools_callCommand=call --ploidy 1 -m -v -o results/bcf/SRR2584866_variants.vcf results/bcf/SRR2584866_raw.bcf; Date=Tue Oct 9 18:48:10 2018 +``` + +Followed by information on each of the variations observed: + +``` +#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT results/bam/SRR2584866.aligned.sorted.bam +CP000819.1 1521 . C T 207 . DP=9;VDB=0.993024;SGB=-0.662043;MQSB=0.974597;MQ0F=0;AC=1;AN=1;DP4=0,0,4,5;MQ=60 +CP000819.1 1612 . A G 225 . DP=13;VDB=0.52194;SGB=-0.676189;MQSB=0.950952;MQ0F=0;AC=1;AN=1;DP4=0,0,6,5;MQ=60 +CP000819.1 9092 . A G 225 . DP=14;VDB=0.717543;SGB=-0.670168;MQSB=0.916482;MQ0F=0;AC=1;AN=1;DP4=0,0,7,3;MQ=60 +CP000819.1 9972 . T G 214 . DP=10;VDB=0.022095;SGB=-0.670168;MQSB=1;MQ0F=0;AC=1;AN=1;DP4=0,0,2,8;MQ=60 GT:PL +CP000819.1 10563 . G A 225 . DP=11;VDB=0.958658;SGB=-0.670168;MQSB=0.952347;MQ0F=0;AC=1;AN=1;DP4=0,0,5,5;MQ=60 +CP000819.1 22257 . C T 127 . DP=5;VDB=0.0765947;SGB=-0.590765;MQSB=1;MQ0F=0;AC=1;AN=1;DP4=0,0,2,3;MQ=60 GT:PL +CP000819.1 38971 . A G 225 . 
DP=14;VDB=0.872139;SGB=-0.680642;MQSB=1;MQ0F=0;AC=1;AN=1;DP4=0,0,4,8;MQ=60 GT:PL +CP000819.1 42306 . A G 225 . DP=15;VDB=0.969686;SGB=-0.686358;MQSB=1;MQ0F=0;AC=1;AN=1;DP4=0,0,5,9;MQ=60 GT:PL +CP000819.1 45277 . A G 225 . DP=15;VDB=0.470998;SGB=-0.680642;MQSB=0.95494;MQ0F=0;AC=1;AN=1;DP4=0,0,7,5;MQ=60 +CP000819.1 56613 . C G 183 . DP=12;VDB=0.879703;SGB=-0.676189;MQSB=1;MQ0F=0;AC=1;AN=1;DP4=0,0,8,3;MQ=60 GT:PL +CP000819.1 62118 . A G 225 . DP=19;VDB=0.414981;SGB=-0.691153;MQSB=0.906029;MQ0F=0;AC=1;AN=1;DP4=0,0,8,10;MQ=59 +CP000819.1 64042 . G A 225 . DP=18;VDB=0.451328;SGB=-0.689466;MQSB=1;MQ0F=0;AC=1;AN=1;DP4=0,0,7,9;MQ=60 GT:PL +``` + +The first few columns represent the information we have about a predicted variation. + +| column | info | +| -------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| CHROM | contig location where the variation occurs | +| POS | position within the contig where the variation occurs | +| ID | a `.` until we add annotation information | +| REF | reference genotype (forward strand) | +| ALT | sample genotype (forward strand) | +| QUAL | Phred-scaled probability that the observed variant exists at this site (higher is better) | +| FILTER | a `.` if no quality filters have been applied, PASS if a filter is passed, or the name of the filters this variant failed | + +In an ideal world, the information in the `QUAL` column would be all we needed to filter out bad variant calls. +However, in reality we need to filter on multiple other metrics. + +The last two columns contain the genotypes and can be tricky to decode. 
+ +| column | info | +| -------------- | ------------ | +| FORMAT | lists in order the metrics presented in the final column | +| results | lists the values associated with those metrics in order | + +For our file, the metrics presented are GT:PL:GQ. + +| metric | definition | +| -------------- | ------------ | +| AD, DP | the depth per allele by sample and coverage | +| GT | the genotype for the sample at this locus. For a diploid organism, the GT field indicates the two alleles carried by the sample, encoded by a 0 for the REF allele, 1 for the first ALT allele, 2 for the second ALT allele, etc. A 0/0 means homozygous reference, 0/1 is heterozygous, and 1/1 is homozygous for the alternate allele. | +| PL | the likelihoods of the given genotypes | +| GQ | the Phred-scaled confidence for the genotype | + +For more information on VCF files visit The Broad Institute's [VCF guide](https://gatk.broadinstitute.org/hc/en-us/articles/360035531692-VCF-Variant-Call-Format). + +## References + +Tenaillon O, Barrick JE, Ribeck N, Deatherage DE, Blanchard JL, Dasgupta A, Wu GC, Wielgoss S, Cruveiller S, Médigue C, Schneider D, Lenski RE. +Tempo and mode of genome +evolution in a 50,000-generation experiment (2016) Nature. 536(7615): 165–170. 
+[Paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4988878/), [Supplemental materials](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4988878/#) +Data on NCBI SRA: [https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP064605](https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP064605) +Data on EMBL-EBI ENA: [https://www.ebi.ac.uk/ena/data/view/PRJNA295606](https://www.ebi.ac.uk/ena/data/view/PRJNA295606) + +This episode was adapted from the Data Carpentry Genomic lessons: + +- [Project organization and management for Genomics](https://datacarpentry.org/organization-genomics/data/) +- [Data wrangling and processing for genomics](https://datacarpentry.org/wrangling-genomics/04-variant_calling/index.html) + +:::::::::::::::::::::::::::::::::::::::: keypoints + +- The dataset comes from a real world experiment in E. coli. +- Publicly available FASTQ files can be downloaded from NCBI SRA. +- Several steps are taken outside of R/RStudio to create VCF files from FASTQ files. +- VCF files store variant calls in a special format. 
+ +:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/03-basics-factors-dataframes.md b/03-basics-factors-dataframes.md new file mode 100644 index 000000000..3b47fbbf5 --- /dev/null +++ b/03-basics-factors-dataframes.md @@ -0,0 +1,1400 @@ +--- +title: R Basics continued - factors and data frames +teaching: 60 +exercises: 30 +source: Rmd +--- + +::::::::::::::::::::::::::::::::::::::: objectives + +- Explain the basic principle of tidy datasets +- Be able to load a tabular dataset using base R functions +- Be able to determine the structure of a data frame including its dimensions and the datatypes of variables +- Be able to subset/retrieve values from a data frame +- Understand how R may coerce data into different modes +- Be able to change the mode of an object +- Understand that R uses factors to store and manipulate categorical data +- Be able to manipulate a factor, including subsetting and reordering +- Be able to apply an arithmetic function to a data frame +- Be able to coerce the class of an object (including variables in a data frame) +- Be able to import data from Excel +- Be able to save a data frame as a delimited file + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: questions + +- How do I get started with tabular data (e.g. spreadsheets) in R? +- What are some best practices for reading data into R? +- How do I save tabular data generated in R? + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + + +## Working with spreadsheets (tabular data) + +A substantial amount of the data we work with in genomics will be tabular data, +this is data arranged in rows and columns - also known as spreadsheets. We could +write a whole lesson on how to work with spreadsheets effectively ([actually we did](https://datacarpentry.org/organization-genomics/)). 
For our +purposes, we want to remind you of a few principles before we work with our +first set of example data: + +**1\) Keep raw data separate from analyzed data** + +This is principle number one because if you can't tell which files are the +original raw data, you risk making some serious mistakes (e.g. drawing conclusions +from data which have been manipulated in some unknown way). + +**2\) Keep spreadsheet data Tidy** + +The simplest principle of **Tidy data** is that we have one row in our +spreadsheet for each observation or sample, and one column for every variable +that we measure or report on. As simple as this sounds, it's very easily +violated. Most data scientists agree that a significant amount of their time is +spent tidying data for analysis. Read more about data organization in +[our lesson](https://datacarpentry.org/organization-genomics/) and +in [this paper](https://www.jstatsoft.org/article/view/v059i10). + +**3\) Trust but verify** + +Finally, while you don't need to be paranoid about data, you should have a plan +for how you will prepare it for analysis. **This is a focus of this lesson.** +You probably already have a lot of intuition, expectations, and assumptions about +your data - the range of values you expect, how many values should have +been recorded, etc. Of course, as the data get larger our human ability to +keep track will start to fail (and yes, it can fail for small data sets too). +R will help you to examine your data so that you can have greater confidence +in your analysis and its reproducibility. + +::::::::::::::::::::::::::::::::::::::::: callout + +## Tip: Keep your raw data separate + +When you work with data in R, you are not changing the original file you +loaded that data from. This is different from (for example) working with +a spreadsheet program where changing the value of a cell leaves you one +"save"-click away from overwriting the original file. You have to purposely +use a writing function (e.g.
`write.csv()`) to save data loaded into R. In +that case, be sure to save the manipulated data into a new file. More on this +later in the lesson. + + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Importing tabular data into R + +There are several ways to import data into R. For our purpose here, we will +focus on using the tools every R installation comes with (so-called "base" R) to +import a comma-delimited file containing the results of our variant calling workflow. +We will need to load the sheet using a function called `read.csv()`. + +::::::::::::::::::::::::::::::::::::::: challenge + +## Exercise: Review the arguments of the `read.csv()` function + +**Before using the `read.csv()` function, use R's help feature to answer the +following questions**. + +*Hint*: Entering '?' before the function name and then running that line will +bring up the help documentation. Also, when reading this particular help page, +pay careful attention to the 'read.csv' expression under the 'Usage' +heading. Other answers will be in the 'Arguments' heading. + +A) What is the default parameter for 'header' in the `read.csv()` function? + +B) What argument would you have to change to read a file that was delimited +by semicolons (;) rather than commas? + +C) What argument would you have to change to read a file in which numbers +used commas for decimal separation (e.g. 1,00)? + +D) What argument would you have to change to read in only the first 10,000 rows +of a very large file? + +::::::::::::::: solution + +## Solution + +A) The `read.csv()` function has the argument 'header' set to TRUE by default. +This means the function assumes the first row is header information +(i.e. column names). + +B) The `read.csv()` function has the argument 'sep' set to ",". This means +the function assumes commas are used as delimiters, as you would expect. +Changing this parameter (e.g. `sep=";"`) would make the function interpret semicolons as +delimiters.
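To try answers A and B without a file on disk, here is a minimal sketch that passes in-memory text to `read.csv()` through its `text` argument. The data and the object names (`csv_text`, `with_header`, `without_header`) are made up for illustration and are not part of the lesson's dataset.

```r
## hypothetical in-memory data standing in for a semicolon-delimited file
csv_text <- "sample;depth\nA;10\nB;12"

## sep = ";" tells read.csv() to split fields on semicolons;
## header keeps its default of TRUE, so the first row becomes column names
with_header <- read.csv(text = csv_text, sep = ";")

## header = FALSE treats the first row as data rather than column names
without_header <- read.csv(text = csv_text, sep = ";", header = FALSE)

names(with_header)
nrow(with_header)
nrow(without_header)
```

Comparing the row counts of the two objects shows whether the first row was used as column names or kept as data.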
+ +C) Although it is not listed in the `read.csv()` usage, `read.csv()` is +a "version" of the function `read.table()` and accepts all its arguments. +Setting `dec=","` changes the decimal separator to a comma. In that case the +field delimiter is presumably some other character. + +D) You can set `nrows` to a numeric value (e.g. `nrows=10000`) to choose how +many rows of a file you read in. This may be useful for very large files +where not all the data is needed to test some data cleaning steps you are +applying. + +Hopefully, this exercise gets you thinking about using the provided help +documentation in R. There are many arguments that exist, but which we won't +have time to cover. Look here to get familiar with functions you use +frequently; you may be surprised at what you find they can do. + + +::::::::::::::::::::::::::::::::::::::::: callout + +## Tip: Why does ?read.csv open the documentation for read.table? + +This is because `read.csv` is actually a shortcut +for `read.table("file.csv", sep = ",")`. You can see in the help +documentation that there are several additional variations of +`read.table`, such as `read.csv2` to read tables separated by `;` +and `read.delim` to read in tables separated by `\t` (tabs). If you know how your table is separated, you can use one of the provided shortcuts, +but in case you run into an unconventional separator you are now equipped with the knowledge to define it in the `sep = ` argument of `read.table`! + + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +Now, let's read in the file `combined_tidy_vcf.csv` which will be located in +`/home/dcuser/r_data/`. Call this data `variants`. The +first argument to pass to our `read.csv()` function is the file path for our +data. The file path must be in quotes and now is a good time to remember to +use tab autocompletion.
**If you use tab autocompletion you avoid typos and +errors in file paths.** Use it! + + +```r +## read in a CSV file and save it as 'variants' + +variants <- read.csv("/home/dcuser/r_data/combined_tidy_vcf.csv") +``` + + + +One of the first things you should notice is that in the Environment window, +you have the `variants` object, listed as 801 obs. (observations/rows) +of 29 variables (columns). Double-clicking on the name of the object will open +a view of the data in a new tab. + +rstudio data frame view + +## Summarizing, subsetting, and determining the structure of a data frame. + +A **data frame is the standard way in R to store tabular data**. A data frame +could also be thought of as a collection of vectors, all of which have the same +length. Using only two functions, we can learn a lot about our data frame, +including some summary statistics as well as the "structure" of the data +frame. Let's examine what each of these functions can tell us: + + +```r +## get summary statistics on a data frame + +summary(variants) +``` + +```{.output} + sample_id CHROM POS ID + Length:801 Length:801 Min. : 1521 Mode:logical + Class :character Class :character 1st Qu.:1115970 NA's:801 + Mode :character Mode :character Median :2290361 + Mean :2243682 + 3rd Qu.:3317082 + Max. :4629225 + + REF ALT QUAL FILTER + Length:801 Length:801 Min. : 4.385 Mode:logical + Class :character Class :character 1st Qu.:139.000 NA's:801 + Mode :character Mode :character Median :195.000 + Mean :172.276 + 3rd Qu.:225.000 + Max. :228.000 + + INDEL IDV IMF DP + Mode :logical Min. : 2.000 Min. :0.5714 Min. : 2.00 + FALSE:700 1st Qu.: 7.000 1st Qu.:0.8824 1st Qu.: 7.00 + TRUE :101 Median : 9.000 Median :1.0000 Median :10.00 + Mean : 9.396 Mean :0.9219 Mean :10.57 + 3rd Qu.:11.000 3rd Qu.:1.0000 3rd Qu.:13.00 + Max. :20.000 Max. :1.0000 Max. :79.00 + NA's :700 NA's :700 + VDB RPB MQB BQB + Min. :0.0005387 Min. :0.0000 Min. :0.0000 Min.
:0.1153 + 1st Qu.:0.2180410 1st Qu.:0.3776 1st Qu.:0.1070 1st Qu.:0.6963 + Median :0.4827410 Median :0.8663 Median :0.2872 Median :0.8615 + Mean :0.4926291 Mean :0.6970 Mean :0.5330 Mean :0.7784 + 3rd Qu.:0.7598940 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 + Max. :0.9997130 Max. :1.0000 Max. :1.0000 Max. :1.0000 + NA's :773 NA's :773 NA's :773 + MQSB SGB MQ0F ICB + Min. :0.01348 Min. :-0.6931 Min. :0.00000 Mode:logical + 1st Qu.:0.95494 1st Qu.:-0.6762 1st Qu.:0.00000 NA's:801 + Median :1.00000 Median :-0.6620 Median :0.00000 + Mean :0.96428 Mean :-0.6444 Mean :0.01127 + 3rd Qu.:1.00000 3rd Qu.:-0.6364 3rd Qu.:0.00000 + Max. :1.01283 Max. :-0.4536 Max. :0.66667 + NA's :48 + HOB AC AN DP4 MQ + Mode:logical Min. :1 Min. :1 Length:801 Min. :10.00 + NA's:801 1st Qu.:1 1st Qu.:1 Class :character 1st Qu.:60.00 + Median :1 Median :1 Mode :character Median :60.00 + Mean :1 Mean :1 Mean :58.19 + 3rd Qu.:1 3rd Qu.:1 3rd Qu.:60.00 + Max. :1 Max. :1 Max. :60.00 + + Indiv gt_PL gt_GT gt_GT_alleles + Length:801 Length:801 Min. :1 Length:801 + Class :character Class :character 1st Qu.:1 Class :character + Mode :character Mode :character Median :1 Mode :character + Mean :1 + 3rd Qu.:1 + Max. :1 + +``` + +Our data frame had 29 variables, so we get 29 fields that summarize the data. +The `QUAL`, `IMF`, and `VDB` variables (and several others) are +numerical data and so you get summary statistics on the min and max values for +these columns, as well as mean, median, and interquartile ranges. Many of the +other variables (e.g. `sample_id`) are treated as character data (more on this +in a bit). + +There is a lot to work with, so we will subset the first three columns, plus the +`ALT` column, into a new data frame using the `data.frame()` function.
+ + +```r +## put the first three columns and the ALT column (column 6) of variants +## into a new data frame called subset + +subset <- data.frame(variants[, c(1:3, 6)]) +``` + +Now, let's use the `str()` (structure) function to look a little more closely +at how data frames work: + + +```r +## get the structure of a data frame + +str(subset) +``` + +```{.output} +'data.frame': 801 obs. of 4 variables: + $ sample_id: chr "SRR2584863" "SRR2584863" "SRR2584863" "SRR2584863" ... + $ CHROM : chr "CP000819.1" "CP000819.1" "CP000819.1" "CP000819.1" ... + $ POS : int 9972 263235 281923 433359 473901 648692 1331794 1733343 2103887 2333538 ... + $ ALT : chr "G" "T" "T" "CTTTTTTTT" ... +``` + +OK, that's a lot to unpack! Some things to notice: + +- The object type `data.frame` is displayed in the first row along with its + dimensions, in this case 801 observations (rows) and 4 variables (columns). +- Each variable (column) has a name (e.g. `sample_id`). This is followed + by the object mode (e.g. chr, int, etc.). Notice that before each + variable name there is a `$` - this will be important later. + +## Introducing Factors + +Factors are the final major data structure we will introduce in our R genomics +lessons. Factors can be thought of as vectors which are specialized for +categorical data. Given R's specialization for statistics, this makes sense since +categorical and continuous variables are usually treated differently. Sometimes +you may want to have data treated as a factor, but in other cases, this may be +undesirable. + +Let's see the value of treating some of our data, which are categorical in nature, as +factors. Let's take a look at just the alternate alleles: + + +```r +## extract the "ALT" column to a new object + +alt_alleles <- subset$ALT +``` + +Let's look at the first few items in this vector using `head()`: + + +```r +head(alt_alleles) +``` + +```{.output} +[1] "G" "T" "T" "CTTTTTTTT" "CCGCGC" "T" +``` + +There are 801 alleles (one for each row).
To simplify, let's look at just the +single-nucleotide alleles (SNPs). We can use some of the vector indexing skills +from the last episode. + + +```r +snps <- c(alt_alleles[alt_alleles=="A"], + alt_alleles[alt_alleles=="T"], + alt_alleles[alt_alleles=="G"], + alt_alleles[alt_alleles=="C"]) +``` + +This leaves us with a vector of the 707 alternative alleles which were single +nucleotides. Right now, they are being treated as characters, but we could treat +them as categories of SNP. Doing this will enable some nice features. For +example, we can try to generate a plot of this character vector as it is right +now: + + +```r +plot(snps) +``` + +```{.warning} +Warning in xy.coords(x, y, xlabel, ylabel, log): NAs introduced by coercion +``` + +```{.warning} +Warning in min(x): no non-missing arguments to min; returning Inf +``` + +```{.warning} +Warning in max(x): no non-missing arguments to max; returning -Inf +``` + +```{.error} +Error in plot.window(...): need finite 'ylim' values +``` + +Whoops! Though the `plot()` function will do its best to give us a quick plot, +it is unable to do so here. One way to fix this is to tell R to treat the SNPs +as categories (i.e. a factor vector); we will create a new object to avoid +confusion using the `factor()` function: + + +```r +factor_snps <- factor(snps) +``` + +Let's learn a little more about this new type of vector: + + +```r +str(factor_snps) +``` + +```{.output} + Factor w/ 4 levels "A","C","G","T": 1 1 1 1 1 1 1 1 1 1 ... +``` + +What we get back are the categories ("A","C","G","T") in our factor; these are +called "Levels". **Levels are the different categories contained in a factor**. +By default, R will organize the levels in a factor in alphabetical order. So the +first level in this factor is "A". + +For the sake of efficiency, R stores the content of a factor as a vector of +integers, in which an integer is assigned to each of the possible levels. Recall that +levels are assigned in alphabetical order.
In this case, the first item in our +`factor_snps` object is "A", which happens to be the 1st level of our factor, +ordered alphabetically. This explains the sequence of "1"s ("Factor w/ 4 levels +"A","C","G","T": 1 1 1 1 1 1 1 1 1 1 ..."), since "A" is the first level, and +the first few items in our factor are all "A"s. + +We can see how many items in our vector fall into each category: + + +```r +summary(factor_snps) +``` + +```{.output} + A C G T +211 139 154 203 +``` + +As you can imagine, this is already useful when you want to generate a tally. + +::::::::::::::::::::::::::::::::::::::::: callout + +## Tip: treating objects as categories without changing their mode + +You don't have to make an object a factor to get the benefits of treating +an object as a factor. See what happens when you use the `as.factor()` +function on `factor_snps`. To generate a tally, you can sometimes also use +the `table()` function, though sometimes you may need to combine both (i.e. +`table(as.factor(object))`). + + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Plotting and ordering factors + +One of the most common uses for factors will be when you plot categorical +values. For example, suppose we want to know how many of our variants had each +possible SNP. We could generate a plot: + + +```r +plot(factor_snps) +``` + + + +This isn't a particularly pretty example of a plot but it works. We'll be +learning much more about creating nice, publication-quality graphics later in +this lesson. + +If you recall, factors are ordered alphabetically. That might make sense, but +categories (e.g., "red", "blue", "green") often do not have an intrinsic order. +What if we wanted to order our plot according to the numerical value (i.e., +in order of SNP frequency)?
We can enforce an order on our factors: + + +```r +ordered_factor_snps <- factor(factor_snps, levels = names(sort(table(factor_snps)))) +``` + +Let's deconstruct this from the inside out (you can try each of these commands +to see why this works): + +1. We create a table of `factor_snps` to get the frequency of each SNP: + `table(factor_snps)` +2. We sort this table: `sort(table(factor_snps))`; set the `decreasing =` + parameter of this function to TRUE if you want to change from the default of + FALSE +3. Using the `names` function gives us just the character names of the table + sorted by frequencies: `names(sort(table(factor_snps)))` +4. The `factor` function is what allows us to create a factor. We give it the + `factor_snps` object as input, and use the `levels=` parameter to enforce the + ordering of the levels. + +Now we see our plot has been reordered: + + +```r +plot(ordered_factor_snps) +``` + + + +Factors come in handy in many places when using R. Even using more +sophisticated plotting packages such as ggplot2 will sometimes require you +to understand how to manipulate factors. + +::::::::::::::::::::::::::::::::::::::::: callout + +## Tip: Packages in R -- what are they and why do we use them? + +Packages are simply collections of functions and/or data that can be used to extend the +capabilities of R beyond the core functionality that comes with it by default. There are +useful R packages available that span all types of statistical analysis, data visualization, +and more. The main place that R packages are installed from is a website called +[CRAN](https://cran.r-project.org/) (the Comprehensive R Archive Network). Many thousands +of R packages are available there, and when you use the built-in R function `install.packages()`, +it will look for a CRAN repository to install from.
So, for example, to install +[tidyverse](https://www.tidyverse.org) packages such as `dplyr` and `ggplot2` +(which you'll do in the next few lessons), you would use the following command: + + +```r +# install a package from CRAN +install.packages("ggplot2") +``` + +```{.output} +The following package(s) will be installed: +- ggplot2 [3.4.4] +These packages will be installed into "~/work/genomics-r-intro/genomics-r-intro/renv/profiles/lesson-requirements/renv/library/R-4.3/x86_64-pc-linux-gnu". + +# Installing packages -------------------------------------------------------- +- Installing ggplot2 ... OK [linked from cache] +Successfully installed 1 package in 5.2 milliseconds. +``` + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Subsetting data frames + +Next, we are going to talk about how you can get specific values from data frames, and where necessary, change the mode of a column of values. + +The first thing to remember is that a data frame is two-dimensional (rows and +columns). Therefore, to select a specific value we will once again use +`[]` (bracket) notation, but we will specify more than one value (except in some cases +where we are taking a range). + +::::::::::::::::::::::::::::::::::::::: challenge + +## Exercise: Subsetting a data frame + +**Try the following indices and functions and try to figure out what they return** + +a. `variants[1,1]` + +b. `variants[2,4]` + +c. `variants[801,29]` + +d. `variants[2, ]` + +e. `variants[-1, ]` + +f. `variants[1:4,1]` + +g. `variants[1:10,c("REF","ALT")]` + +h. `variants[,c("sample_id")]` + +i. `head(variants)` + +j. `tail(variants)` + +k. `variants$sample_id` + +l. `variants[variants$REF == "A",]` + +::::::::::::::: solution + +## Solution + +a. + + +```r +variants[1,1] +``` + +```{.output} +[1] "SRR2584863" +``` + +b. + + +```r +variants[2,4] +``` + +```{.output} +[1] NA +``` + +c. + + +```r +variants[801,29] +``` + +```{.output} +[1] "T" +``` + +d.
+ + +```r +variants[2, ] +``` + +```{.output} + sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF DP VDB +2 SRR2584863 CP000819.1 263235 NA G T 85 NA FALSE NA NA 6 0.096133 + RPB MQB BQB MQSB SGB MQ0F ICB HOB AC AN DP4 MQ +2 1 1 1 NA -0.590765 0.166667 NA NA 1 1 0,1,0,5 33 + Indiv gt_PL +2 /home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam 112,0 + gt_GT gt_GT_alleles +2 1 T +``` + +e. + + +```r +variants[-1, ] +``` + + +```{.output} + sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF +2 SRR2584863 CP000819.1 263235 NA G T 85 NA FALSE NA NA +3 SRR2584863 CP000819.1 281923 NA G T 217 NA FALSE NA NA +4 SRR2584863 CP000819.1 433359 NA CTTTTTTT CTTTTTTTT 64 NA TRUE 12 1.0 +5 SRR2584863 CP000819.1 473901 NA CCGC CCGCGC 228 NA TRUE 9 0.9 +6 SRR2584863 CP000819.1 648692 NA C T 210 NA FALSE NA NA +7 SRR2584863 CP000819.1 1331794 NA C A 178 NA FALSE NA NA + DP VDB RPB MQB BQB MQSB SGB MQ0F ICB HOB AC AN DP4 MQ +2 6 0.096133 1 1 1 NA -0.590765 0.166667 NA NA 1 1 0,1,0,5 33 +3 10 0.774083 NA NA NA 0.974597 -0.662043 0.000000 NA NA 1 1 0,0,4,5 60 +4 12 0.477704 NA NA NA 1.000000 -0.676189 0.000000 NA NA 1 1 0,1,3,8 60 +5 10 0.659505 NA NA NA 0.916482 -0.662043 0.000000 NA NA 1 1 1,0,2,7 60 +6 10 0.268014 NA NA NA 0.916482 -0.670168 0.000000 NA NA 1 1 0,0,7,3 60 +7 8 0.624078 NA NA NA 0.900802 -0.651104 0.000000 NA NA 1 1 0,0,3,5 60 + Indiv gt_PL +2 /home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam 112,0 +3 /home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam 247,0 +4 /home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam 91,0 +5 /home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam 255,0 +6 /home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam 240,0 +7 /home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam 208,0 + gt_GT gt_GT_alleles +2 1 T +3 1 T +4 1 CTTTTTTTT +5 1 CCGCGC +6 1 T +7 1 A +``` + +f. 
+ + +```r +variants[1:4,1] +``` + +```{.output} +[1] "SRR2584863" "SRR2584863" "SRR2584863" "SRR2584863" +``` + +g. + + +```r +variants[1:10,c("REF","ALT")] +``` + +```{.output} + REF +1 T +2 G +3 G +4 CTTTTTTT +5 CCGC +6 C +7 C +8 G +9 ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG +10 AT + ALT +1 G +2 T +3 T +4 CTTTTTTTT +5 CCGCGC +6 T +7 A +8 A +9 ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG +10 ATT +``` + +h. + + +```r +variants[,c("sample_id")] +``` + + +```{.output} +[1] "SRR2584863" "SRR2584863" "SRR2584863" "SRR2584863" "SRR2584863" +[6] "SRR2584863" +``` + +i. + + +```r +head(variants) +``` + +```{.output} + sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF +1 SRR2584863 CP000819.1 9972 NA T G 91 NA FALSE NA NA +2 SRR2584863 CP000819.1 263235 NA G T 85 NA FALSE NA NA +3 SRR2584863 CP000819.1 281923 NA G T 217 NA FALSE NA NA +4 SRR2584863 CP000819.1 433359 NA CTTTTTTT CTTTTTTTT 64 NA TRUE 12 1.0 +5 SRR2584863 CP000819.1 473901 NA CCGC CCGCGC 228 NA TRUE 9 0.9 +6 SRR2584863 CP000819.1 648692 NA C T 210 NA FALSE NA NA + DP VDB RPB MQB BQB MQSB SGB MQ0F ICB HOB AC AN DP4 MQ +1 4 0.0257451 NA NA NA NA -0.556411 0.000000 NA NA 1 1 0,0,0,4 60 +2 6 0.0961330 1 1 1 NA -0.590765 0.166667 NA NA 1 1 0,1,0,5 33 +3 10 0.7740830 NA NA NA 0.974597 -0.662043 0.000000 NA NA 1 1 0,0,4,5 60 +4 12 0.4777040 NA NA NA 1.000000 -0.676189 0.000000 NA NA 1 1 0,1,3,8 60 +5 10 0.6595050 NA NA NA 0.916482 -0.662043 0.000000 NA NA 1 1 1,0,2,7 60 +6 10 0.2680140 NA NA NA 0.916482 -0.670168 0.000000 NA NA 1 1 0,0,7,3 60 + Indiv gt_PL +1 /home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam 121,0 +2 /home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam 112,0 +3 /home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam 247,0 +4 /home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam 91,0 +5 /home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam 255,0 +6 /home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam 240,0 + 
gt_GT gt_GT_alleles +1 1 G +2 1 T +3 1 T +4 1 CTTTTTTTT +5 1 CCGCGC +6 1 T +``` + +j. + + +```r +tail(variants) +``` + +```{.output} + sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF DP +796 SRR2589044 CP000819.1 3444175 NA G T 184 NA FALSE NA NA 9 +797 SRR2589044 CP000819.1 3481820 NA A G 225 NA FALSE NA NA 12 +798 SRR2589044 CP000819.1 3893550 NA AG AGG 101 NA TRUE 4 1 4 +799 SRR2589044 CP000819.1 3901455 NA A AC 70 NA TRUE 3 1 3 +800 SRR2589044 CP000819.1 4100183 NA A G 177 NA FALSE NA NA 8 +801 SRR2589044 CP000819.1 4431393 NA TGG T 225 NA TRUE 10 1 10 + VDB RPB MQB BQB MQSB SGB MQ0F ICB HOB AC AN DP4 MQ +796 0.4714620 NA NA NA 0.992367 -0.651104 0 NA NA 1 1 0,0,4,4 60 +797 0.8707240 NA NA NA 1.000000 -0.680642 0 NA NA 1 1 0,0,4,8 60 +798 0.9182970 NA NA NA 1.000000 -0.556411 0 NA NA 1 1 0,0,3,1 52 +799 0.0221621 NA NA NA NA -0.511536 0 NA NA 1 1 0,0,3,0 60 +800 0.9272700 NA NA NA 0.900802 -0.651104 0 NA NA 1 1 0,0,3,5 60 +801 0.7488140 NA NA NA 1.007750 -0.670168 0 NA NA 1 1 0,0,4,6 60 + Indiv gt_PL +796 /home/dcuser/dc_workshop/results/bam/SRR2589044.aligned.sorted.bam 214,0 +797 /home/dcuser/dc_workshop/results/bam/SRR2589044.aligned.sorted.bam 255,0 +798 /home/dcuser/dc_workshop/results/bam/SRR2589044.aligned.sorted.bam 131,0 +799 /home/dcuser/dc_workshop/results/bam/SRR2589044.aligned.sorted.bam 100,0 +800 /home/dcuser/dc_workshop/results/bam/SRR2589044.aligned.sorted.bam 207,0 +801 /home/dcuser/dc_workshop/results/bam/SRR2589044.aligned.sorted.bam 255,0 + gt_GT gt_GT_alleles +796 1 T +797 1 G +798 1 AGG +799 1 AC +800 1 G +801 1 T +``` + +k. + + +```r +variants$sample_id +``` + + +```{.output} +[1] "SRR2584863" "SRR2584863" "SRR2584863" "SRR2584863" "SRR2584863" +[6] "SRR2584863" +``` + +l. 
+ + +```r +variants[variants$REF == "A",] +``` + + +```{.output} + sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF DP +11 SRR2584863 CP000819.1 2407766 NA A C 104 NA FALSE NA NA 9 +12 SRR2584863 CP000819.1 2446984 NA A C 225 NA FALSE NA NA 20 +14 SRR2584863 CP000819.1 2665639 NA A T 225 NA FALSE NA NA 19 +16 SRR2584863 CP000819.1 3339313 NA A C 211 NA FALSE NA NA 10 +18 SRR2584863 CP000819.1 3481820 NA A G 200 NA FALSE NA NA 9 +19 SRR2584863 CP000819.1 3488669 NA A C 225 NA FALSE NA NA 13 + VDB RPB MQB BQB MQSB SGB MQ0F ICB HOB AC +11 0.0230738 0.900802 0.150134 0.750668 0.500000 -0.590765 0.333333 NA NA 1 +12 0.0714027 NA NA NA 1.000000 -0.689466 0.000000 NA NA 1 +14 0.9960390 NA NA NA 1.000000 -0.690438 0.000000 NA NA 1 +16 0.4059360 NA NA NA 1.007750 -0.670168 0.000000 NA NA 1 +18 0.1070810 NA NA NA 0.974597 -0.662043 0.000000 NA NA 1 +19 0.0162706 NA NA NA 1.000000 -0.680642 0.000000 NA NA 1 + AN DP4 MQ +11 1 3,0,3,2 25 +12 1 0,0,10,6 60 +14 1 0,0,12,5 60 +16 1 0,0,4,6 60 +18 1 0,0,4,5 60 +19 1 0,0,8,4 60 + Indiv gt_PL +11 /home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam 131,0 +12 /home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam 255,0 +14 /home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam 255,0 +16 /home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam 241,0 +18 /home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam 230,0 +19 /home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam 255,0 + gt_GT gt_GT_alleles +11 1 C +12 1 C +14 1 T +16 1 C +18 1 G +19 1 C +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +The subsetting notation is very similar to what we learned for +vectors. 
The key differences include: + +- Typically provide two values separated by commas: data.frame[row, column] +- In cases where you are taking a continuous range of numbers use a colon + between the numbers (start:stop, inclusive) +- For a non continuous set of numbers, pass a vector using `c()` +- Index using the name of a column(s) by passing them as vectors using `c()` + +Finally, in all of the subsetting exercises above, we printed values to +the screen. You can create a new data frame object by assigning +them to a new object name: + + +```r +# create a new data frame containing only observations from SRR2584863 + +SRR2584863_variants <- variants[variants$sample_id == "SRR2584863",] + +# check the dimension of the data frame + +dim(SRR2584863_variants) +``` + +```{.output} +[1] 25 29 +``` + +```r +# get a summary of the data frame + +summary(SRR2584863_variants) +``` + +```{.output} + sample_id CHROM POS ID + Length:25 Length:25 Min. : 9972 Mode:logical + Class :character Class :character 1st Qu.:1331794 NA's:25 + Mode :character Mode :character Median :2618472 + Mean :2464989 + 3rd Qu.:3488669 + Max. :4616538 + + REF ALT QUAL FILTER + Length:25 Length:25 Min. : 31.89 Mode:logical + Class :character Class :character 1st Qu.:104.00 NA's:25 + Mode :character Mode :character Median :211.00 + Mean :172.97 + 3rd Qu.:225.00 + Max. :228.00 + + INDEL IDV IMF DP + Mode :logical Min. : 2.00 Min. :0.6667 Min. : 2.0 + FALSE:19 1st Qu.: 3.25 1st Qu.:0.9250 1st Qu.: 9.0 + TRUE :6 Median : 8.00 Median :1.0000 Median :10.0 + Mean : 7.00 Mean :0.9278 Mean :10.4 + 3rd Qu.: 9.75 3rd Qu.:1.0000 3rd Qu.:12.0 + Max. :12.00 Max. :1.0000 Max. :20.0 + NA's :19 NA's :19 + VDB RPB MQB BQB + Min. :0.01627 Min. :0.9008 Min. :0.04979 Min. 
:0.7507 + 1st Qu.:0.07140 1st Qu.:0.9275 1st Qu.:0.09996 1st Qu.:0.7627 + Median :0.37674 Median :0.9542 Median :0.15013 Median :0.7748 + Mean :0.40429 Mean :0.9517 Mean :0.39997 Mean :0.8418 + 3rd Qu.:0.65951 3rd Qu.:0.9771 3rd Qu.:0.57507 3rd Qu.:0.8874 + Max. :0.99604 Max. :1.0000 Max. :1.00000 Max. :1.0000 + NA's :22 NA's :22 NA's :22 + MQSB SGB MQ0F ICB + Min. :0.5000 Min. :-0.6904 Min. :0.00000 Mode:logical + 1st Qu.:0.9599 1st Qu.:-0.6762 1st Qu.:0.00000 NA's:25 + Median :0.9962 Median :-0.6620 Median :0.00000 + Mean :0.9442 Mean :-0.6341 Mean :0.04667 + 3rd Qu.:1.0000 3rd Qu.:-0.6168 3rd Qu.:0.00000 + Max. :1.0128 Max. :-0.4536 Max. :0.66667 + NA's :3 + HOB AC AN DP4 MQ + Mode:logical Min. :1 Min. :1 Length:25 Min. :10.00 + NA's:25 1st Qu.:1 1st Qu.:1 Class :character 1st Qu.:60.00 + Median :1 Median :1 Mode :character Median :60.00 + Mean :1 Mean :1 Mean :55.52 + 3rd Qu.:1 3rd Qu.:1 3rd Qu.:60.00 + Max. :1 Max. :1 Max. :60.00 + + Indiv gt_PL gt_GT gt_GT_alleles + Length:25 Length:25 Min. :1 Length:25 + Class :character Class :character 1st Qu.:1 Class :character + Mode :character Mode :character Median :1 Mode :character + Mean :1 + 3rd Qu.:1 + Max. :1 + +``` + +## Coercing values in data frames + +::::::::::::::::::::::::::::::::::::::::: callout + +## Tip: coercion isn't limited to data frames + +While we are going to address coercion in the context of data frames +most of these methods apply to other data structures, such as vectors + + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +Sometimes, it is possible that R will misinterpret the type of data represented +in a data frame, or store that data in a mode which prevents you from +operating on the data the way you wish. For example, a long list of gene names +isn't usually thought of as a categorical variable, the way that your +experimental condition (e.g. control, treatment) might be. More importantly, +some R packages you use to analyze your data may expect characters as input, +not factors. 
At other times (such as plotting or some statistical analyses) a +factor may be more appropriate. Ultimately, you should know how to change the +mode of an object. + +First, it's very important to recognize that coercion happens in R all the time. +This can be a good thing when R gets it right, or a bad thing when the result +is not what you expect. Consider: + + +```r +snp_chromosomes <- c('3', '11', 'X', '6') +typeof(snp_chromosomes) +``` + +```{.output} +[1] "character" +``` + +Although there are several numbers in our vector, they are all in quotes, so +we have explicitly told R to consider them as characters. However, even if we removed +the quotes from the numbers, R would coerce everything into a character: + + +```r +snp_chromosomes_2 <- c(3, 11, 'X', 6) +typeof(snp_chromosomes_2) +``` + +```{.output} +[1] "character" +``` + +```r +snp_chromosomes_2[1] +``` + +```{.output} +[1] "3" +``` + +We can use the `as.` functions to explicitly coerce values from one form into +another. Consider the following vector of characters, which all happen to be +valid numbers: + + +```r +snp_positions_2 <- c("8762685", "66560624", "67545785", "154039662") +typeof(snp_positions_2) +``` + +```{.output} +[1] "character" +``` + +```r +snp_positions_2[1] +``` + +```{.output} +[1] "8762685" +``` + +Now we can coerce `snp_positions_2` into a numeric type using `as.numeric()`: + + +```r +snp_positions_2 <- as.numeric(snp_positions_2) +typeof(snp_positions_2) +``` + +```{.output} +[1] "double" +``` + +```r +snp_positions_2[1] +``` + +```{.output} +[1] 8762685 +``` + +Sometimes coercion is straightforward, but what would happen if we tried +using `as.numeric()` on `snp_chromosomes_2`? + + +```r +snp_chromosomes_2 <- as.numeric(snp_chromosomes_2) +``` + +```{.warning} +Warning: NAs introduced by coercion +``` + +If we check, we will see that an `NA` value (R's default value for missing +data) has been introduced. 
+ + +```r +snp_chromosomes_2 +``` + +```{.output} +[1] 3 11 NA 6 +``` + +Trouble can really start when we try to coerce a factor. For example, when we +try to coerce the `factor_snps` vector into a numeric mode +look at the result: + + +```r +as.numeric(factor_snps) +``` + +```{.output} + [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 + [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 + [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +[112] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +[149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +[186] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 4 4 4 4 4 4 4 4 4 +[223] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 +[260] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 +[297] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 +[334] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 +[371] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 +[408] 4 4 4 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 +[445] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 +[482] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 +[519] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 +[556] 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 +[593] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 +[630] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 +[667] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 +[704] 2 2 2 2 +``` + +Strangely, it works! Almost. 
Instead of giving an error message, R returns +numeric values, which in this case are the integers assigned to the levels in +this factor. This kind of behavior can lead to hard-to-find bugs, for example +when we do have numbers in a factor, and we get numbers from a coercion. If +we don't look carefully, we may not notice a problem. + +If you need to coerce an entire column you can overwrite it using an expression +like this one: + + +```r +# make the 'REF' column a character type column + +variants$REF <- as.character(variants$REF) + +# check the type of the column +typeof(variants$REF) +``` + +```{.output} +[1] "character" +``` + +## StringsAsFactors = ? + +Let's summarize this section on coercion with a few take-home messages. + +- When you explicitly coerce one data type into another (this is known as + **explicit coercion**), be careful to check the result. Ideally, you should try to see if it's possible to avoid steps in your analysis that force you to + coerce. +- R will sometimes coerce without you asking for it. This is called + (appropriately) **implicit coercion**. For example, when we tried to create + a vector with multiple data types, R chose one type through implicit + coercion. +- Check the structure (`str()`) of your data frames before working with them! + +::::::::::::::::::::::::::::::::::::::::: callout + +## Tip: `stringsAsFactors` and column types on import + +Prior to R 4.0, when importing a data frame using any one of the `read.table()` +functions such as `read.csv()`, the argument `stringsAsFactors` was by +default +set to `TRUE`, so strings were read in as factors. Setting it to `FALSE` makes R read any non-numeric column as +a character type. In the `read.csv()` documentation, you will also see that you can +explicitly set the type of your columns using the `colClasses` argument. Other R packages +(such as the Tidyverse "readr") don't have this particular conversion issue, +but many packages will still try to guess a data type. 
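
The effect of these two arguments can be seen on a small example. The sketch below is only an illustration: the three-row table is made up for this purpose (it is not the lesson's `variants` data), and it is read from a string with the `text` argument so that no file is needed.

```r
# A tiny, made-up CSV held in a string so the example needs no file on disk
csv_text <- "sample_id,REF,DP\nSRR2584863,T,4\nSRR2584863,G,6\nSRR2584866,G,10"

# Strings kept as character (the default since R 4.0)
as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(as_chr$REF)

# Strings converted to factors (the default before R 4.0)
as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(as_fct$REF)

# Column types stated explicitly, one entry per column, with colClasses
typed <- read.csv(text = csv_text,
                  colClasses = c("character", "factor", "integer"))
str(typed)
```

Stating `colClasses` up front means a surprise in the file tends to show up at import time as an error or warning, rather than later as a silently mis-typed column.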
+ + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Data frame bonus material: math, sorting, renaming + +Here are a few operations that don't need much explanation, but which are good +to know. + +There are lots of arithmetic functions you may want to apply to your data +frame; covering those would be a course in itself (there is some starting +material in [the _Loops in R_ episode of one of the Software Carpentry lessons](https://swcarpentry.github.io/r-novice-inflammation/15-supp-loops-in-depth.html)). Our lessons will cover some additional summary statistical functions in +a subsequent lesson, but overall we will focus on data cleaning and +visualization. + +You can use functions like `mean()`, `min()`, and `max()` on an +individual column. Let's look at the "DP" column, or filtered depth. This value shows the number of filtered +reads that support each of the reported variants. + + +```r +max(variants$DP) +``` + +```{.output} +[1] 79 +``` + +You can sort a data frame using the `order()` function: + + +```r +sorted_by_DP <- variants[order(variants$DP), ] +head(sorted_by_DP$DP) +``` + +```{.output} +[1] 2 2 2 2 2 2 +``` + +::::::::::::::::::::::::::::::::::::::: challenge + +## Exercise + +The `order()` function lists values in increasing order by default. Look at +the documentation for this function and change `sorted_by_DP` to start with +variants with the greatest filtered depth ("DP"). 
+ +::::::::::::::: solution + +## Solution + + +```r + sorted_by_DP <- variants[order(variants$DP, decreasing = TRUE), ] + head(sorted_by_DP$DP) +``` + +```{.output} +[1] 79 46 41 29 29 27 +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +You can rename columns: + + +```r +colnames(variants)[colnames(variants) == "sample_id"] <- "strain" + +# check the column name (hint names are returned as a vector) +colnames(variants) +``` + +```{.output} + [1] "strain" "CHROM" "POS" "ID" + [5] "REF" "ALT" "QUAL" "FILTER" + [9] "INDEL" "IDV" "IMF" "DP" +[13] "VDB" "RPB" "MQB" "BQB" +[17] "MQSB" "SGB" "MQ0F" "ICB" +[21] "HOB" "AC" "AN" "DP4" +[25] "MQ" "Indiv" "gt_PL" "gt_GT" +[29] "gt_GT_alleles" +``` + +## Saving your data frame to a file + +We can save data to a file. We will save our `SRR2584863_variants` object +to a .csv file using the `write.csv()` function: + + +```r +write.csv(SRR2584863_variants, file = "data/SRR2584863_variants.csv") +``` + +The `write.csv()` function has some additional arguments listed in the help, but +at a minimum you need to tell it what data frame to write to file, and give a +path to a file name in quotes (if you only provide a file name, the file will +be written in the current working directory). + +## Importing data from Excel + +Excel is one of the most common formats, so we need to discuss how to make +these files play nicely with R. The simplest way to import data from Excel is +to **save your Excel file in .csv format**\*. You can then import into R right +away. Sometimes you may not be able to do this (imagine you have data in 300 +Excel files, are you going to open and export all of them?). + +One common R package (a set of code with features you can download and add to +your R installation) is the [readxl package](https://CRAN.R-project.org/package=readxl) which can open and import Excel +files. 
Rather than addressing package installation right now (we'll discuss this soon!), we can take +advantage of RStudio's import feature, which integrates this package. (Note: +this feature is available only in recent versions of RStudio, such as the one +installed on our cloud instance). + +First, in the RStudio menu go to **File**, select **Import Dataset**, and +choose **From Excel...** (notice there are several other options you can +explore). + +[Figure: RStudio import menu] + +Next, under **File/Url:** click the Browse button and navigate to the **Ecoli\_metadata.xlsx** file located at `/home/dcuser/dc_sample_data/R`. +You should now see a preview of the data to be imported: + +[Figure: RStudio import screen] + +Notice that you have the option to change the data type of each variable by +clicking the arrow (drop-down menu) next to each column title. Under **Import +Options** you may also rename the data, choose a different sheet to import, and +choose how you will handle headers and skipped rows. Under **Code Preview** you +can see the code that will be used to import this file. We could have written +this code and imported the Excel file without the RStudio import function, but +now you can choose your preference. + +In this exercise, we will leave the title of the data frame as +**Ecoli\_metadata**, and there are no other options we need to adjust. Click the +Import button to import the data. 
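
For those who prefer to type the import rather than click through the dialog, the following is a minimal sketch of what such code might look like. It assumes the `readxl` package is installed and that the file sits at the path given above; `import_excel()` is a hypothetical helper written for this illustration (it is not part of `readxl`), and the code shown in **Code Preview** may differ in its details.

```r
# Path taken from the lesson text; adjust it if your file lives elsewhere
excel_path <- "/home/dcuser/dc_sample_data/R/Ecoli_metadata.xlsx"

# Hypothetical helper (not part of readxl): read one sheet of an Excel file,
# stopping early with a clear message if the file or the package is missing
import_excel <- function(path, sheet = 1) {
  if (!file.exists(path)) {
    stop("Excel file not found: ", path)
  }
  if (!requireNamespace("readxl", quietly = TRUE)) {
    stop("The readxl package is not installed")
  }
  readxl::read_excel(path, sheet = sheet)
}

# Only attempt the import when the file is actually present
if (file.exists(excel_path)) {
  Ecoli_metadata <- import_excel(excel_path)
}
```

Wrapping the call this way is a design choice for scripts that loop over many Excel files: a missing file produces one readable message instead of a less obvious error from deeper in the import.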
+ +Finally, let's check the first few lines of the `Ecoli_metadata` data +frame: + + + + +```r +head(Ecoli_metadata) +``` + +```{.output} +# A tibble: 6 × 7 + sample generation clade strain cit run genome_size + +1 REL606 0 NA REL606 unknown 4.62 +2 REL1166A 2000 unknown REL606 unknown SRR098028 4.63 +3 ZDB409 5000 unknown REL606 unknown SRR098281 4.6 +4 ZDB429 10000 UC REL606 unknown SRR098282 4.59 +5 ZDB446 15000 UC REL606 unknown SRR098283 4.66 +6 ZDB458 20000 (C1,C2) REL606 unknown SRR098284 4.63 +``` + +The type of this object is 'tibble', a type of data +frame we will talk more about in the 'dplyr' section. If you needed +a true R data frame you could coerce with `as.data.frame()`. + +::::::::::::::::::::::::::::::::::::::: challenge + +## Exercise: Putting it all together - data frames + +**Using the `Ecoli_metadata` data frame created above, answer the following questions** + +A) What are the dimensions (# rows, # columns) of the data frame? + +B) What categories are there in the `cit` column? *hint*: treat the column as a factor + +C) How many of each of the `cit` categories are there? + +D) What is the genome size for the 7th observation in this data set? + +E) What is the median value of the variable `genome_size`? + +F) Rename the column `sample` to `sample_id` + +G) Create a new column (named genome\_size\_bp) and set it equal to the genome\_size multiplied by 1,000,000 + +H) Save the edited Ecoli\_metadata data frame as "exercise\_solution.csv" in your current working directory. 
+ +::::::::::::::: solution + +## Solution + + +```r +dim(Ecoli_metadata) +``` + +```{.output} +[1] 30 7 +``` + +```r +levels(as.factor(Ecoli_metadata$cit)) +``` + +```{.output} +[1] "minus" "plus" "unknown" +``` + +```r +table(as.factor(Ecoli_metadata$cit)) +``` + +```{.output} + + minus plus unknown + 9 9 12 +``` + +```r +Ecoli_metadata[7,7] +``` + +```{.output} +# A tibble: 1 × 1 + genome_size + +1 4.62 +``` + +```r +median(Ecoli_metadata$genome_size) +``` + +```{.output} +[1] 4.625 +``` + +```r +colnames(Ecoli_metadata)[colnames(Ecoli_metadata) == "sample"] <- "sample_id" +Ecoli_metadata$genome_size_bp <- Ecoli_metadata$genome_size * 1000000 +write.csv(Ecoli_metadata, file = "exercise_solution.csv") +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: keypoints + +- It is easy to import data into R from tabular formats including Excel. However, you still need to check that R has imported and interpreted your data correctly +- There are best practices for organizing your data (keeping it tidy) and R is great for this +- Base R has many useful functions for manipulating your data, but all of R's capabilities are greatly enhanced by software packages developed by the community + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/04-bioconductor-vcfr.md b/04-bioconductor-vcfr.md new file mode 100644 index 000000000..8e290e5dd --- /dev/null +++ b/04-bioconductor-vcfr.md @@ -0,0 +1,148 @@ +--- +title: Using packages from Bioconductor +teaching: 10 +exercises: 3 +source: Rmd +--- + +::::::::::::::::::::::::::::::::::::::: objectives + +- Describe what the Bioconductor repository is and what it is used for +- Describe how Bioconductor differs from CRAN +- Search Bioconductor for relevant packages +- Install a package from Bioconductor + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: questions + +- How do I use packages from 
the Bioconductor repository? + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + + +## Installing packages from somewhere else besides CRAN? + +So far we have told you about using packages that are included in the base installation of R (this is what comes with R 'out of the box'), and packages that you can install from [CRAN](https://cran.r-project.org/) (the Comprehensive R Archive Network), which is the primary place many people look for supplemental R packages to install. However, not all R packages are available on CRAN. For bioinformatics-related packages in particular, there is another repository that has many powerful packages that you can install. It is called [Bioconductor](https://bioconductor.org/) and it is a repository specifically focused on bioinformatics packages. [Bioconductor](https://bioconductor.org/) has a mission of "promot[ing] the statistical analysis and comprehension of current and emerging high-throughput biological assays." This means that many if not all of the packages available on Bioconductor are focused on the analysis of biological data, and that it can be a great place to look for tools to help you analyze your -omics datasets! + +## So how do I use it? + +Since access to the [Bioconductor](https://bioconductor.org/) repository is not built in to base R 'out of the box', there are a couple steps needed to install packages from this alternative source. We will work through the steps (only 2!) to install a package to help with the VCF analysis we are working on, but you can use the same approach to install any of the many thousands of available packages. + +![](fig/bioconductor_website_screenshot.jpg){alt='screenshot of bioconductor homepage'} + +## First, install the `BiocManager` package + +The first step is to install a package that *is* on CRAN, `BiocManager`. This package will allow us to use it to install packages from Bioconductor. 
You can think of Bioconductor kind of like an alternative app store for your phone, except instead of apps you are installing packages, and instead of your phone it's your local R package library. + + +```r +# install BiocManager from CRAN using the base R install.packages() function +install.packages("BiocManager") +``` + +To check if this worked (and also so you can make a note of the version for reproducibility purposes), you can run `BiocManager::version()` and it should give you the version number of Bioconductor that is in use. + + +```r +# to make sure it worked, check the version +BiocManager::version() +``` + +## Second, install the vcfR package from Bioconductor using `BiocManager` + +::::::::::::::::::::::::::::::::::::::::: callout + +## Heads up: Installing vcfR may take a while due to numerous dependencies + +Just be aware that installing packages that have many dependencies can take a while. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + +```r +# install the vcfR package from bioconductor using BiocManager::install() +BiocManager::install("vcfR") +``` + +Depending on your particular system, you may need to also allow it to install some dependencies or update installed packages in order to successfully complete the process. + +::::::::::::::::::::::::::::::::::::::::: callout + +## Note: Installing packages from Bioconductor vs from CRAN + +Some packages begin by being available only on Bioconductor, and then later +move to CRAN. `vcfR` is one such package, which originally was only available +from Bioconductor, but is currently available from CRAN. The other thing to +know is that `BiocManager::install()` will also install packages from CRAN (it +is a wrapper around `install.packages()` that adds some extra features). There +are [other benefits to using `BiocManager::install()` for Bioconductor +packages](https://www.bioconductor.org/install/). 
+In short, Bioconductor packages +have a release cycle that is different from CRAN and the `install()` function +is aware of that difference, so it helps to keep package versions in line with +one another in a way that doesn't generally happen with the base R +`install.packages()`. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Search for Bioconductor packages based on your analysis needs + +While we are only focusing in this workshop on VCF analyses, there are hundreds or thousands of different types of data and analyses that bioinformaticians may want to work with. Sometimes you may get a new dataset and not know exactly where to start with analyzing or visualizing it. The Bioconductor package search view can be a great way to browse through the packages that are available. + +![](fig/bioconductor_search.jpg){alt='screenshot of bioconductor search'} + +::::::::::::::::::::::::::::::::::::::::: callout + +## Tip: Searching for packages on the Bioconductor website + +There are several thousand packages available through the Bioconductor website. +It can be a bit of a challenge to find what you want, but one helpful resource +is the [package search page](https://bioconductor.org/packages/release/BiocViews.html#___Software). + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +In bioinformatics, there are often many different tools that can be used in a +particular instance. The authors of `vcfR` have [compiled some of +them](https://github.com/knausb/vcfR#software-that-produce-vcf-files). One of +those packages that [is available from +Bioconductor](https://bioconductor.org/packages/release/bioc/html/VariantAnnotation.html) +is called `VariantAnnotation` and may also be of interest to those working with +vcf files in R. + +::::::::::::::::::::::::::::::::::::::: challenge + +## Challenge + +- Use the `BiocManager::available()` function to see what packages are available matching a search term. 
+- Use the [biocViews](https://bioconductor.org/packages/release/BiocViews.html#___Software) interface to search for packages of interest. + +You may or may not want to try installing the package, since not all dependencies always install easily. However, this will at least let you see what is available. + + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::::: callout + +## Tip: Refreshing the RStudio package view after installing + +If you install a package from Bioconductor, you may need to refresh the RStudio package view to see it in your list. You can do this by clicking the "Refresh" button in the Packages pane of RStudio. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Resources + +- [Bioconductor](https://bioconductor.org/) +- [Bioconductor package search](https://bioconductor.org/packages/release/BiocViews.html#___Software) +- [CRAN](https://cran.r-project.org/) + +:::::::::::::::::::::::::::::::::::::::: keypoints + +- Bioconductor is an alternative package repository for bioinformatics packages. +- Installing packages from Bioconductor is done with `BiocManager::install()`, a wrapper around `install.packages()`, rather than by calling `install.packages()` directly. +- Check Bioconductor to see if there is a package relevant to your analysis before writing code yourself. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/05-dplyr.md b/05-dplyr.md new file mode 100644 index 000000000..571c6982a --- /dev/null +++ b/05-dplyr.md @@ -0,0 +1,988 @@ +--- +title: Data Wrangling and Analyses with Tidyverse +teaching: 40 +exercises: 15 +source: Rmd +--- + +::::::::::::::::::::::::::::::::::::::: objectives + +- Describe what the `dplyr` package in R is used for. +- Apply common `dplyr` functions to manipulate data in R. +- Employ the ‘pipe’ operator to link together a sequence of functions. +- Employ the ‘mutate’ function to apply other chosen functions to existing columns and create new columns of data. 
+- Employ the ‘split-apply-combine’ concept to split the data into groups, apply analysis to each group, and combine the results. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: questions + +- How can I manipulate data frames without repeating myself? + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + + +Bracket subsetting is handy, but it can be cumbersome and difficult to read, especially for complicated operations. + +Luckily, the [`dplyr`](https://cran.r-project.org/package=dplyr) +package provides a number of very useful functions for manipulating data frames +in a way that will reduce repetition, reduce the probability of making +errors, and probably even save you some typing. As an added bonus, you might +even find the `dplyr` grammar easier to read. + +Here we're going to cover some of the most commonly used functions as well as using +pipes (`%>%`) to combine them: + +1. `glimpse()` +2. `select()` +3. `filter()` +4. `group_by()` +5. `summarize()` +6. `mutate()` +7. `pivot_longer` and `pivot_wider` + +Packages in R are sets of additional functions that let you do more +stuff in R. The functions we've been using, like `str()`, come built into R; +packages give you access to more functions. You need to install a package and +then load it to be able to use it. + + +```r +install.packages("dplyr") ## installs dplyr package +install.packages("tidyr") ## installs tidyr package +install.packages("ggplot2") ## installs ggplot2 package +install.packages("readr") ## install readr package +``` + +You might get asked to choose a CRAN mirror -- this is asking you to +choose a site to download the package from. The choice doesn't matter too much; I'd recommend choosing the RStudio mirror. 
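
Because a package only has to be installed once per computer, it can be handy to check what is already present before installing anything. The sketch below is one possible way to do that in base R; `not_installed()` is a hypothetical helper written for this illustration, and the `repos` value is the CRAN cloud mirror, given explicitly so that R does not need to prompt for one.

```r
# Packages this episode relies on
needed <- c("dplyr", "tidyr", "ggplot2", "readr")

# Hypothetical helper: return the subset of `pkgs` that cannot be found
# in the local R library
not_installed <- function(pkgs) {
  found <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!found]
}

to_install <- not_installed(needed)

# Install only what is missing, and only when working interactively,
# so that sourcing this script never starts a download unexpectedly
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```
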
+ + +```r +library("dplyr") ## loads in dplyr package to use +library("tidyr") ## loads in tidyr package to use +library("ggplot2") ## loads in ggplot2 package to use +library("readr") ## load in readr package to use +``` + +You only need to install a package once per computer, but you need to load it +every time you open a new R session and want to use that package. + +::::::::::::::::::::::::::::::::::::::::: callout + +## Tip: Installing packages + +It may be tempting to install the `tidyverse` package, as it contains a +useful collection of packages for this lesson and beyond. However, when +teaching or following this lesson, we advise that participants install +`dplyr`, `readr`, `ggplot2`, and `tidyr` individually as shown above. +Otherwise, a substantial amount of the lesson will be spent waiting for the +installation to complete. + + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## What is dplyr? + +The package `dplyr`, first released in 2014, tries to provide easy +tools for the most common data manipulation tasks. This package is also included in the [`tidyverse` package](https://www.tidyverse.org/), which is a collection of eight different packages (`dplyr`, `ggplot2`, `tibble`, `tidyr`, `readr`, `purrr`, `stringr`, and `forcats`). It is built to work directly +with data frames. The thinking behind it was largely inspired by the package +`plyr`, which has been in use for some time but suffered from being slow in some +cases. `dplyr` addresses this by porting much of the computation to C++. An +additional feature is the ability to work with data stored directly in an +external database. The benefits of doing this are that the data can be managed +natively in a relational database, queries can be conducted on that database, +and only the results of the query returned. + +This addresses a common problem with R in that all operations are conducted in +memory and thus the amount of data you can work with is limited by available +memory. 
The database connections essentially remove that limitation in that you +can have a database that is hundreds of GB in size, conduct queries on it directly, and pull +back just what you need for analysis in R. + +### Loading .csv files in tidy style + +The Tidyverse's `readr` package provides its own way of loading .csv files into R using `read_csv()`, which is similar to `read.csv()`. `read_csv()` loads data faster, doesn't create row names, allows you to access non-standard variable names (i.e., variables that start with numbers or contain spaces), and outputs your data on the R console in a tidier way. In short, it's a much friendlier way of loading in potentially messy data. + +Now let's load our vcf .csv file using `read_csv()`: + + + +### Taking a quick look at data frames + +Similar to `str()`, which comes built into R, `glimpse()` is a `dplyr` function that (as the name suggests) gives a glimpse of the data frame. + + +```{.output} +Rows: 801 +Columns: 29 +$ sample_id "SRR2584863", "SRR2584863", "SRR2584863", "SRR2584863", … +$ CHROM "CP000819.1", "CP000819.1", "CP000819.1", "CP000819.1", … +$ POS 9972, 263235, 281923, 433359, 473901, 648692, 1331794, 1… +$ ID NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, … +$ REF "T", "G", "G", "CTTTTTTT", "CCGC", "C", "C", "G", "ACAGC… +$ ALT "G", "T", "T", "CTTTTTTTT", "CCGCGC", "T", "A", "A", "AC… +$ QUAL 91.0000, 85.0000, 217.0000, 64.0000, 228.0000, 210.0000,… +$ FILTER NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, … +$ INDEL FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TR… +$ IDV NA, NA, NA, 12, 9, NA, NA, NA, 2, 7, NA, NA, NA, NA, NA,… +$ IMF NA, NA, NA, 1.000000, 0.900000, NA, NA, NA, 0.666667, 1.… +$ DP 4, 6, 10, 12, 10, 10, 8, 11, 3, 7, 9, 20, 12, 19, 15, 10… +$ VDB 0.0257451, 0.0961330, 0.7740830, 0.4777040, 0.6595050, 0… +$ RPB NA, 1.000000, NA, NA, NA, NA, NA, NA, NA, NA, 0.900802, … +$ MQB NA, 1.0000000, NA, NA, NA, NA, NA, NA, NA, NA, 0.1501340… +$ BQB 
NA, 1.000000, NA, NA, NA, NA, NA, NA, NA, NA, 0.750668, … +$ MQSB NA, NA, 0.974597, 1.000000, 0.916482, 0.916482, 0.900802… +$ SGB -0.556411, -0.590765, -0.662043, -0.676189, -0.662043, -… +$ MQ0F 0.000000, 0.166667, 0.000000, 0.000000, 0.000000, 0.0000… +$ ICB NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, … +$ HOB NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, … +$ AC 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,… +$ AN 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,… +$ DP4 "0,0,0,4", "0,1,0,5", "0,0,4,5", "0,1,3,8", "1,0,2,7", "… +$ MQ 60, 33, 60, 60, 60, 60, 60, 60, 60, 60, 25, 60, 10, 60, … +$ Indiv "/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned… +$ gt_PL 1210, 1120, 2470, 910, 2550, 2400, 2080, 2550, 11128, 19… +$ gt_GT 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,… +$ gt_GT_alleles "G", "T", "T", "CTTTTTTTT", "CCGCGC", "T", "A", "A", "AC… +``` + +In the above output, we can already gather some information about `variants`, such as the number of rows and columns, column names, type of vector in the columns, and the first few entries of each column. Although what we see is similar to outputs of `str()`, this method gives a cleaner visual output. + +### Selecting columns and filtering rows + +To select columns of a data frame, use `select()`. The first argument to this function is the data frame (`variants`), and the subsequent arguments are the columns to keep. + + +```r +select(variants, sample_id, REF, ALT, DP) +``` + +```{.output} +# A tibble: 801 × 4 + sample_id REF ALT DP + + 1 SRR2584863 T G 4 + 2 SRR2584863 G T 6 + 3 SRR2584863 G T 10 + 4 SRR2584863 CTTTTTTT CTTTTTTTT 12 + 5 SRR2584863 CCGC CCGCGC 10 + 6 SRR2584863 C T 10 + 7 SRR2584863 C A 8 + 8 SRR2584863 G A 11 + 9 SRR2584863 ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG ACAGCCAGCCAGCCAGCCAGCCAGCC… 3 +10 SRR2584863 AT ATT 7 +# ℹ 791 more rows +``` + +To select all columns *except* certain ones, put a "-" in front of +the variable to exclude it. 
+ + +```r +select(variants, -CHROM) +``` + +```{.output} +# A tibble: 801 × 28 + sample_id POS ID REF ALT QUAL FILTER INDEL IDV IMF DP + + 1 SRR2584863 9972 NA T G 91 NA FALSE NA NA 4 + 2 SRR2584863 263235 NA G T 85 NA FALSE NA NA 6 + 3 SRR2584863 281923 NA G T 217 NA FALSE NA NA 10 + 4 SRR2584863 433359 NA CTTTTTTT CTTT… 64 NA TRUE 12 1 12 + 5 SRR2584863 473901 NA CCGC CCGC… 228 NA TRUE 9 0.9 10 + 6 SRR2584863 648692 NA C T 210 NA FALSE NA NA 10 + 7 SRR2584863 1331794 NA C A 178 NA FALSE NA NA 8 + 8 SRR2584863 1733343 NA G A 225 NA FALSE NA NA 11 + 9 SRR2584863 2103887 NA ACAGCCA… ACAG… 56 NA TRUE 2 0.667 3 +10 SRR2584863 2333538 NA AT ATT 167 NA TRUE 7 1 7 +# ℹ 791 more rows +# ℹ 17 more variables: VDB , RPB , MQB , BQB , MQSB , +# SGB , MQ0F , ICB , HOB , AC , AN , DP4 , +# MQ , Indiv , gt_PL , gt_GT , gt_GT_alleles +``` + +`dplyr` also provides useful functions to select columns based on their names. For instance, `ends_with()` allows you to select columns that end with specific letters. For example, to select columns that end with the letter "B": + + +```r +select(variants, ends_with("B")) +``` + +```{.output} +# A tibble: 801 × 8 + VDB RPB MQB BQB MQSB SGB ICB HOB + + 1 0.0257 NA NA NA NA -0.556 NA NA + 2 0.0961 1 1 1 NA -0.591 NA NA + 3 0.774 NA NA NA 0.975 -0.662 NA NA + 4 0.478 NA NA NA 1 -0.676 NA NA + 5 0.660 NA NA NA 0.916 -0.662 NA NA + 6 0.268 NA NA NA 0.916 -0.670 NA NA + 7 0.624 NA NA NA 0.901 -0.651 NA NA + 8 0.992 NA NA NA 1.01 -0.670 NA NA + 9 0.902 NA NA NA 1 -0.454 NA NA +10 0.568 NA NA NA 1.01 -0.617 NA NA +# ℹ 791 more rows +``` + +::::::::::::::::::::::::::::::::::::::: challenge + +## Challenge + +Create a table that contains all the columns with the letter "i" and column "POS", +without columns "Indiv" and "FILTER". +Hint: look for a function called `contains()`, which is described in the help documentation for `ends_with()`, which we just covered (`?ends_with`). Note that `contains()` is not case sensitive. 
+ +::::::::::::::: solution + +## Solution + + +```r +# First, we select "POS" and all columns with letter "i". This will contain columns Indiv and FILTER. +variants_subset <- select(variants, POS, contains("i")) +# Next, we remove columns Indiv and FILTER +variants_result <- select(variants_subset, -Indiv, -FILTER) +variants_result +``` + +```{.output} +# A tibble: 801 × 7 + POS sample_id ID INDEL IDV IMF ICB + + 1 9972 SRR2584863 NA FALSE NA NA NA + 2 263235 SRR2584863 NA FALSE NA NA NA + 3 281923 SRR2584863 NA FALSE NA NA NA + 4 433359 SRR2584863 NA TRUE 12 1 NA + 5 473901 SRR2584863 NA TRUE 9 0.9 NA + 6 648692 SRR2584863 NA FALSE NA NA NA + 7 1331794 SRR2584863 NA FALSE NA NA NA + 8 1733343 SRR2584863 NA FALSE NA NA NA + 9 2103887 SRR2584863 NA TRUE 2 0.667 NA +10 2333538 SRR2584863 NA TRUE 7 1 NA +# ℹ 791 more rows +``` + +::::::::::::::::::::::::: + +We can also get to `variants_result` in one line of code: + +::::::::::::::: solution + +## Alternative solution + + +```r +variants_result <- select(variants, POS, contains("i"), -Indiv, -FILTER) +variants_result +``` + +```{.output} +# A tibble: 801 × 7 + POS sample_id ID INDEL IDV IMF ICB + + 1 9972 SRR2584863 NA FALSE NA NA NA + 2 263235 SRR2584863 NA FALSE NA NA NA + 3 281923 SRR2584863 NA FALSE NA NA NA + 4 433359 SRR2584863 NA TRUE 12 1 NA + 5 473901 SRR2584863 NA TRUE 9 0.9 NA + 6 648692 SRR2584863 NA FALSE NA NA NA + 7 1331794 SRR2584863 NA FALSE NA NA NA + 8 1733343 SRR2584863 NA FALSE NA NA NA + 9 2103887 SRR2584863 NA TRUE 2 0.667 NA +10 2333538 SRR2584863 NA TRUE 7 1 NA +# ℹ 791 more rows +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +To choose rows, use `filter()`: + + +```r +filter(variants, sample_id == "SRR2584863") +``` + +```{.output} +# A tibble: 25 × 29 + sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF + + 1 SRR2584863 CP000819… 9.97e3 NA T G 91 NA FALSE NA NA + 2 SRR2584863 CP000819… 2.63e5 NA G T 85 NA FALSE NA NA + 3 SRR2584863 CP000819… 
2.82e5 NA G T 217 NA FALSE NA NA + 4 SRR2584863 CP000819… 4.33e5 NA CTTT… CTTT… 64 NA TRUE 12 1 + 5 SRR2584863 CP000819… 4.74e5 NA CCGC CCGC… 228 NA TRUE 9 0.9 + 6 SRR2584863 CP000819… 6.49e5 NA C T 210 NA FALSE NA NA + 7 SRR2584863 CP000819… 1.33e6 NA C A 178 NA FALSE NA NA + 8 SRR2584863 CP000819… 1.73e6 NA G A 225 NA FALSE NA NA + 9 SRR2584863 CP000819… 2.10e6 NA ACAG… ACAG… 56 NA TRUE 2 0.667 +10 SRR2584863 CP000819… 2.33e6 NA AT ATT 167 NA TRUE 7 1 +# ℹ 15 more rows +# ℹ 18 more variables: DP , VDB , RPB , MQB , BQB , +# MQSB , SGB , MQ0F , ICB , HOB , AC , +# AN , DP4 , MQ , Indiv , gt_PL , gt_GT , +# gt_GT_alleles +``` + +`filter()` will keep all the rows that match the conditions that are provided. +Here are a few examples: + + +```r +# rows for which the reference genome has T or G +filter(variants, REF %in% c("T", "G")) +``` + +```{.output} +# A tibble: 340 × 29 + sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF DP + + 1 SRR25848… CP00… 9.97e3 NA T G 91 NA FALSE NA NA 4 + 2 SRR25848… CP00… 2.63e5 NA G T 85 NA FALSE NA NA 6 + 3 SRR25848… CP00… 2.82e5 NA G T 217 NA FALSE NA NA 10 + 4 SRR25848… CP00… 1.73e6 NA G A 225 NA FALSE NA NA 11 + 5 SRR25848… CP00… 2.62e6 NA G T 31.9 NA FALSE NA NA 12 + 6 SRR25848… CP00… 3.00e6 NA G A 225 NA FALSE NA NA 15 + 7 SRR25848… CP00… 3.91e6 NA G T 225 NA FALSE NA NA 10 + 8 SRR25848… CP00… 9.97e3 NA T G 214 NA FALSE NA NA 10 + 9 SRR25848… CP00… 1.06e4 NA G A 225 NA FALSE NA NA 11 +10 SRR25848… CP00… 6.40e4 NA G A 225 NA FALSE NA NA 18 +# ℹ 330 more rows +# ℹ 17 more variables: VDB , RPB , MQB , BQB , MQSB , +# SGB , MQ0F , ICB , HOB , AC , AN , DP4 , +# MQ , Indiv , gt_PL , gt_GT , gt_GT_alleles +``` + +```r +# rows that have TRUE in the column INDEL +filter(variants, INDEL) +``` + +```{.output} +# A tibble: 101 × 29 + sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF DP + + 1 SRR25848… CP00… 4.33e5 NA CTTT… CTTT… 64 NA TRUE 12 1 12 + 2 SRR25848… CP00… 4.74e5 NA CCGC CCGC… 228 NA TRUE 9 0.9 10 + 3 SRR25848… 
CP00… 2.10e6 NA ACAG… ACAG… 56 NA TRUE 2 0.667 3 + 4 SRR25848… CP00… 2.33e6 NA AT ATT 167 NA TRUE 7 1 7 + 5 SRR25848… CP00… 3.90e6 NA A AC 43.4 NA TRUE 2 1 2 + 6 SRR25848… CP00… 4.43e6 NA TGG T 228 NA TRUE 10 1 10 + 7 SRR25848… CP00… 1.48e5 NA AGGGG AGGG… 122 NA TRUE 8 1 8 + 8 SRR25848… CP00… 1.58e5 NA GTTT… GTTT… 19.5 NA TRUE 6 1 6 + 9 SRR25848… CP00… 1.73e5 NA CAA CA 180 NA TRUE 11 1 11 +10 SRR25848… CP00… 1.75e5 NA GAA GA 194 NA TRUE 10 1 10 +# ℹ 91 more rows +# ℹ 17 more variables: VDB , RPB , MQB , BQB , MQSB , +# SGB , MQ0F , ICB , HOB , AC , AN , DP4 , +# MQ , Indiv , gt_PL , gt_GT , gt_GT_alleles +``` + +```r +# rows that don't have missing data in the IDV column +filter(variants, !is.na(IDV)) +``` + +```{.output} +# A tibble: 101 × 29 + sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF DP + + 1 SRR25848… CP00… 4.33e5 NA CTTT… CTTT… 64 NA TRUE 12 1 12 + 2 SRR25848… CP00… 4.74e5 NA CCGC CCGC… 228 NA TRUE 9 0.9 10 + 3 SRR25848… CP00… 2.10e6 NA ACAG… ACAG… 56 NA TRUE 2 0.667 3 + 4 SRR25848… CP00… 2.33e6 NA AT ATT 167 NA TRUE 7 1 7 + 5 SRR25848… CP00… 3.90e6 NA A AC 43.4 NA TRUE 2 1 2 + 6 SRR25848… CP00… 4.43e6 NA TGG T 228 NA TRUE 10 1 10 + 7 SRR25848… CP00… 1.48e5 NA AGGGG AGGG… 122 NA TRUE 8 1 8 + 8 SRR25848… CP00… 1.58e5 NA GTTT… GTTT… 19.5 NA TRUE 6 1 6 + 9 SRR25848… CP00… 1.73e5 NA CAA CA 180 NA TRUE 11 1 11 +10 SRR25848… CP00… 1.75e5 NA GAA GA 194 NA TRUE 10 1 10 +# ℹ 91 more rows +# ℹ 17 more variables: VDB , RPB , MQB , BQB , MQSB , +# SGB , MQ0F , ICB , HOB , AC , AN , DP4 , +# MQ , Indiv , gt_PL , gt_GT , gt_GT_alleles +``` + +We have a column titled "QUAL". This is a Phred-scaled confidence +score that a polymorphism exists at this position given the sequencing +data. Lower QUAL scores indicate low probability of a polymorphism +existing at that site. 
`filter()` can be useful for selecting mutations that +have a QUAL score above a certain threshold: + + +```r +# rows with QUAL values greater than or equal to 100 +filter(variants, QUAL >= 100) +``` + +```{.output} +# A tibble: 666 × 29 + sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF DP + + 1 SRR25848… CP00… 2.82e5 NA G T 217 NA FALSE NA NA 10 + 2 SRR25848… CP00… 4.74e5 NA CCGC CCGC… 228 NA TRUE 9 0.9 10 + 3 SRR25848… CP00… 6.49e5 NA C T 210 NA FALSE NA NA 10 + 4 SRR25848… CP00… 1.33e6 NA C A 178 NA FALSE NA NA 8 + 5 SRR25848… CP00… 1.73e6 NA G A 225 NA FALSE NA NA 11 + 6 SRR25848… CP00… 2.33e6 NA AT ATT 167 NA TRUE 7 1 7 + 7 SRR25848… CP00… 2.41e6 NA A C 104 NA FALSE NA NA 9 + 8 SRR25848… CP00… 2.45e6 NA A C 225 NA FALSE NA NA 20 + 9 SRR25848… CP00… 2.67e6 NA A T 225 NA FALSE NA NA 19 +10 SRR25848… CP00… 3.00e6 NA G A 225 NA FALSE NA NA 15 +# ℹ 656 more rows +# ℹ 17 more variables: VDB , RPB , MQB , BQB , MQSB , +# SGB , MQ0F , ICB , HOB , AC , AN , DP4 , +# MQ , Indiv , gt_PL , gt_GT , gt_GT_alleles +``` + +`filter()` allows you to combine multiple conditions. You can separate them using a `,` as arguments to the function, they will be combined using the `&` (AND) logical operator. 
If you need to use the `|` (OR) logical operator, you can specify it explicitly: + + +```r +# this is equivalent to: +# filter(variants, sample_id == "SRR2584863" & QUAL >= 100) +filter(variants, sample_id == "SRR2584863", QUAL >= 100) +``` + +```{.output} +# A tibble: 19 × 29 + sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF DP + + 1 SRR25848… CP00… 2.82e5 NA G T 217 NA FALSE NA NA 10 + 2 SRR25848… CP00… 4.74e5 NA CCGC CCGC… 228 NA TRUE 9 0.9 10 + 3 SRR25848… CP00… 6.49e5 NA C T 210 NA FALSE NA NA 10 + 4 SRR25848… CP00… 1.33e6 NA C A 178 NA FALSE NA NA 8 + 5 SRR25848… CP00… 1.73e6 NA G A 225 NA FALSE NA NA 11 + 6 SRR25848… CP00… 2.33e6 NA AT ATT 167 NA TRUE 7 1 7 + 7 SRR25848… CP00… 2.41e6 NA A C 104 NA FALSE NA NA 9 + 8 SRR25848… CP00… 2.45e6 NA A C 225 NA FALSE NA NA 20 + 9 SRR25848… CP00… 2.67e6 NA A T 225 NA FALSE NA NA 19 +10 SRR25848… CP00… 3.00e6 NA G A 225 NA FALSE NA NA 15 +11 SRR25848… CP00… 3.34e6 NA A C 211 NA FALSE NA NA 10 +12 SRR25848… CP00… 3.40e6 NA C A 225 NA FALSE NA NA 14 +13 SRR25848… CP00… 3.48e6 NA A G 200 NA FALSE NA NA 9 +14 SRR25848… CP00… 3.49e6 NA A C 225 NA FALSE NA NA 13 +15 SRR25848… CP00… 3.91e6 NA G T 225 NA FALSE NA NA 10 +16 SRR25848… CP00… 4.10e6 NA A G 225 NA FALSE NA NA 16 +17 SRR25848… CP00… 4.20e6 NA A C 225 NA FALSE NA NA 11 +18 SRR25848… CP00… 4.43e6 NA TGG T 228 NA TRUE 10 1 10 +19 SRR25848… CP00… 4.62e6 NA A C 185 NA FALSE NA NA 9 +# ℹ 17 more variables: VDB , RPB , MQB , BQB , MQSB , +# SGB , MQ0F , ICB , HOB , AC , AN , DP4 , +# MQ , Indiv , gt_PL , gt_GT , gt_GT_alleles +``` + +```r +# using `|` logical operator +filter(variants, sample_id == "SRR2584863", (MQ >= 50 | QUAL >= 100)) +``` + +```{.output} +# A tibble: 23 × 29 + sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF + + 1 SRR2584863 CP000819… 9.97e3 NA T G 91 NA FALSE NA NA + 2 SRR2584863 CP000819… 2.82e5 NA G T 217 NA FALSE NA NA + 3 SRR2584863 CP000819… 4.33e5 NA CTTT… CTTT… 64 NA TRUE 12 1 + 4 SRR2584863 CP000819… 4.74e5 NA CCGC CCGC… 228 NA 
TRUE 9 0.9 + 5 SRR2584863 CP000819… 6.49e5 NA C T 210 NA FALSE NA NA + 6 SRR2584863 CP000819… 1.33e6 NA C A 178 NA FALSE NA NA + 7 SRR2584863 CP000819… 1.73e6 NA G A 225 NA FALSE NA NA + 8 SRR2584863 CP000819… 2.10e6 NA ACAG… ACAG… 56 NA TRUE 2 0.667 + 9 SRR2584863 CP000819… 2.33e6 NA AT ATT 167 NA TRUE 7 1 +10 SRR2584863 CP000819… 2.41e6 NA A C 104 NA FALSE NA NA +# ℹ 13 more rows +# ℹ 18 more variables: DP , VDB , RPB , MQB , BQB , +# MQSB , SGB , MQ0F , ICB , HOB , AC , +# AN , DP4 , MQ , Indiv , gt_PL , gt_GT , +# gt_GT_alleles +``` + +::::::::::::::::::::::::::::::::::::::: challenge + +## Challenge + +Select all the mutations that occurred between the positions 1e6 (one million) +and 2e6 (inclusive) that have a QUAL greater than 200, and exclude INDEL mutations. +Hint: to flip a logical value such as TRUE to FALSE, we can use the negation symbol +"!" (e.g., `!TRUE == FALSE`). + +::::::::::::::: solution + +## Solution + + +```r +filter(variants, POS >= 1e6 & POS <= 2e6, QUAL > 200, !INDEL) +``` + +```{.output} +# A tibble: 77 × 29 + sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF DP + + 1 SRR25848… CP00… 1.73e6 NA G A 225 NA FALSE NA NA 11 + 2 SRR25848… CP00… 1.00e6 NA A G 225 NA FALSE NA NA 15 + 3 SRR25848… CP00… 1.02e6 NA A G 225 NA FALSE NA NA 12 + 4 SRR25848… CP00… 1.06e6 NA C T 225 NA FALSE NA NA 17 + 5 SRR25848… CP00… 1.06e6 NA A G 206 NA FALSE NA NA 9 + 6 SRR25848… CP00… 1.07e6 NA G T 225 NA FALSE NA NA 11 + 7 SRR25848… CP00… 1.07e6 NA T C 225 NA FALSE NA NA 12 + 8 SRR25848… CP00… 1.10e6 NA C T 225 NA FALSE NA NA 15 + 9 SRR25848… CP00… 1.11e6 NA C T 212 NA FALSE NA NA 9 +10 SRR25848… CP00… 1.11e6 NA A G 225 NA FALSE NA NA 14 +# ℹ 67 more rows +# ℹ 17 more variables: VDB , RPB , MQB , BQB , MQSB , +# SGB , MQ0F , ICB , HOB , AC , AN , DP4 , +# MQ , Indiv , gt_PL , gt_GT , gt_GT_alleles +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +### Pipes + +But what if you wanted to select and filter?
We can do this with pipes. Pipes are a fairly recent addition to R. Pipes let you +take the output of one function and send it directly to the next, which is +useful when you need to do many things to the same data set. It was +possible to do this before pipes were added to R, but it was +much messier and more difficult. Pipes in R look like +`%>%` and are made available via the `magrittr` package, which is installed as +part of `dplyr`. If you use RStudio, you can type the pipe with +Ctrl + Shift + M if you're using a PC, +or Cmd + Shift + M if you're using a Mac. + + +```r +variants %>% + filter(sample_id == "SRR2584863") %>% + select(REF, ALT, DP) +``` + +```{.output} +# A tibble: 25 × 3 + REF ALT DP + + 1 T G 4 + 2 G T 6 + 3 G T 10 + 4 CTTTTTTT CTTTTTTTT 12 + 5 CCGC CCGCGC 10 + 6 C T 10 + 7 C A 8 + 8 G A 11 + 9 ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGC… 3 +10 AT ATT 7 +# ℹ 15 more rows +``` + +In the above code, we use the pipe to send the `variants` data set first through +`filter()`, to keep rows where `sample_id` matches a particular sample, and then through `select()` to +keep only the `REF`, `ALT`, and `DP` columns. Since `%>%` takes +the object on its left and passes it as the first argument to the function on +its right, we don't need to explicitly include the data frame as an argument +to the `filter()` and `select()` functions any more. + +Some may find it helpful to read the pipe like the word "then". For instance, +in the above example, we took the data frame `variants`, *then* we `filter`ed +for rows where `sample_id` was SRR2584863, *then* we `select`ed the `REF`, `ALT`, and `DP` columns. +The **`dplyr`** functions by themselves are somewhat simple, +but by combining them into linear workflows with the pipe, we can accomplish +more complex manipulations of data frames.
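
To make the "first argument" rule concrete, here is a minimal sketch on a made-up data frame. The `toy` object and its values are hypothetical and not part of the `variants` data set, and the sketch assumes `dplyr` is installed. It compares the nested call that the pipe replaces with the piped version of the same steps:

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- data.frame(
  sample_id = c("A", "A", "B", "B"),
  REF = c("T", "G", "C", "A"),
  DP = c(4, 10, 8, 12)
)

# Without the pipe: calls are nested and read from the inside out
nested <- select(filter(toy, sample_id == "A"), REF, DP)

# With the pipe: the same two steps, read from top to bottom
piped <- toy %>%
  filter(sample_id == "A") %>%
  select(REF, DP)

# Check that both approaches produce the same data frame
identical(nested, piped)
```

Both versions should give the same two-row result; the piped form simply lists the steps in the order they happen.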
+ +If we want to create a new object with this smaller version of the data we +can do so by assigning it a new name: + + +```r +SRR2584863_variants <- variants %>% + filter(sample_id == "SRR2584863") %>% + select(REF, ALT, DP) +``` + +This new object contains the `REF`, `ALT`, and `DP` columns for all 25 rows of this sample. Let's print +it to confirm it's what we want (a tibble shows only its first ten rows by default): + + +```r +SRR2584863_variants +``` + +```{.output} +# A tibble: 25 × 3 + REF ALT DP + + 1 T G 4 + 2 G T 6 + 3 G T 10 + 4 CTTTTTTT CTTTTTTTT 12 + 5 CCGC CCGCGC 10 + 6 C T 10 + 7 C A 8 + 8 G A 11 + 9 ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGC… 3 +10 AT ATT 7 +# ℹ 15 more rows +``` + +Similar to the `head()` and `tail()` functions, we can also look at the first or last six rows using the tidyverse function `slice()`. `slice()` is more versatile because it lets users specify any range of rows to view: + + +```r +SRR2584863_variants %>% slice(1:6) +``` + +```{.output} +# A tibble: 6 × 3 + REF ALT DP + +1 T G 4 +2 G T 6 +3 G T 10 +4 CTTTTTTT CTTTTTTTT 12 +5 CCGC CCGCGC 10 +6 C T 10 +``` + + +```r +SRR2584863_variants %>% slice(10:25) +``` + +```{.output} +# A tibble: 16 × 3 + REF ALT DP + + 1 AT ATT 7 + 2 A C 9 + 3 A C 20 + 4 G T 12 + 5 A T 19 + 6 G A 15 + 7 A C 10 + 8 C A 14 + 9 A G 9 +10 A C 13 +11 A AC 2 +12 G T 10 +13 A G 16 +14 A C 11 +15 TGG T 10 +16 A C 9 +``` + +::::::::::::::::::::::::::::::::::::::: challenge + +## Exercise: Pipe and filter + +Starting with the `variants` data frame, use pipes to subset the data +to include only observations from the SRR2584863 sample, +where the filtered depth (DP) is at least 10. +Show only the 5th through 11th rows of columns `REF`, `ALT`, and `POS`.
+ +::::::::::::::: solution + +## Solution + + +```r +variants %>% + filter(sample_id == "SRR2584863" & DP >= 10) %>% + slice(5:11) %>% + select(sample_id, DP, REF, ALT, POS) +``` + +```{.output} +# A tibble: 7 × 5 + sample_id DP REF ALT POS + +1 SRR2584863 11 G A 1733343 +2 SRR2584863 20 A C 2446984 +3 SRR2584863 12 G T 2618472 +4 SRR2584863 19 A T 2665639 +5 SRR2584863 15 G A 2999330 +6 SRR2584863 10 A C 3339313 +7 SRR2584863 14 C A 3401754 +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +### Mutate + +Frequently you'll want to create new columns based on the values in existing +columns, for example to do unit conversions or find the ratio of values in two +columns. For this we'll use the `dplyr` function `mutate()`. + +For example, we can convert the polymorphism confidence value QUAL to a +probability value according to the formula: + +Probability = 1 - 10 ^ -(QUAL/10) + +We can use `mutate()` to add a column (`POLPROB`) to our `variants` data frame that shows +the probability of a polymorphism at that site given the data.
+ + +```r +variants %>% + mutate(POLPROB = 1 - (10 ^ -(QUAL/10))) +``` + +```{.output} +# A tibble: 801 × 30 + sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF + + 1 SRR2584863 CP000819… 9.97e3 NA T G 91 NA FALSE NA NA + 2 SRR2584863 CP000819… 2.63e5 NA G T 85 NA FALSE NA NA + 3 SRR2584863 CP000819… 2.82e5 NA G T 217 NA FALSE NA NA + 4 SRR2584863 CP000819… 4.33e5 NA CTTT… CTTT… 64 NA TRUE 12 1 + 5 SRR2584863 CP000819… 4.74e5 NA CCGC CCGC… 228 NA TRUE 9 0.9 + 6 SRR2584863 CP000819… 6.49e5 NA C T 210 NA FALSE NA NA + 7 SRR2584863 CP000819… 1.33e6 NA C A 178 NA FALSE NA NA + 8 SRR2584863 CP000819… 1.73e6 NA G A 225 NA FALSE NA NA + 9 SRR2584863 CP000819… 2.10e6 NA ACAG… ACAG… 56 NA TRUE 2 0.667 +10 SRR2584863 CP000819… 2.33e6 NA AT ATT 167 NA TRUE 7 1 +# ℹ 791 more rows +# ℹ 19 more variables: DP , VDB , RPB , MQB , BQB , +# MQSB , SGB , MQ0F , ICB , HOB , AC , +# AN , DP4 , MQ , Indiv , gt_PL , gt_GT , +# gt_GT_alleles , POLPROB +``` + +::::::::::::::::::::::::::::::::::::::: challenge + +## Exercise + +There are a lot of columns in our data set, so let's just look at the +`sample_id`, `POS`, `QUAL`, and `POLPROB` columns for now. Add a +line to the above code to only show those columns. 
+ +::::::::::::::: solution + +## Solution + + +```r +variants %>% + mutate(POLPROB = 1 - 10 ^ -(QUAL/10)) %>% + select(sample_id, POS, QUAL, POLPROB) +``` + +```{.output} +# A tibble: 801 × 4 + sample_id POS QUAL POLPROB + + 1 SRR2584863 9972 91 1.00 + 2 SRR2584863 263235 85 1.00 + 3 SRR2584863 281923 217 1 + 4 SRR2584863 433359 64 1.00 + 5 SRR2584863 473901 228 1 + 6 SRR2584863 648692 210 1 + 7 SRR2584863 1331794 178 1 + 8 SRR2584863 1733343 225 1 + 9 SRR2584863 2103887 56 1.00 +10 SRR2584863 2333538 167 1 +# ℹ 791 more rows +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +### group\_by() and summarize() functions + +Many data analysis tasks can be approached using the "split-apply-combine" +paradigm: split the data into groups, apply some analysis to each group, and +then combine the results. `dplyr` makes this very easy through the use of the +`group_by()` function, which splits the data into groups. + +We can use `group_by()` to tally the number of mutations detected in each sample +using the function `tally()`: + + +```r +variants %>% + group_by(sample_id) %>% + tally() +``` + +```{.output} +# A tibble: 3 × 2 + sample_id n + +1 SRR2584863 25 +2 SRR2584866 766 +3 SRR2589044 10 +``` + +Since counting or tallying values is a common use case for `group_by()`, the function `count()` was created as a shortcut that groups and tallies in a single step: + + +```r +variants %>% + count(sample_id) +``` + +```{.output} +# A tibble: 3 × 2 + sample_id n + +1 SRR2584863 25 +2 SRR2584866 766 +3 SRR2589044 10 +``` + +::::::::::::::::::::::::::::::::::::::: challenge + +## Challenge + +- How many mutations are INDELs?
+ +::::::::::::::: solution + +## Solution + + +```r +variants %>% + count(INDEL) +``` + +```{.output} +# A tibble: 2 × 2 + INDEL n + +1 FALSE 700 +2 TRUE 101 +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +When the data is grouped, `summarize()` can be used to collapse each group into +a single-row summary. `summarize()` does this by applying an aggregating +or summary function to each group. + +It can be a bit tricky at first, but we can imagine physically splitting the data +frame by groups and applying a certain function to summarize the data. + +
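
The "physical splitting" described above can be sketched by hand in base R. This is only an illustration on a hypothetical `toy` data frame (its values are made up), not how `dplyr` is implemented; it spells out the three steps that `group_by()` followed by `summarize()` performs for us:

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(
  sample_id = c("A", "A", "B", "B", "B"),
  DP = c(4, 6, 10, 12, 8)
)

# Split: one vector of DP values per sample
pieces <- split(toy$DP, toy$sample_id)

# Apply: compute the mean of each piece
means <- sapply(pieces, mean)

# Combine: collect the per-group results into a single data frame
result <- data.frame(sample_id = names(means), mean_DP = unname(means))
result
```

The result has one row per group, which is the same shape that `summarize()` returns for grouped data.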
+[Figure: split-apply-combine illustration] +
+^[The figure was adapted from the Software Carpentry lesson, [R for Reproducible Scientific Analysis](https://swcarpentry.github.io/r-novice-gapminder/13-dplyr/)] + +We can also apply many other functions to individual columns to get other +summary statistics. For example, we can use built-in functions like `mean()`, +`median()`, `min()`, and `max()`. These are called "built-in functions" because +they come with R and don't require that you install any additional packages. +By default, **these functions return `NA` when the vector they operate on contains missing data**. +It's a way to make sure that users know they have missing data, and make a +conscious decision on how to deal with it. When dealing with simple statistics +like the mean, the easiest way to ignore `NA` (the missing data) is +to use `na.rm = TRUE` (`rm` stands for remove). + +So to view the mean, median, minimum, and maximum filtered depth (`DP`) for each sample: + + +```r +variants %>% + group_by(sample_id) %>% + summarize( + mean_DP = mean(DP), + median_DP = median(DP), + min_DP = min(DP), + max_DP = max(DP)) +``` + +```{.output} +# A tibble: 3 × 5 + sample_id mean_DP median_DP min_DP max_DP + +1 SRR2584863 10.4 10 2 20 +2 SRR2584866 10.6 10 2 79 +3 SRR2589044 9.3 9.5 3 16 +``` +::::::::::::::::::::::::::::::::::::::::: callout +## Grouped Data Frames in Tidyverse + +When you group a data frame with `group_by()`, you get a grouped data frame. This is a special type of data frame that has all the properties of a regular data frame but also has an additional attribute that describes the grouping structure. The primary advantage of a grouped data frame is that it allows you to work with each group of observations as if they were a separate data frame. + +Operations like `summarise()` and `mutate()` will be applied to each group separately. This is particularly useful when you want to perform calculations on subsets of your data.
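
As a small sketch of what "applied to each group separately" means for `mutate()`, the example below centers a depth column within each sample. The `toy` data frame and the `DP_centered` column name are hypothetical, and the sketch assumes `dplyr` is installed:

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- data.frame(
  sample_id = c("A", "A", "B", "B"),
  DP = c(4, 6, 10, 12)
)

# On grouped data, mean(DP) is computed within each sample,
# so every row is compared with the mean of its own group
grouped <- toy %>%
  group_by(sample_id) %>%
  mutate(DP_centered = DP - mean(DP))

grouped
```

Without the `group_by()` step, `mean(DP)` would instead be the mean over all four rows.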
+ +To remove the grouping structure from a grouped data frame, you can use the `ungroup()` function. This will return a regular data frame. + +For more details, refer to the [dplyr documentation on grouping](https://dplyr.tidyverse.org/articles/grouping.html). +:::::::::::::::::::::::::::::::::::::::::::::::::: + +### Reshaping data frames + +It can sometimes be useful to transform the "long" tidy format into the wide format. This transformation can be done with the `pivot_wider()` function provided by the `tidyr` package (also part of the `tidyverse`). + +`pivot_wider()` takes a data frame as the first argument, followed by two more arguments: `names_from`, the column whose values will become the new column names, and `values_from`, the column whose values will fill the cells of the wide data. + + +```r +variants_wide <- variants %>% + group_by(sample_id, CHROM) %>% + summarize(mean_DP = mean(DP)) %>% + pivot_wider(names_from = sample_id, values_from = mean_DP) +``` + +```{.output} +`summarise()` has grouped output by 'sample_id'. You can override using the +`.groups` argument. +``` + +```r +variants_wide +``` + +```{.output} +# A tibble: 1 × 4 + CHROM SRR2584863 SRR2584866 SRR2589044 + +1 CP000819.1 10.4 10.6 9.3 +``` + +The opposite operation of `pivot_wider()` is handled by `pivot_longer()`.
We specify the names of the new columns, and here add `-CHROM` as this column shouldn't be affected by the reshaping: + + +```r +variants_wide %>% + pivot_longer(-CHROM, names_to = "sample_id", values_to = "mean_DP") +``` + +```{.output} +# A tibble: 3 × 3 + CHROM sample_id mean_DP + +1 CP000819.1 SRR2584863 10.4 +2 CP000819.1 SRR2584866 10.6 +3 CP000819.1 SRR2589044 9.3 +``` + +### Resources + +- [Handy dplyr cheatsheet](https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-transformation.pdf) + +- [Much of this lesson was copied or adapted from Jeff Hollister's materials](https://github.com/USEPA/workshops/blob/main/r/2015/introR/rmd_posts/2015-01-14-03-Clean.Rmd) + +:::::::::::::::::::::::::::::::::::::::: keypoints + +- Use the `dplyr` package to manipulate data frames. +- Use `glimpse()` to quickly look at your data frame. +- Use `select()` to choose variables from a data frame. +- Use `filter()` to choose data based on values. +- Use `mutate()` to create new variables. +- Use `group_by()` and `summarize()` to work with subsets of data. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/06-data-visualization.md b/06-data-visualization.md new file mode 100644 index 000000000..54987fd62 --- /dev/null +++ b/06-data-visualization.md @@ -0,0 +1,545 @@ +--- +title: Data Visualization with ggplot2 +teaching: 60 +exercises: 30 +source: Rmd +--- + +::::::::::::::::::::::::::::::::::::::: objectives + +- Describe the role of data, aesthetics, and geoms in ggplot functions. +- Choose the correct aesthetics and alter the geom parameters for a scatter plot, histogram, or box plot. +- Layer multiple geometries in a single plot. +- Customize plot scales, titles, themes, and fonts. +- Apply a facet to a plot. +- Apply additional ggplot2-compatible plotting libraries. +- Save a ggplot to a file. +- List several resources for getting help with ggplot. +- List several resources for creating informative scientific plots. 
+ +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: questions + +- What is ggplot2? +- What is mapping, and what are aesthetics? +- What is the process of creating publication-quality plots with ggplot in R? + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + + + + + + +## Introduction to **`ggplot2`** + +[Image: ggplot2 logo, a line plot enclosed in a hexagon shape with ggplot2 typed beneath and www.rstudio.com at the bottom.] + +**`ggplot2`** is a plotting package, part of the tidyverse, +that makes it simple to create complex plots from data in a data frame. It provides a more programmatic interface for specifying what variables to plot, how they are displayed, and general visual properties. Therefore, we only need minimal changes if the underlying data change or if we decide to change from a bar plot to a scatter plot. This helps in creating publication-quality plots with minimal amounts of adjustments and tweaking. + +The **gg** in "**ggplot**" stands for "**G**rammar of **G**raphics," which is an elegant yet powerful way to describe the making of scientific plots. In short, the grammar of graphics breaks down every plot into a few components, namely, a dataset, a set of geoms (visual marks that represent the data points), and a coordinate system. You can imagine this is a grammar that gives unique names to each component appearing in a plot and conveys specific information about data. With **ggplot**, graphics are built step by step by adding new elements. + +The idea of **mapping** is crucial in **ggplot**. One familiar example is to *map* the value of one variable in a dataset to $x$ and the other to $y$. However, we often encounter datasets that include multiple (more than two) variables. In this case, **ggplot** allows you to *map* those other variables to visual marks such as **color** and **shape** (**aesthetics** or `aes`). One thing you may want to remember is the difference between **discrete** and **continuous** variables.
Some aesthetics, such as the shape of dots, do not accept continuous variables. If forced to do so, R will give an error. This is easy to understand; we cannot create a continuum of shapes for a variable, unlike, say, color. + +**Tip:** when in doubt about whether a variable is [continuous or discrete](https://en.wikipedia.org/wiki/Continuous_or_discrete_variable), a quick way to check is to use the [`summary()`](https://www.geeksforgeeks.org/get-summary-of-results-produced-by-functions-in-r-programming-summary-function/) function. It reports descriptive statistics (minimum, quartiles, mean, maximum) for continuous variables but not for discrete ones. + +## Installing `tidyverse` + +First, we need to install the `ggplot2` package. + + +```r +install.packages("ggplot2") +``` + +Now, let's load the `ggplot2` package: + + +```r +library(ggplot2) +``` + +We will also use some of the other tidyverse packages we used in the last episode, so we need to load them as well. + + +```r +library(readr) +library(dplyr) +``` + +```{.output} + +Attaching package: 'dplyr' +``` + +```{.output} +The following objects are masked from 'package:stats': + + filter, lag +``` + +```{.output} +The following objects are masked from 'package:base': + + intersect, setdiff, setequal, union +``` + +As the output above shows, `dplyr` is now attached alongside **`ggplot2`**, which we loaded in the previous step; both are part of the **`tidyverse`** framework.
+ +## Loading the dataset + + +```r +variants <- read.csv("https://raw.githubusercontent.com/datacarpentry/genomics-r-intro/main/episodes/data/combined_tidy_vcf.csv") +``` + +Explore the *structure* (types of columns and number of rows) of the dataset using [dplyr](https://dplyr.tidyverse.org/index.html)'s [`glimpse()`](https://dplyr.tidyverse.org/reference/glimpse.html) (for more info, see the [Data Wrangling and Analyses with Tidyverse](https://datacarpentry.org/genomics-r-intro/05-dplyr/) episode) + + +```r +glimpse(variants) # Show a snapshot of the rows and columns +``` + +```{.output} +Rows: 801 +Columns: 29 +$ sample_id "SRR2584863", "SRR2584863", "SRR2584863", "SRR2584863", … +$ CHROM "CP000819.1", "CP000819.1", "CP000819.1", "CP000819.1", … +$ POS 9972, 263235, 281923, 433359, 473901, 648692, 1331794, 1… +$ ID NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, … +$ REF "T", "G", "G", "CTTTTTTT", "CCGC", "C", "C", "G", "ACAGC… +$ ALT "G", "T", "T", "CTTTTTTTT", "CCGCGC", "T", "A", "A", "AC… +$ QUAL 91.0000, 85.0000, 217.0000, 64.0000, 228.0000, 210.0000,… +$ FILTER NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, … +$ INDEL FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TR… +$ IDV NA, NA, NA, 12, 9, NA, NA, NA, 2, 7, NA, NA, NA, NA, NA,… +$ IMF NA, NA, NA, 1.000000, 0.900000, NA, NA, NA, 0.666667, 1.… +$ DP 4, 6, 10, 12, 10, 10, 8, 11, 3, 7, 9, 20, 12, 19, 15, 10… +$ VDB 0.0257451, 0.0961330, 0.7740830, 0.4777040, 0.6595050, 0… +$ RPB NA, 1.000000, NA, NA, NA, NA, NA, NA, NA, NA, 0.900802, … +$ MQB NA, 1.0000000, NA, NA, NA, NA, NA, NA, NA, NA, 0.1501340… +$ BQB NA, 1.000000, NA, NA, NA, NA, NA, NA, NA, NA, 0.750668, … +$ MQSB NA, NA, 0.974597, 1.000000, 0.916482, 0.916482, 0.900802… +$ SGB -0.556411, -0.590765, -0.662043, -0.676189, -0.662043, -… +$ MQ0F 0.000000, 0.166667, 0.000000, 0.000000, 0.000000, 0.0000… +$ ICB NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, … +$ HOB NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
… +$ AC 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,… +$ AN 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,… +$ DP4 "0,0,0,4", "0,1,0,5", "0,0,4,5", "0,1,3,8", "1,0,2,7", "… +$ MQ 60, 33, 60, 60, 60, 60, 60, 60, 60, 60, 25, 60, 10, 60, … +$ Indiv "/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned… +$ gt_PL "121,0", "112,0", "247,0", "91,0", "255,0", "240,0", "20… +$ gt_GT 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,… +$ gt_GT_alleles "G", "T", "T", "CTTTTTTTT", "CCGCGC", "T", "A", "A", "AC… +``` + +Alternatively, we can display the first a few rows (vertically) of the table using [`head()`](https://www.geeksforgeeks.org/get-the-first-parts-of-a-data-set-in-r-programming-head-function/): + + +```r +head(variants) +``` + + + +|sample_id |CHROM | POS|ID |REF |ALT | QUAL|FILTER |INDEL | IDV| IMF| DP| VDB| RPB| MQB| BQB| MQSB| SGB| MQ0F|ICB |HOB | AC| AN|DP4 | MQ|Indiv |gt_PL | gt_GT|gt_GT_alleles | +|:----------|:----------|------:|:--|:--------|:---------|----:|:------|:-----|---:|---:|--:|---------:|---:|---:|---:|--------:|---------:|--------:|:---|:---|--:|--:|:-------|--:|:------------------------------------------------------------------|:-----|-----:|:-------------| +|SRR2584863 |CP000819.1 | 9972|NA |T |G | 91|NA |FALSE | NA| NA| 4| 0.0257451| NA| NA| NA| NA| -0.556411| 0.000000|NA |NA | 1| 1|0,0,0,4 | 60|/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam |121,0 | 1|G | +|SRR2584863 |CP000819.1 | 263235|NA |G |T | 85|NA |FALSE | NA| NA| 6| 0.0961330| 1| 1| 1| NA| -0.590765| 0.166667|NA |NA | 1| 1|0,1,0,5 | 33|/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam |112,0 | 1|T | +|SRR2584863 |CP000819.1 | 281923|NA |G |T | 217|NA |FALSE | NA| NA| 10| 0.7740830| NA| NA| NA| 0.974597| -0.662043| 0.000000|NA |NA | 1| 1|0,0,4,5 | 60|/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam |247,0 | 1|T | +|SRR2584863 |CP000819.1 | 433359|NA |CTTTTTTT |CTTTTTTTT | 64|NA |TRUE | 12| 1.0| 12| 
0.4777040| NA| NA| NA| 1.000000| -0.676189| 0.000000|NA |NA | 1| 1|0,1,3,8 | 60|/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam |91,0 | 1|CTTTTTTTT | +|SRR2584863 |CP000819.1 | 473901|NA |CCGC |CCGCGC | 228|NA |TRUE | 9| 0.9| 10| 0.6595050| NA| NA| NA| 0.916482| -0.662043| 0.000000|NA |NA | 1| 1|1,0,2,7 | 60|/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam |255,0 | 1|CCGCGC | +|SRR2584863 |CP000819.1 | 648692|NA |C |T | 210|NA |FALSE | NA| NA| 10| 0.2680140| NA| NA| NA| 0.916482| -0.670168| 0.000000|NA |NA | 1| 1|0,0,7,3 | 60|/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam |240,0 | 1|T | + +**`ggplot2`** functions like data in the **long** format, i.e., a column for every dimension (variable), and a row for every observation. Well-structured data will save you time when making figures with **`ggplot2`**. + +**`ggplot2`** graphics are built step-by-step by adding new elements. Adding layers in this fashion allows for extensive flexibility and customization of plots and, equally important, keeps the code readable. + +To build a ggplot, we will use the following basic template that can be used for different types of plots: + + +```r +ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) + <GEOM_FUNCTION>() +``` + +- use the `ggplot()` function and bind the plot to a specific data frame using the + `data` argument + + +```r +ggplot(data = variants) +``` + +- define a mapping (using the aesthetic (`aes`) function), by selecting the variables to be plotted and specifying how to present them in the graph, e.g. as x and y positions or characteristics such as size, shape, color, etc. + + +```r +ggplot(data = variants, aes(x = POS, y = DP)) +``` + +- add 'geoms' – graphical representations of the data in the plot (points, + lines, bars). **`ggplot2`** offers many different geoms; we will use some + common ones today, including: + - [`geom_point()`](https://ggplot2.tidyverse.org/reference/geom_point.html) for scatter plots, dot plots, etc.
+ - [`geom_boxplot()`](https://ggplot2.tidyverse.org/reference/geom_boxplot.html) for, well, boxplots! + - [`geom_line()`](https://ggplot2.tidyverse.org/reference/geom_path.html) for trend lines, time series, etc. + +To add a geom to the plot use the `+` operator. Because we have two continuous variables, let's use [`geom_point()`](https://ggplot2.tidyverse.org/reference/geom_point.html) (i.e., a scatter plot) first: + + +```r +ggplot(data = variants, aes(x = POS, y = DP)) + + geom_point() +``` + + + +The `+` in the **`ggplot2`** package is particularly useful because it allows you to modify existing `ggplot` objects. This means you can easily set up plot templates and conveniently explore different types of plots, so the above plot can also be generated with code like this: + + +```r +# Assign plot to a variable +coverage_plot <- ggplot(data = variants, aes(x = POS, y = DP)) + +# Draw the plot +coverage_plot + + geom_point() +``` + +**Notes** + +- Anything you put in the [`ggplot()`](https://ggplot2.tidyverse.org/reference/ggplot.html) function can be seen by any geom layers that you add (i.e., these are universal plot settings). This includes the x- and y-axis mapping you set up in [`aes()`](https://ggplot2.tidyverse.org/reference/aes.html). +- You can also specify mappings for a given geom independently of the mappings defined globally in the [`ggplot()`](https://ggplot2.tidyverse.org/reference/ggplot.html) function. +- The `+` sign used to add new layers must be placed at the end of the line containing the *previous* layer. If, instead, the `+` sign is added at the beginning of the line containing the new layer, **`ggplot2`** will not add the new layer and will return an error message. 
+ + +```r +# This is the correct syntax for adding layers +coverage_plot + + geom_point() + +# This will not add the new layer and will return an error message +coverage_plot + + geom_point() +``` + +## Building your plots iteratively + +Building plots with **`ggplot2`** is typically an iterative process. We start by defining the dataset we'll use, lay out the axes, and choose a geom: + + +```r +ggplot(data = variants, aes(x = POS, y = DP)) + + geom_point() +``` + + + +Then, we start modifying this plot to extract more information from it. For instance, we can add transparency (`alpha`) to avoid over-plotting: + + +```r +ggplot(data = variants, aes(x = POS, y = DP)) + + geom_point(alpha = 0.5) +``` + + + +We can also add colors for all the points: + + +```r +ggplot(data = variants, aes(x = POS, y = DP)) + + geom_point(alpha = 0.5, color = "blue") +``` + + + +Or, to color each sample in the plot differently, you could use a vector as an input to the argument **color**. **`ggplot2`** will provide a different color corresponding to different values in the vector. 
Here is an example where we color with **`sample_id`**: + + +```r +ggplot(data = variants, aes(x = POS, y = DP, color = sample_id)) + + geom_point(alpha = 0.5) +``` + + + +Notice that we can change the geom layer and the colors will still be determined by **`sample_id`**: + + +```r +ggplot(data = variants, aes(x = POS, y = DP, color = sample_id)) + + geom_line(alpha = 0.5) +``` + + + +To make our plot more readable, we can add axis labels: + + +```r +ggplot(data = variants, aes(x = POS, y = DP, color = sample_id)) + + geom_point(alpha = 0.5) + + labs(x = "Base Pair Position", + y = "Read Depth (DP)") +``` + + + +To add a *main* title to the plot, we use [the title argument for the `labs()` function](https://ggplot2.tidyverse.org/reference/labs.html): + + +```r +ggplot(data = variants, aes(x = POS, y = DP, color = sample_id)) + + geom_point(alpha = 0.5) + + labs(x = "Base Pair Position", + y = "Read Depth (DP)", + title = "Read Depth vs. Position") +``` + + + +Now the figure is complete and ready to be exported and saved to a file. This can be achieved easily using [`ggsave()`](https://ggplot2.tidyverse.org/reference/ggsave.html), which by default writes the most recently generated figure in a format (e.g., `jpeg`, `png`, `pdf`) chosen according to the file extension. So, for example, to create a pdf version of the above figure with dimensions of $6\times4$ inches: + + +```r +ggsave("depth.pdf", width = 6, height = 4) +``` + +If we check the *current working directory*, there should be a newly created file called `depth.pdf` with the above plot. + +::::::::::::::::::::::::::::::::::::::: challenge + +## Challenge + +Use what you just learned to create a scatter plot of mapping quality (`MQ`) over +position (`POS`) with the samples showing in different colors. Make sure to give your plot +relevant axis labels. 
+ +::::::::::::::: solution + +## Solution + + +```r + ggplot(data = variants, aes(x = POS, y = MQ, color = sample_id)) + + geom_point() + + labs(x = "Base Pair Position", + y = "Mapping Quality (MQ)") +``` + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +To further customize the plot, we can change the default font family: + + +```r +ggplot(data = variants, aes(x = POS, y = DP, color = sample_id)) + + geom_point(alpha = 0.5) + + labs(x = "Base Pair Position", + y = "Read Depth (DP)", + title = "Read Depth vs. Position") + + theme(text = element_text(family = "Bookman")) +``` + + + +## Faceting + +**`ggplot2`** has a special technique called *faceting* that allows the user to split one plot into multiple plots (panels) based on a factor (variable) included in the dataset. We will use it to split our mapping quality plot into three panels, one for each sample. + + +```r +ggplot(data = variants, aes(x = POS, y = MQ, color = sample_id)) + + geom_point() + + labs(x = "Base Pair Position", + y = "Mapping Quality (MQ)") + + facet_grid(~ sample_id) +``` + + + +This looks okay, but it would be easier to read if the plot facets were stacked vertically rather than horizontally. The `facet_grid()` function allows you to explicitly specify how you want your plots to be arranged via formula notation (`rows ~ columns`); a dot (`.`) is a placeholder meaning no faceting on that side of the formula. + + +```r +ggplot(data = variants, aes(x = POS, y = MQ, color = sample_id)) + + geom_point() + + labs(x = "Base Pair Position", + y = "Mapping Quality (MQ)") + + facet_grid(sample_id ~ .) +``` + + + +Plots with a white background are usually more readable when printed. We can set the background to white using the function [`theme_bw()`](https://ggplot2.tidyverse.org/reference/ggtheme.html). 
Additionally, you can remove the grid: + + +```r +ggplot(data = variants, aes(x = POS, y = MQ, color = sample_id)) + + geom_point() + + labs(x = "Base Pair Position", + y = "Mapping Quality (MQ)") + + facet_grid(sample_id ~ .) + + theme_bw() + + theme(panel.grid = element_blank()) +``` + + + +::::::::::::::::::::::::::::::::::::::: challenge + +## Challenge + +Use what you just learned to create a scatter plot of PHRED scaled quality (`QUAL`) over +position (`POS`) with the samples showing in different colors. Make sure to give your plot +relevant axis labels. + +::::::::::::::: solution + +## Solution + + +```r + ggplot(data = variants, aes(x = POS, y = QUAL, color = sample_id)) + + geom_point() + + labs(x = "Base Pair Position", + y = "PHRED-scaled Quality (QUAL)") + + facet_grid(sample_id ~ .) +``` + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Barplots + +We can create barplots using the [`geom_bar`](https://ggplot2.tidyverse.org/reference/geom_bar.html) geom. Let's make a barplot showing the number of variants for each sample that are indels. + + +```r +ggplot(data = variants, aes(x = INDEL, fill = sample_id)) + + geom_bar() + + facet_grid(sample_id ~ .) +``` + + + +::::::::::::::::::::::::::::::::::::::: challenge + +## Challenge + +Since we already have the sample\_id labels on the individual plot facets, we don't need the +legend. Use the help file for `geom_bar` and any other online resources you want to use to +remove the legend from the plot. + +::::::::::::::: solution + +## Solution + + +```r +ggplot(data = variants, aes(x = INDEL, fill = sample_id)) + + geom_bar(show.legend = FALSE) + + facet_grid(sample_id ~ .) +``` + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Density + +We can create density plots using the [`geom_density`](https://ggplot2.tidyverse.org/reference/geom_density.html) geom that shows the distribution of a variable in the dataset. 
Let's plot the distribution of `DP`: + + +```r +ggplot(data = variants, aes(x = DP)) + + geom_density() +``` + + + +This plot tells us that the most frequent `DP` (read depth) for the variants is about 10 reads. + +::::::::::::::::::::::::::::::::::::::: challenge + +## Challenge + +Use [`geom_density`](https://ggplot2.tidyverse.org/reference/geom_density.html) to plot the distribution of `DP` with a different fill for each sample. Use a white background for the plot. + +::::::::::::::: solution + +## Solution + + +```r +ggplot(data = variants, aes(x = DP, fill = sample_id)) + + geom_density(alpha = 0.5) + + theme_bw() +``` + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## **`ggplot2`** themes + +In addition to [`theme_bw()`](https://ggplot2.tidyverse.org/reference/ggtheme.html), which changes the plot background to white, **`ggplot2`** comes with several other themes which can be useful to quickly change the look of your visualization. The complete list of themes is available at [https://ggplot2.tidyverse.org/reference/ggtheme.html](https://ggplot2.tidyverse.org/reference/ggtheme.html). `theme_minimal()` and `theme_light()` are popular, and `theme_void()` can be useful as a starting point to create a new hand-crafted theme. + +The [ggthemes](https://jrnold.github.io/ggthemes/reference/index.html) package provides a wide variety of options (including Microsoft Excel, [old](https://jrnold.github.io/ggthemes/reference/theme_excel.html) and [new](https://jrnold.github.io/ggthemes/reference/theme_excel_new.html)). The [**`ggplot2`** extensions website](https://exts.ggplot2.tidyverse.org/) provides a list of packages that extend the capabilities of **`ggplot2`**, including additional themes. 
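As a rough sketch of how the built-in themes mentioned above are applied, the example below adds each one as an extra layer. It assumes only that **`ggplot2`** is installed; `variants_demo` and `base_plot` are hypothetical names, and the small data frame is an illustrative stand-in for `variants` rather than real workshop data.

```r
library(ggplot2)

# Hypothetical stand-in for the lesson's `variants` data frame so that the
# example runs on its own; the values are illustrative only.
variants_demo <- data.frame(
  sample_id = rep(c("sample_A", "sample_B"), each = 3),
  POS = c(100, 200, 300, 150, 250, 350),
  DP = c(4, 10, 12, 6, 9, 11)
)

# Store the plot once, then try different themes on top of it
base_plot <- ggplot(data = variants_demo,
                    aes(x = POS, y = DP, color = sample_id)) +
  geom_point()

# Each complete theme is added with `+`, like any other layer
base_plot + theme_minimal()
base_plot + theme_light()

# theme_void() removes axes and background; individual elements can then
# be set with theme() when hand-crafting a look
base_plot +
  theme_void() +
  theme(legend.position = "bottom")
```

Storing the plot in a variable first keeps the comparison cheap: only the theme line changes between versions.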
+ +::::::::::::::::::::::::::::::::::::::: challenge + +## Challenge + +With all of this information in hand, please take another five minutes to +either improve one of the plots generated in this exercise or create a +beautiful graph of your own. Use the RStudio [**`ggplot2`** cheat sheet](https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-visualization.pdf) +for inspiration. Here are some ideas: + +- See if you can change the size or shape of the plotting symbol. +- Can you find a way to change the name of the legend? What about its labels? +- Try using a different color palette (see the [Cookbook for R](https://www.cookbook-r.com/Graphs/Colors_\(ggplot2\)/)). + + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## More **`ggplot2`** Plots + +**`ggplot2`** offers many more informative and beautiful plots (`geoms`) of interest for biologists (although not covered in this lesson) that are worth exploring, such as + +- [`geom_tile()`](https://ggplot2.tidyverse.org/reference/geom_tile.html), for heatmaps, +- [`geom_jitter()`](https://ggplot2.tidyverse.org/reference/geom_jitter.html), for strip charts, and +- [`geom_violin()`](https://ggplot2.tidyverse.org/reference/geom_violin.html), for violin plots + +## Resources + +- [ggplot2: Elegant Graphics for Data Analysis](https://www.amazon.com/gp/product/331924275X/) ([online version](https://ggplot2-book.org/)) +- [The Grammar of Graphics (Statistics and Computing)](https://www.amazon.com/Grammar-Graphics-Statistics-Computing/dp/0387245448/) +- [Data Visualization: A Practical Introduction](https://www.amazon.com/gp/product/0691181624/) ([online version](https://socviz.co/)) +- [The R Graph Gallery](https://r-graph-gallery.com/) ([the book](https://bookdown.org/content/b298e479-b1ab-49fa-b83d-a57c2b034d49/)) + +:::::::::::::::::::::::::::::::::::::::: keypoints + +- ggplot2 is a powerful tool for high-quality plots +- ggplot2 provides a flexible and readable grammar to build plots + 
+:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/07-r-help.md b/07-r-help.md new file mode 100644 index 000000000..db8c66fb1 --- /dev/null +++ b/07-r-help.md @@ -0,0 +1,186 @@ +--- +title: Getting help with R +teaching: 10 +exercises: 5 +source: Rmd +--- + +::::::::::::::::::::::::::::::::::::::: objectives + +- Locate help for an R function using `?`, `??`, and `args()` +- Check the version of R +- Be able to ask effective questions when searching for help on forums or using web searches + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: questions + +- How do I get help using R and RStudio? + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Getting help with R + +No matter how much experience you have with R, you will find yourself +needing help. There is no shame in researching how to do something in R, and +most people will find themselves looking up how to do the same things that +they "should know how to do" over and over again. Here are some tips to make +this process as helpful and efficient as possible. + +> "Never memorize something that you can look up" +> \-- A. Einstein + +## Finding help on Stack Overflow and Biostars + +Two popular websites will be of great help with many R problems. For **general** +**R questions**, [Stack Overflow](https://stackoverflow.com/) is probably the most +popular online community for developers. If you start your web search with "How to do X +in R", results from Stack Overflow are usually near the top of the list. For +**bioinformatics specific questions**, [Biostars](https://www.biostars.org/) is +a popular online forum. + +::::::::::::::::::::::::::::::::::::::::: callout + +## Tip: Asking for help using online forums: + +- When searching for R help, look for [answers with the 'r' tag](https://stackoverflow.com/questions/tagged/r). 
+- Get an account; not required to view answers, but you need an account to post +- Put in effort to check thoroughly before you post a question; folks get + annoyed if you ask a very common question that has been answered multiple + times +- Be careful. While forums are very helpful, you can't know for sure if the + advice you are getting is correct +- See the [How to ask for R help](https://blog.revolutionanalytics.com/2014/01/how-to-ask-for-r-help.html) + blog post for more useful tips + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Help people help you + +Often, in order to duplicate the issue you are having, someone may need to see +the data you are working with or verify the versions of R or R packages you +are using. The following R functions will help with this: + +You can **check the version of R** you are working with using the `sessionInfo()` +function. In fact, it is good to save this information as part of your notes +on any analysis you are doing. When a script that has worked fine a dozen times +before suddenly fails, looking back at these notes will remind you that you +upgraded R and forgot to check your script. + +``` +sessionInfo() +``` + +``` +R version 3.2.3 (2015-12-10) +Platform: x86_64-pc-linux-gnu (64-bit) +Running under: Ubuntu 14.04.3 LTS + +locale: +[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 +[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 +[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C +[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C + +attached base packages: +[1] stats graphics grDevices utils datasets methods base + +loaded via a namespace (and not attached): +[1] tools_3.2.3 packrat_0.4.9-1 +``` + +Many times, there may be some issues with your data and the way it is formatted. +In that case, you may want to share that data with someone else. 
However, you +may not need to share the whole dataset; looking at a subset of your 50,000 row, +10,000 column dataframe may be TMI (too much information)! You can take an +object you have in memory such as a dataframe (if you don't know what this means +yet, we will get to it!) and save it to a file. In our example we will use the +`dput()` function on the `iris` dataframe, which is an example dataset that is +installed with R: + +``` +dput(head(iris)) # iris is an example data.frame that comes with R + # the `head()` function just takes the first 6 lines of the iris dataset +``` + +This generates some output (below) which you will be better able to interpret +after covering the other R lessons. This info would be helpful in understanding +how the data is formatted and possibly revealing problematic issues. + +``` +structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5, 5.4), + Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6, 3.9), Petal.Length = c(1.4, + 1.4, 1.3, 1.5, 1.4, 1.7), Petal.Width = c(0.2, 0.2, 0.2, + 0.2, 0.2, 0.4), Species = structure(c(1L, 1L, 1L, 1L, 1L, + 1L), .Label = c("setosa", "versicolor", "virginica"), class = "factor")), .Names = c("Sepal.Length", +"Sepal.Width", "Petal.Length", "Petal.Width", "Species"), row.names = c(NA, +6L), class = "data.frame") +``` + +Alternatively, you can also save objects in R memory to a file by specifying +the name of the object, in this case the `iris` data frame, and passing a +filename to the `file=` argument. + + +```r +saveRDS(iris, file="iris.rds") # By convention, we use the .rds file extension +``` + +## Final FAQs on R + +Finally, here are a few pieces of introductory R knowledge that are too good to +pass up. While we won't return to them in this course, we put them here because +they come up commonly: + +**Do I need to click Run every time I want to run a script?** + +- No. 
In fact, the most common shortcut key allows you to run a command (or + any lines of the script that are highlighted): + + - Windows execution shortcut: Ctrl\+Enter + - Mac execution shortcut: Cmd(⌘)\+Enter + + To see a complete list of shortcuts, click on the Tools menu and + select Keyboard Shortcuts Help + +**What's with the brackets in R console output?** + +- R returns an index with your result. When your result contains multiple values, + the number tells you what ordinal number begins the line, for example: + + +```r +1:101 # generates the sequence of numbers from 1 to 101 +``` + +```{.output} + [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 + [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 + [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 + [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 + [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 + [91] 91 92 93 94 95 96 97 98 99 100 101 +``` + +In the output above, `[73]` indicates that the first value on that line is the +73rd item in your result. + +**Can I run my R script without RStudio?** + +- Yes, remember - RStudio is running R. You get to use lots of the enhancements + RStudio provides, but R works independently of RStudio. 
See [these tips](https://support.rstudio.com/hc/en-us/articles/218012917-How-to-run-R-scripts-from-the-command-line) for running your commands at the command line. + +**Where else can I learn about RStudio?** + +- Check out the Help menu, especially the "Cheatsheets" section + +:::::::::::::::::::::::::::::::::::::::: keypoints + +- R provides thousands of functions for analyzing data, and provides several ways to get help +- Using R will mean searching for online help, and there are tips and resources on how to search effectively + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md new file mode 100644 index 000000000..f19b80495 --- /dev/null +++ b/CODE_OF_CONDUCT.md @@ -0,0 +1,13 @@ +--- +title: "Contributor Code of Conduct" +--- + +As contributors and maintainers of this project, +we pledge to follow [The Carpentries Code of Conduct][coc]. + +Instances of abusive, harassing, or otherwise unacceptable behavior +may be reported by following our [reporting guidelines][coc-reporting]. + + +[coc-reporting]: https://docs.carpentries.org/topic_folders/policies/incident-reporting.html +[coc]: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html diff --git a/Ecoli_metadata.xlsx b/Ecoli_metadata.xlsx new file mode 100644 index 000000000..1df75eb50 Binary files /dev/null and b/Ecoli_metadata.xlsx differ diff --git a/LICENSE.md b/LICENSE.md new file mode 100644 index 000000000..7632871ff --- /dev/null +++ b/LICENSE.md @@ -0,0 +1,79 @@ +--- +title: "Licenses" +--- + +## Instructional Material + +All Carpentries (Software Carpentry, Data Carpentry, and Library Carpentry) +instructional material is made available under the [Creative Commons +Attribution license][cc-by-human]. The following is a human-readable summary of +(and not a substitute for) the [full legal text of the CC BY 4.0 +license][cc-by-legal]. 
+ +You are free: + +- to **Share**---copy and redistribute the material in any medium or format +- to **Adapt**---remix, transform, and build upon the material + +for any purpose, even commercially. + +The licensor cannot revoke these freedoms as long as you follow the license +terms. + +Under the following terms: + +- **Attribution**---You must give appropriate credit (mentioning that your work + is derived from work that is Copyright (c) The Carpentries and, where + practical, linking to ), provide a [link to the + license][cc-by-human], and indicate if changes were made. You may do so in + any reasonable manner, but not in any way that suggests the licensor endorses + you or your use. + +- **No additional restrictions**---You may not apply legal terms or + technological measures that legally restrict others from doing anything the + license permits. With the understanding that: + +Notices: + +* You do not have to comply with the license for elements of the material in + the public domain or where your use is permitted by an applicable exception + or limitation. +* No warranties are given. The license may not give you all of the permissions + necessary for your intended use. For example, other rights such as publicity, + privacy, or moral rights may limit how you use the material. + +## Software + +Except where otherwise noted, the example programs and other software provided +by The Carpentries are made available under the [OSI][osi]-approved [MIT +license][mit-license]. 
+ +Permission is hereby granted, free of charge, to any person obtaining a copy of +this software and associated documentation files (the "Software"), to deal in +the Software without restriction, including without limitation the rights to +use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies +of the Software, and to permit persons to whom the Software is furnished to do +so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + +## Trademark + +"The Carpentries", "Software Carpentry", "Data Carpentry", and "Library +Carpentry" and their respective logos are registered trademarks of [Community +Initiatives][ci]. + +[cc-by-human]: https://creativecommons.org/licenses/by/4.0/ +[cc-by-legal]: https://creativecommons.org/licenses/by/4.0/legalcode +[mit-license]: https://opensource.org/licenses/mit-license.html +[ci]: https://communityin.org/ +[osi]: https://opensource.org diff --git a/config.yaml b/config.yaml new file mode 100644 index 000000000..d35bbb686 --- /dev/null +++ b/config.yaml @@ -0,0 +1,88 @@ +#------------------------------------------------------------ +# Values for this lesson. +#------------------------------------------------------------ + +# Which carpentry is this (swc, dc, lc, or cp)? 
+# swc: Software Carpentry +# dc: Data Carpentry +# lc: Library Carpentry +# cp: Carpentries (to use for instructor training for instance) +# incubator: The Carpentries Incubator +carpentry: 'dc' + +# Overall title for pages. +title: 'Intro to R and RStudio for Genomics' + +# Date the lesson was created (YYYY-MM-DD, this is empty by default) +created: '2018-03-12' + +# Comma-separated list of keywords for the lesson +keywords: 'software, data, lesson, The Carpentries' + +# Life cycle stage of the lesson +# possible values: pre-alpha, alpha, beta, stable +life_cycle: 'beta' + +# License of the lesson materials (recommended CC-BY 4.0) +license: 'CC-BY 4.0' + +# Link to the source repository for this lesson +source: 'https://github.com/datacarpentry/genomics-r-intro' + +# Default branch of your lesson +branch: 'main' + +# Who to contact if there are any issues +contact: 'team@carpentries.org' + +# Navigation ------------------------------------------------ +# +# Use the following menu items to specify the order of +# individual pages in each dropdown section. Leave blank to +# include all pages in the folder. +# +# Example ------------- +# +# episodes: +# - introduction.md +# - first-steps.md +# +# learners: +# - setup.md +# +# instructors: +# - instructor-notes.md +# +# profiles: +# - one-learner.md +# - another-learner.md + +# Order of episodes in your lesson +episodes: +- 00-introduction.Rmd +- 01-r-basics.Rmd +- 02-data-prelude.Rmd +- 03-basics-factors-dataframes.Rmd +- 04-bioconductor-vcfr.Rmd +- 05-dplyr.Rmd +- 06-data-visualization.Rmd +- 07-r-help.Rmd + +# Information for Learners +learners: + +# Information for Instructors +instructors: + +# Learner Profiles +profiles: + +# Customisation --------------------------------------------- +# +# This space below is where custom yaml items (e.g. 
pinning +# sandpaper and varnish versions) should live + + +url: 'https://datacarpentry.github.io/genomics-r-intro' +analytics: carpentries +lang: en diff --git a/data/SRR2584863_variants.csv b/data/SRR2584863_variants.csv new file mode 100644 index 000000000..043ecd844 --- /dev/null +++ b/data/SRR2584863_variants.csv @@ -0,0 +1,26 @@ +"","sample_id","CHROM","POS","ID","REF","ALT","QUAL","FILTER","INDEL","IDV","IMF","DP","VDB","RPB","MQB","BQB","MQSB","SGB","MQ0F","ICB","HOB","AC","AN","DP4","MQ","Indiv","gt_PL","gt_GT","gt_GT_alleles" +"1","SRR2584863","CP000819.1",9972,NA,"T","G",91,NA,FALSE,NA,NA,4,0.0257451,NA,NA,NA,NA,-0.556411,0,NA,NA,1,1,"0,0,0,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","121,0",1,"G" +"2","SRR2584863","CP000819.1",263235,NA,"G","T",85,NA,FALSE,NA,NA,6,0.096133,1,1,1,NA,-0.590765,0.166667,NA,NA,1,1,"0,1,0,5",33,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","112,0",1,"T" +"3","SRR2584863","CP000819.1",281923,NA,"G","T",217,NA,FALSE,NA,NA,10,0.774083,NA,NA,NA,0.974597,-0.662043,0,NA,NA,1,1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","247,0",1,"T" +"4","SRR2584863","CP000819.1",433359,NA,"CTTTTTTT","CTTTTTTTT",64,NA,TRUE,12,1,12,0.477704,NA,NA,NA,1,-0.676189,0,NA,NA,1,1,"0,1,3,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","91,0",1,"CTTTTTTTT" +"5","SRR2584863","CP000819.1",473901,NA,"CCGC","CCGCGC",228,NA,TRUE,9,0.9,10,0.659505,NA,NA,NA,0.916482,-0.662043,0,NA,NA,1,1,"1,0,2,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0",1,"CCGCGC" +"6","SRR2584863","CP000819.1",648692,NA,"C","T",210,NA,FALSE,NA,NA,10,0.268014,NA,NA,NA,0.916482,-0.670168,0,NA,NA,1,1,"0,0,7,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","240,0",1,"T" 
+"7","SRR2584863","CP000819.1",1331794,NA,"C","A",178,NA,FALSE,NA,NA,8,0.624078,NA,NA,NA,0.900802,-0.651104,0,NA,NA,1,1,"0,0,3,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","208,0",1,"A" +"8","SRR2584863","CP000819.1",1733343,NA,"G","A",225,NA,FALSE,NA,NA,11,0.992403,NA,NA,NA,1.00775,-0.670168,0,NA,NA,1,1,"0,0,4,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0",1,"A" +"9","SRR2584863","CP000819.1",2103887,NA,"ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG","ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG",56,NA,TRUE,2,0.666667,3,0.901652,NA,NA,NA,1,-0.453602,0,NA,NA,1,1,"0,1,1,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","111,28",1,"ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG" +"10","SRR2584863","CP000819.1",2333538,NA,"AT","ATT",167,NA,TRUE,7,1,7,0.568173,NA,NA,NA,1.01283,-0.616816,0,NA,NA,1,1,"0,1,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","194,0",1,"ATT" +"11","SRR2584863","CP000819.1",2407766,NA,"A","C",104,NA,FALSE,NA,NA,9,0.0230738,0.900802,0.150134,0.750668,0.5,-0.590765,0.333333,NA,NA,1,1,"3,0,3,2",25,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","131,0",1,"C" +"12","SRR2584863","CP000819.1",2446984,NA,"A","C",225,NA,FALSE,NA,NA,20,0.0714027,NA,NA,NA,1,-0.689466,0,NA,NA,1,1,"0,0,10,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0",1,"C" +"13","SRR2584863","CP000819.1",2618472,NA,"G","T",31.8923,NA,FALSE,NA,NA,12,0.148533,0.954207,0.0497871,0.774755,0.655816,-0.511536,0.666667,NA,NA,1,1,"3,5,0,3",10,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","60,1",1,"T" +"14","SRR2584863","CP000819.1",2665639,NA,"A","T",225,NA,FALSE,NA,NA,19,0.996039,NA,NA,NA,1,-0.690438,0,NA,NA,1,1,"0,0,12,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0",1,"T" 
+"15","SRR2584863","CP000819.1",2999330,NA,"G","A",225,NA,FALSE,NA,NA,15,0.376737,NA,NA,NA,1,-0.686358,0,NA,NA,1,1,"0,0,6,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0",1,"A" +"16","SRR2584863","CP000819.1",3339313,NA,"A","C",211,NA,FALSE,NA,NA,10,0.405936,NA,NA,NA,1.00775,-0.670168,0,NA,NA,1,1,"0,0,4,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","241,0",1,"C" +"17","SRR2584863","CP000819.1",3401754,NA,"C","A",225,NA,FALSE,NA,NA,14,0.168008,NA,NA,NA,0.95494,-0.680642,0,NA,NA,1,1,"0,0,7,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0",1,"A" +"18","SRR2584863","CP000819.1",3481820,NA,"A","G",200,NA,FALSE,NA,NA,9,0.107081,NA,NA,NA,0.974597,-0.662043,0,NA,NA,1,1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","230,0",1,"G" +"19","SRR2584863","CP000819.1",3488669,NA,"A","C",225,NA,FALSE,NA,NA,13,0.0162706,NA,NA,NA,1,-0.680642,0,NA,NA,1,1,"0,0,8,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0",1,"C" +"20","SRR2584863","CP000819.1",3901455,NA,"A","AC",43.4147,NA,TRUE,2,1,2,0.02,NA,NA,NA,NA,-0.453602,0,NA,NA,1,1,"0,0,2,0",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","73,0",1,"AC" +"21","SRR2584863","CP000819.1",3909807,NA,"G","T",225,NA,FALSE,NA,NA,10,0.0698567,NA,NA,NA,0.974597,-0.662043,0,NA,NA,1,1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0",1,"T" +"22","SRR2584863","CP000819.1",4100183,NA,"A","G",225,NA,FALSE,NA,NA,16,0.828693,NA,NA,NA,1,-0.689466,0,NA,NA,1,1,"0,0,12,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0",1,"G" +"23","SRR2584863","CP000819.1",4201958,NA,"A","C",225,NA,FALSE,NA,NA,11,0.0491808,NA,NA,NA,0.974597,-0.662043,0,NA,NA,1,1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0",1,"C" 
+"24","SRR2584863","CP000819.1",4431393,NA,"TGG","T",228,NA,TRUE,10,1,10,0.86321,NA,NA,NA,1.00775,-0.662043,0,NA,NA,1,1,"0,1,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0",1,"T" +"25","SRR2584863","CP000819.1",4616538,NA,"A","C",185,NA,FALSE,NA,NA,9,0.575705,NA,NA,NA,0.992367,-0.651104,0,NA,NA,1,1,"0,0,4,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","215,0",1,"C" diff --git a/data/combined_tidy_vcf.csv b/data/combined_tidy_vcf.csv new file mode 100644 index 000000000..ebadf8f8d --- /dev/null +++ b/data/combined_tidy_vcf.csv @@ -0,0 +1,802 @@ +"sample_id","CHROM","POS","ID","REF","ALT","QUAL","FILTER","INDEL","IDV","IMF","DP","VDB","RPB","MQB","BQB","MQSB","SGB","MQ0F","ICB","HOB","AC","AN","DP4","MQ","Indiv","gt_PL","gt_GT","gt_GT_alleles" +"SRR2584863","CP000819.1",9972,NA,"T","G",91,NA,FALSE,NA,NA,4,0.0257451,NA,NA,NA,NA,-0.556411,0,NA,NA,"1",1,"0,0,0,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","121,0","1","G" +"SRR2584863","CP000819.1",263235,NA,"G","T",85,NA,FALSE,NA,NA,6,0.096133,1,1,1,NA,-0.590765,0.166667,NA,NA,"1",1,"0,1,0,5",33,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","112,0","1","T" +"SRR2584863","CP000819.1",281923,NA,"G","T",217,NA,FALSE,NA,NA,10,0.774083,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","247,0","1","T" +"SRR2584863","CP000819.1",433359,NA,"CTTTTTTT","CTTTTTTTT",64,NA,TRUE,12,1,12,0.477704,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,1,3,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","91,0","1","CTTTTTTTT" +"SRR2584863","CP000819.1",473901,NA,"CCGC","CCGCGC",228,NA,TRUE,9,0.9,10,0.659505,NA,NA,NA,0.916482,-0.662043,0,NA,NA,"1",1,"1,0,2,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0","1","CCGCGC" 
+"SRR2584863","CP000819.1",648692,NA,"C","T",210,NA,FALSE,NA,NA,10,0.268014,NA,NA,NA,0.916482,-0.670168,0,NA,NA,"1",1,"0,0,7,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","240,0","1","T" +"SRR2584863","CP000819.1",1331794,NA,"C","A",178,NA,FALSE,NA,NA,8,0.624078,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,3,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","208,0","1","A" +"SRR2584863","CP000819.1",1733343,NA,"G","A",225,NA,FALSE,NA,NA,11,0.992403,NA,NA,NA,1.00775,-0.670168,0,NA,NA,"1",1,"0,0,4,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0","1","A" +"SRR2584863","CP000819.1",2103887,NA,"ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG","ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG",56,NA,TRUE,2,0.666667,3,0.901652,NA,NA,NA,1,-0.453602,0,NA,NA,"1",1,"0,1,1,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","111,28","1","ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG" +"SRR2584863","CP000819.1",2333538,NA,"AT","ATT",167,NA,TRUE,7,1,7,0.568173,NA,NA,NA,1.01283,-0.616816,0,NA,NA,"1",1,"0,1,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","194,0","1","ATT" +"SRR2584863","CP000819.1",2407766,NA,"A","C",104,NA,FALSE,NA,NA,9,0.0230738,0.900802,0.150134,0.750668,0.5,-0.590765,0.333333,NA,NA,"1",1,"3,0,3,2",25,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","131,0","1","C" +"SRR2584863","CP000819.1",2446984,NA,"A","C",225,NA,FALSE,NA,NA,20,0.0714027,NA,NA,NA,1,-0.689466,0,NA,NA,"1",1,"0,0,10,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0","1","C" +"SRR2584863","CP000819.1",2618472,NA,"G","T",31.8923,NA,FALSE,NA,NA,12,0.148533,0.954207,0.0497871,0.774755,0.655816,-0.511536,0.666667,NA,NA,"1",1,"3,5,0,3",10,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","60,1","1","T" 
+"SRR2584863","CP000819.1",2665639,NA,"A","T",225,NA,FALSE,NA,NA,19,0.996039,NA,NA,NA,1,-0.690438,0,NA,NA,"1",1,"0,0,12,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0","1","T" +"SRR2584863","CP000819.1",2999330,NA,"G","A",225,NA,FALSE,NA,NA,15,0.376737,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"0,0,6,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0","1","A" +"SRR2584863","CP000819.1",3339313,NA,"A","C",211,NA,FALSE,NA,NA,10,0.405936,NA,NA,NA,1.00775,-0.670168,0,NA,NA,"1",1,"0,0,4,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","241,0","1","C" +"SRR2584863","CP000819.1",3401754,NA,"C","A",225,NA,FALSE,NA,NA,14,0.168008,NA,NA,NA,0.95494,-0.680642,0,NA,NA,"1",1,"0,0,7,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0","1","A" +"SRR2584863","CP000819.1",3481820,NA,"A","G",200,NA,FALSE,NA,NA,9,0.107081,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","230,0","1","G" +"SRR2584863","CP000819.1",3488669,NA,"A","C",225,NA,FALSE,NA,NA,13,0.0162706,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,0,8,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0","1","C" +"SRR2584863","CP000819.1",3901455,NA,"A","AC",43.4147,NA,TRUE,2,1,2,0.02,NA,NA,NA,NA,-0.453602,0,NA,NA,"1",1,"0,0,2,0",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","73,0","1","AC" +"SRR2584863","CP000819.1",3909807,NA,"G","T",225,NA,FALSE,NA,NA,10,0.0698567,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0","1","T" +"SRR2584863","CP000819.1",4100183,NA,"A","G",225,NA,FALSE,NA,NA,16,0.828693,NA,NA,NA,1,-0.689466,0,NA,NA,"1",1,"0,0,12,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0","1","G" 
+"SRR2584863","CP000819.1",4201958,NA,"A","C",225,NA,FALSE,NA,NA,11,0.0491808,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0","1","C" +"SRR2584863","CP000819.1",4431393,NA,"TGG","T",228,NA,TRUE,10,1,10,0.86321,NA,NA,NA,1.00775,-0.662043,0,NA,NA,"1",1,"0,1,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","255,0","1","T" +"SRR2584863","CP000819.1",4616538,NA,"A","C",185,NA,FALSE,NA,NA,9,0.575705,NA,NA,NA,0.992367,-0.651104,0,NA,NA,"1",1,"0,0,4,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584863.aligned.sorted.bam","215,0","1","C" +"SRR2584866","CP000819.1",1521,NA,"C","T",207,NA,FALSE,NA,NA,9,0.993024,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","237,0","1","T" +"SRR2584866","CP000819.1",1612,NA,"A","G",225,NA,FALSE,NA,NA,13,0.52194,NA,NA,NA,0.950952,-0.676189,0,NA,NA,"1",1,"0,0,6,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",9092,NA,"A","G",225,NA,FALSE,NA,NA,14,0.717543,NA,NA,NA,0.916482,-0.670168,0,NA,NA,"1",1,"0,0,7,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",9972,NA,"T","G",214,NA,FALSE,NA,NA,10,0.022095,NA,NA,NA,1,-0.670168,0,NA,NA,"1",1,"0,0,2,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","244,0","1","G" +"SRR2584866","CP000819.1",10563,NA,"G","A",225,NA,FALSE,NA,NA,11,0.958658,NA,NA,NA,0.952347,-0.670168,0,NA,NA,"1",1,"0,0,5,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",22257,NA,"C","T",127,NA,FALSE,NA,NA,5,0.0765947,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,2,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","157,0","1","T" 
+"SRR2584866","CP000819.1",38971,NA,"A","G",225,NA,FALSE,NA,NA,14,0.872139,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,0,4,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",42306,NA,"A","G",225,NA,FALSE,NA,NA,15,0.969686,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"0,0,5,9",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",45277,NA,"A","G",225,NA,FALSE,NA,NA,15,0.470998,NA,NA,NA,0.95494,-0.680642,0,NA,NA,"1",1,"0,0,7,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",56613,NA,"C","G",183,NA,FALSE,NA,NA,12,0.879703,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,8,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","213,0","1","G" +"SRR2584866","CP000819.1",62118,NA,"A","G",225,NA,FALSE,NA,NA,19,0.414981,NA,NA,NA,0.906029,-0.691153,0,NA,NA,"1",1,"0,0,8,10",59,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",64042,NA,"G","A",225,NA,FALSE,NA,NA,18,0.451328,NA,NA,NA,1,-0.689466,0,NA,NA,"1",1,"0,0,7,9",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",78808,NA,"C","T",225,NA,FALSE,NA,NA,23,0.885435,NA,NA,NA,1,-0.691153,0,NA,NA,"1",1,"0,0,13,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",80113,NA,"A","G",165,NA,FALSE,NA,NA,9,0.989691,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","195,0","1","G" +"SRR2584866","CP000819.1",81158,NA,"A","C",225,NA,FALSE,NA,NA,13,0.468318,NA,NA,NA,0.982603,-0.680642,0,NA,NA,"1",1,"0,0,6,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" 
+"SRR2584866","CP000819.1",87462,NA,"A","G",225,NA,FALSE,NA,NA,10,0.345209,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",94370,NA,"A","G",225,NA,FALSE,NA,NA,11,0.942893,NA,NA,NA,0.916482,-0.670168,0,NA,NA,"1",1,"0,0,3,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",98286,NA,"C","T",130,NA,FALSE,NA,NA,7,0.612254,NA,NA,NA,0.861511,-0.616816,0,NA,NA,"1",1,"0,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","160,0","1","T" +"SRR2584866","CP000819.1",98404,NA,"G","A",225,NA,FALSE,NA,NA,14,0.347065,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"0,0,8,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",105581,NA,"G","A",225,NA,FALSE,NA,NA,13,0.379887,NA,NA,NA,0.950952,-0.676189,0,NA,NA,"1",1,"0,0,5,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",124045,NA,"A","G",225,NA,FALSE,NA,NA,13,0.598947,NA,NA,NA,0.95494,-0.680642,0,NA,NA,"1",1,"0,0,7,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",125741,NA,"T","C",224,NA,FALSE,NA,NA,11,0.152401,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,3,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","254,0","1","C" +"SRR2584866","CP000819.1",140180,NA,"A","G",163,NA,FALSE,NA,NA,6,0.808637,NA,NA,NA,0.861511,-0.616816,0,NA,NA,"1",1,"0,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","193,0","1","G" +"SRR2584866","CP000819.1",146944,NA,"T","C",185,NA,FALSE,NA,NA,9,0.659709,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,1,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","215,0","1","C" 
+"SRR2584866","CP000819.1",148134,NA,"AGGGG","AGGGGG",122,NA,TRUE,8,1,8,0.0462387,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,3,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","152,0","1","AGGGGG" +"SRR2584866","CP000819.1",157998,NA,"GTTTTTTTTT","GTTTTTTTT",19.4636,NA,TRUE,6,1,6,0.358309,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,1,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","49,0","1","GTTTTTTTT" +"SRR2584866","CP000819.1",159875,NA,"A","G",142,NA,FALSE,NA,NA,7,0.582309,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,1,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","172,0","1","G" +"SRR2584866","CP000819.1",166078,NA,"A","G",90,NA,FALSE,NA,NA,4,0.0587288,NA,NA,NA,NA,-0.556411,0,NA,NA,"1",1,"0,0,4,0",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","120,0","1","G" +"SRR2584866","CP000819.1",171422,NA,"C","T",225,NA,FALSE,NA,NA,13,0.957166,NA,NA,NA,0.964642,-0.676189,0,NA,NA,"1",1,"0,0,7,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",172553,NA,"CAA","CA",180,NA,TRUE,11,1,11,0.293805,NA,NA,NA,1,-0.670168,0,NA,NA,"1",1,"0,1,8,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","207,0","1","CA" +"SRR2584866","CP000819.1",175213,NA,"GAA","GA",194,NA,TRUE,10,1,10,0.957887,NA,NA,NA,0.952347,-0.662043,0,NA,NA,"1",1,"1,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","221,0","1","GA" +"SRR2584866","CP000819.1",180502,NA,"C","T",192,NA,FALSE,NA,NA,8,0.0555824,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,3,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","222,0","1","T" +"SRR2584866","CP000819.1",184752,NA,"C","T",149,NA,FALSE,NA,NA,6,0.0229453,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,4,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","179,0","1","T" 
+"SRR2584866","CP000819.1",189171,NA,"C","T",172,NA,FALSE,NA,NA,7,0.405339,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,2,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","202,0","1","T" +"SRR2584866","CP000819.1",191724,NA,"C","T",198,NA,FALSE,NA,NA,9,0.240712,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","228,0","1","T" +"SRR2584866","CP000819.1",197681,NA,"CTTTTTTTT","CTTTTTTTTT",50,NA,TRUE,13,1,13,0.283476,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,0,9,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","80,0","1","CTTTTTTTTT" +"SRR2584866","CP000819.1",213751,NA,"A","G",194,NA,FALSE,NA,NA,8,0.138772,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","224,0","1","G" +"SRR2584866","CP000819.1",216480,NA,"C","T",67,NA,FALSE,NA,NA,3,0.851467,NA,NA,NA,1,-0.511536,0,NA,NA,"1",1,"0,0,1,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","97,0","1","T" +"SRR2584866","CP000819.1",228771,NA,"CGGGGG","CGGGGGG",39.1861,NA,TRUE,5,0.625,8,0.327269,NA,NA,NA,0.833333,-0.590765,0.375,NA,NA,"1",1,"1,2,1,4",25,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","66,0","1","CGGGGGG" +"SRR2584866","CP000819.1",233686,NA,"CTTTTT","CTTTTTT",23.8812,NA,TRUE,8,0.888889,9,0.721112,NA,NA,NA,0.974597,-0.590765,0,NA,NA,"1",1,"0,4,4,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","57,6","1","CTTTTTT" +"SRR2584866","CP000819.1",235408,NA,"T","C",225,NA,FALSE,NA,NA,11,0.644273,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",235496,NA,"C","T",161,NA,FALSE,NA,NA,7,0.787989,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","191,0","1","T" 
+"SRR2584866","CP000819.1",241885,NA,"G","A",72,NA,FALSE,NA,NA,10,0.964647,0.900802,0.150134,0.300267,0.300267,-0.590765,0.3,NA,NA,"1",1,"1,2,4,1",20,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","99,0","1","A" +"SRR2584866","CP000819.1",241950,NA,"T","A",155,NA,FALSE,NA,NA,13,0.29154,0.124514,0.0497871,0.152558,0.643095,-0.651104,0.230769,NA,NA,"1",1,"0,3,4,4",34,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","182,0","1","A" +"SRR2584866","CP000819.1",247796,NA,"T","C",225,NA,FALSE,NA,NA,12,0.547491,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",276145,NA,"C","T",139,NA,FALSE,NA,NA,7,0.145389,NA,NA,NA,NA,-0.556411,0,NA,NA,"1",1,"0,0,4,0",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","169,0","1","T" +"SRR2584866","CP000819.1",281923,NA,"G","T",225,NA,FALSE,NA,NA,14,0.641962,NA,NA,NA,0.964642,-0.676189,0,NA,NA,"1",1,"0,0,7,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",294206,NA,"TGGGGGGG","TGGGGGGGGG",100,NA,TRUE,8,0.8,10,0.855481,NA,NA,NA,1.00775,-0.651104,0,NA,NA,"1",1,"0,2,4,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","151,24","1","TGGGGGGGGG" +"SRR2584866","CP000819.1",299195,NA,"G","A",130,NA,FALSE,NA,NA,6,0.882013,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,3,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","160,0","1","A" +"SRR2584866","CP000819.1",306404,NA,"C","T",195,NA,FALSE,NA,NA,8,0.519034,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,3,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","225,0","1","T" +"SRR2584866","CP000819.1",326689,NA,"C","T",219,NA,FALSE,NA,NA,12,0.102415,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,8,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","249,0","1","T" 
+"SRR2584866","CP000819.1",337563,NA,"C","T",225,NA,FALSE,NA,NA,10,0.0663569,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,3,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",338770,NA,"T","C",149,NA,FALSE,NA,NA,7,0.86799,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,3,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","179,0","1","C" +"SRR2584866","CP000819.1",352369,NA,"AGGGGGGGG","AGGGGGGGGG",36.2447,NA,TRUE,16,0.941176,17,0.108809,NA,NA,NA,1,-0.688148,0,NA,NA,"1",1,"1,1,12,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","63,0","1","AGGGGGGGGG" +"SRR2584866","CP000819.1",357611,NA,"T","C",102,NA,FALSE,NA,NA,4,0.984512,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,3,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","132,0","1","C" +"SRR2584866","CP000819.1",360604,NA,"TGGGGGG","TGGGGGGG",85,NA,TRUE,12,0.923077,13,0.935368,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"1,0,7,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","112,0","1","TGGGGGGG" +"SRR2584866","CP000819.1",365047,NA,"A","G",199,NA,FALSE,NA,NA,9,0.204971,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","229,0","1","G" +"SRR2584866","CP000819.1",369026,NA,"A","G",208,NA,FALSE,NA,NA,9,0.00695702,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"0,0,7,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","238,0","1","G" +"SRR2584866","CP000819.1",371056,NA,"G","T",225,NA,FALSE,NA,NA,11,0.317696,NA,NA,NA,0.964642,-0.676189,0,NA,NA,"1",1,"0,0,4,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",373014,NA,"C","T",211,NA,FALSE,NA,NA,9,0.984435,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","241,0","1","T" 
+"SRR2584866","CP000819.1",376267,NA,"G","A",167,NA,FALSE,NA,NA,7,0.311141,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,4,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","197,0","1","A" +"SRR2584866","CP000819.1",377000,NA,"T","G",55,NA,FALSE,NA,NA,6,0.142856,0.28717,0.28717,0.861511,0.28717,-0.511536,0.5,NA,NA,"1",1,"3,0,0,3",20,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","82,0","1","G" +"SRR2584866","CP000819.1",388160,NA,"C","T",225,NA,FALSE,NA,NA,15,0.987773,NA,NA,NA,1,-0.688148,0,NA,NA,"1",1,"0,0,7,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",398061,NA,"G","C",122,NA,FALSE,NA,NA,7,0.794767,NA,NA,NA,NA,-0.590765,0,NA,NA,"1",1,"0,0,5,0",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","152,0","1","C" +"SRR2584866","CP000819.1",412637,NA,"C","T",87,NA,FALSE,NA,NA,4,0.118267,NA,NA,NA,NA,-0.556411,0,NA,NA,"1",1,"0,0,0,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","117,0","1","T" +"SRR2584866","CP000819.1",417363,NA,"C","T",202,NA,FALSE,NA,NA,9,0.202829,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"0,0,2,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","232,0","1","T" +"SRR2584866","CP000819.1",426451,NA,"A","G",225,NA,FALSE,NA,NA,15,0.761693,NA,NA,NA,1,-0.688148,0,NA,NA,"1",1,"0,0,12,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",430835,NA,"C","T",228,NA,FALSE,NA,NA,16,0.0042076,0,0,0.909091,0.899815,-0.676189,0.125,NA,NA,"1",1,"0,2,6,5",49,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",433561,NA,"C","T",184,NA,FALSE,NA,NA,9,0.499424,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,5,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","214,0","1","T" 
+"SRR2584866","CP000819.1",447290,NA,"T","C",118,NA,FALSE,NA,NA,6,0.874813,NA,NA,NA,NA,-0.616816,0,NA,NA,"1",1,"0,0,0,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","148,0","1","C" +"SRR2584866","CP000819.1",448431,NA,"A","G",169,NA,FALSE,NA,NA,7,0.838194,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,2,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","199,0","1","G" +"SRR2584866","CP000819.1",449902,NA,"C","T",190,NA,FALSE,NA,NA,10,0.782803,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"0,0,7,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","220,0","1","T" +"SRR2584866","CP000819.1",451177,NA,"CAAAAAAA","CAAAAAAAA",56,NA,TRUE,11,0.846154,13,0.983228,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,1,9,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","83,0","1","CAAAAAAAA" +"SRR2584866","CP000819.1",461169,NA,"C","G",225,NA,FALSE,NA,NA,11,0.707824,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",462346,NA,"C","T",225,NA,FALSE,NA,NA,11,0.520343,NA,NA,NA,0.950952,-0.676189,0,NA,NA,"1",1,"0,0,5,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",466727,NA,"G","A",206,NA,FALSE,NA,NA,11,0.152186,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"0,0,8,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","236,0","1","A" +"SRR2584866","CP000819.1",473901,NA,"CCGC","CCGCGC",226,NA,TRUE,6,1,6,0.241952,NA,NA,NA,0.861511,-0.590765,0,NA,NA,"1",1,"0,1,3,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","253,0","1","CCGCGC" +"SRR2584866","CP000819.1",480856,NA,"C","T",166,NA,FALSE,NA,NA,11,0.187989,1,1,1,0.783809,-0.662043,0.0909091,NA,NA,"1",1,"0,1,6,3",41,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","193,0","1","T" 
+"SRR2584866","CP000819.1",488116,NA,"A","G",225,NA,FALSE,NA,NA,14,0.493731,NA,NA,NA,0.961166,-0.683931,0,NA,NA,"1",1,"0,0,7,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",491445,NA,"C","T",222,NA,FALSE,NA,NA,13,0.0859232,NA,NA,NA,0.95494,-0.680642,0,NA,NA,"1",1,"0,0,5,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","252,0","1","T" +"SRR2584866","CP000819.1",500697,NA,"C","T",204,NA,FALSE,NA,NA,8,0.122391,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,3,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","234,0","1","T" +"SRR2584866","CP000819.1",511535,NA,"A","G",225,NA,FALSE,NA,NA,17,0.768298,NA,NA,NA,1,-0.688148,0,NA,NA,"1",1,"0,0,8,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",511618,NA,"C","T",225,NA,FALSE,NA,NA,12,0.552188,NA,NA,NA,0.8618,-0.676189,0,NA,NA,"1",1,"0,0,5,6",59,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",522180,NA,"A","G",188,NA,FALSE,NA,NA,12,0.456924,NA,NA,NA,1,-0.670168,0,NA,NA,"1",1,"0,0,9,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","218,0","1","G" +"SRR2584866","CP000819.1",621324,NA,"A","G",225,NA,FALSE,NA,NA,15,0.729071,NA,NA,NA,0.961166,-0.683931,0,NA,NA,"1",1,"0,0,6,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",622005,NA,"A","G",222,NA,FALSE,NA,NA,11,0.224432,NA,NA,NA,0.895781,-0.670168,0,NA,NA,"1",1,"0,0,4,6",58,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","252,0","1","G" +"SRR2584866","CP000819.1",627610,NA,"G","A",98,NA,FALSE,NA,NA,79,0.947791,0.420317,1,0.115345,1,-0.693147,0,NA,NA,"1",1,"5,5,27,26",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,130","1","A" 
+"SRR2584866","CP000819.1",630279,NA,"C","T",213,NA,FALSE,NA,NA,10,0.021254,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,3,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","243,0","1","T" +"SRR2584866","CP000819.1",631472,NA,"C","T",156,NA,FALSE,NA,NA,6,0.417842,NA,NA,NA,0.861511,-0.616816,0,NA,NA,"1",1,"0,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","186,0","1","T" +"SRR2584866","CP000819.1",634603,NA,"A","G",5.60573,NA,FALSE,NA,NA,4,0.5,1,1,1,1,-0.453602,0.5,NA,NA,"1",1,"0,1,1,1",13,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","31,0","1","G" +"SRR2584866","CP000819.1",635244,NA,"C","T",215,NA,FALSE,NA,NA,10,0.251186,NA,NA,NA,1.00775,-0.670168,0,NA,NA,"1",1,"0,0,6,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","245,0","1","T" +"SRR2584866","CP000819.1",636294,NA,"AGGGGG","AGGGGGGG",91,NA,TRUE,7,0.7,10,0.597417,NA,NA,NA,0.916482,-0.636426,0,NA,NA,"1",1,"1,2,2,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","191,73","1","AGGGGGGG" +"SRR2584866","CP000819.1",642275,NA,"A","G",185,NA,FALSE,NA,NA,10,0.0802296,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"0,0,8,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","215,0","1","G" +"SRR2584866","CP000819.1",648692,NA,"C","T",222,NA,FALSE,NA,NA,11,0.538025,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","252,0","1","T" +"SRR2584866","CP000819.1",664684,NA,"GTTTT","GTTT",127,NA,TRUE,15,0.882353,17,0.151447,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,4,7,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","163,9","1","GTTT" +"SRR2584866","CP000819.1",668391,NA,"C","T",205,NA,FALSE,NA,NA,9,0.471462,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","235,0","1","T" 
+"SRR2584866","CP000819.1",670500,NA,"C","T",225,NA,FALSE,NA,NA,11,0.965433,NA,NA,NA,1.00775,-0.670168,0,NA,NA,"1",1,"0,0,6,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",679395,NA,"C","T",159,NA,FALSE,NA,NA,7,0.205815,NA,NA,NA,0.861511,-0.616816,0,NA,NA,"1",1,"0,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","189,0","1","T" +"SRR2584866","CP000819.1",680606,NA,"G","A",173,NA,FALSE,NA,NA,7,0.561812,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,4,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","203,0","1","A" +"SRR2584866","CP000819.1",683034,NA,"C","T",225,NA,FALSE,NA,NA,17,0.239607,NA,NA,NA,0.966012,-0.686358,0,NA,NA,"1",1,"0,0,7,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",690853,NA,"A","G",225,NA,FALSE,NA,NA,13,0.494762,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,8,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",692880,NA,"A","G",215,NA,FALSE,NA,NA,9,0.757426,NA,NA,NA,0.992367,-0.651104,0,NA,NA,"1",1,"0,0,4,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","245,0","1","G" +"SRR2584866","CP000819.1",698396,NA,"A","G",161,NA,FALSE,NA,NA,8,0.0474063,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,1,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","191,0","1","G" +"SRR2584866","CP000819.1",703730,NA,"C","T",148,NA,FALSE,NA,NA,6,0.247382,NA,NA,NA,0.861511,-0.616816,0,NA,NA,"1",1,"0,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","178,0","1","T" +"SRR2584866","CP000819.1",706523,NA,"C","T",197,NA,FALSE,NA,NA,9,0.622626,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","227,0","1","T" 
+"SRR2584866","CP000819.1",708824,NA,"GTTTTTTT","GTTTTTT",58,NA,TRUE,10,1,10,0.522431,NA,NA,NA,1,-0.670168,0,NA,NA,"1",1,"0,0,2,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","88,0","1","GTTTTTT" +"SRR2584866","CP000819.1",710344,NA,"T","C",225,NA,FALSE,NA,NA,13,0.675773,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,9,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",712463,NA,"C","T",193,NA,FALSE,NA,NA,11,0.232284,NA,NA,NA,0.503877,-0.670168,0,NA,NA,"1",1,"0,0,6,4",38,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","223,0","1","T" +"SRR2584866","CP000819.1",713294,NA,"A","G",225,NA,FALSE,NA,NA,12,0.0190538,NA,NA,NA,0.950952,-0.676189,0,NA,NA,"1",1,"0,0,6,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",721872,NA,"C","T",193,NA,FALSE,NA,NA,11,0.687358,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","223,0","1","T" +"SRR2584866","CP000819.1",722645,NA,"T","C",181,NA,FALSE,NA,NA,12,0.866441,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,5,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","211,0","1","C" +"SRR2584866","CP000819.1",723042,NA,"C","T",102,NA,FALSE,NA,NA,4,0.740842,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,2,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","132,0","1","T" +"SRR2584866","CP000819.1",723449,NA,"T","C",156,NA,FALSE,NA,NA,9,0.915004,NA,NA,NA,NA,-0.651104,0,NA,NA,"1",1,"0,0,8,0",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","186,0","1","C" +"SRR2584866","CP000819.1",730596,NA,"C","T",120,NA,FALSE,NA,NA,5,0.764622,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,3,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","150,0","1","T" 
+"SRR2584866","CP000819.1",731936,NA,"A","G",190,NA,FALSE,NA,NA,8,0.796257,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,5,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","220,0","1","G" +"SRR2584866","CP000819.1",734488,NA,"C","T",225,NA,FALSE,NA,NA,13,0.866335,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"0,0,7,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",734890,NA,"C","T",179,NA,FALSE,NA,NA,9,0.711174,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"0,0,2,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","209,0","1","T" +"SRR2584866","CP000819.1",735775,NA,"C","T",225,NA,FALSE,NA,NA,15,0.418727,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,0,9,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",746323,NA,"C","T",193,NA,FALSE,NA,NA,11,0.828412,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"0,0,7,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","223,0","1","T" +"SRR2584866","CP000819.1",764493,NA,"C","T",162,NA,FALSE,NA,NA,7,0.568173,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,2,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","192,0","1","T" +"SRR2584866","CP000819.1",765234,NA,"A","G",150,NA,FALSE,NA,NA,6,0.28128,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,4,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","180,0","1","G" +"SRR2584866","CP000819.1",767233,NA,"C","T",204,NA,FALSE,NA,NA,10,0.89913,NA,NA,NA,0.952347,-0.670168,0,NA,NA,"1",1,"0,0,5,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","234,0","1","T" +"SRR2584866","CP000819.1",767369,NA,"C","T",225,NA,FALSE,NA,NA,12,0.924608,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" 
+"SRR2584866","CP000819.1",768258,NA,"T","C",225,NA,FALSE,NA,NA,14,0.274834,NA,NA,NA,0.95494,-0.680642,0,NA,NA,"1",1,"0,0,5,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",771216,NA,"A","G",137,NA,FALSE,NA,NA,6,0.361542,NA,NA,NA,0.861511,-0.616816,0,NA,NA,"1",1,"0,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","167,0","1","G" +"SRR2584866","CP000819.1",788390,NA,"T","C",207,NA,FALSE,NA,NA,9,0.188618,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","237,0","1","C" +"SRR2584866","CP000819.1",804446,NA,"C","T",225,NA,FALSE,NA,NA,13,0.316685,NA,NA,NA,0.982603,-0.680642,0,NA,NA,"1",1,"0,0,6,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",817224,NA,"A","G",225,NA,FALSE,NA,NA,15,0.622511,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"0,0,9,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",817276,NA,"C","T",225,NA,FALSE,NA,NA,14,0.107516,NA,NA,NA,0.950952,-0.676189,0,NA,NA,"1",1,"0,0,5,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",819594,NA,"A","G",189,NA,FALSE,NA,NA,8,0.618523,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,5,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","219,0","1","G" +"SRR2584866","CP000819.1",821131,NA,"A","G",225,NA,FALSE,NA,NA,11,0.546508,NA,NA,NA,0.952347,-0.670168,0,NA,NA,"1",1,"0,0,5,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",822389,NA,"T","C",146,NA,FALSE,NA,NA,6,0.262259,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,3,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","176,0","1","C" 
+"SRR2584866","CP000819.1",828156,NA,"C","T",166,NA,FALSE,NA,NA,7,0.0539735,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,2,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","196,0","1","T" +"SRR2584866","CP000819.1",829428,NA,"G","A",178,NA,FALSE,NA,NA,8,0.119488,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,3,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","208,0","1","A" +"SRR2584866","CP000819.1",830245,NA,"C","T",145,NA,FALSE,NA,NA,6,0.670308,NA,NA,NA,0.861511,-0.616816,0,NA,NA,"1",1,"0,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","175,0","1","T" +"SRR2584866","CP000819.1",830946,NA,"TGGGGG","TGGGGGG",46.047,NA,TRUE,7,0.875,8,0.251998,NA,NA,NA,0.992367,-0.616816,0,NA,NA,"1",1,"0,2,4,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","75,2","1","TGGGGGG" +"SRR2584866","CP000819.1",831556,NA,"A","G",176,NA,FALSE,NA,NA,8,0.0445763,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,4,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","206,0","1","G" +"SRR2584866","CP000819.1",845650,NA,"G","A",225,NA,FALSE,NA,NA,11,0.432988,NA,NA,NA,1.00775,-0.670168,0,NA,NA,"1",1,"0,0,4,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",849483,NA,"C","T",179,NA,FALSE,NA,NA,9,0.156836,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"0,0,2,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","209,0","1","T" +"SRR2584866","CP000819.1",863736,NA,"T","C",225,NA,FALSE,NA,NA,15,0.749111,NA,NA,NA,0.982603,-0.680642,0,NA,NA,"1",1,"0,0,6,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",865093,NA,"A","G",131,NA,FALSE,NA,NA,9,0.0946644,NA,NA,NA,NA,-0.651104,0,NA,NA,"1",1,"0,0,8,0",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","161,0","1","G" 
+"SRR2584866","CP000819.1",878314,NA,"T","C",225,NA,FALSE,NA,NA,13,0.152536,NA,NA,NA,1.00775,-0.670168,0,NA,NA,"1",1,"0,0,6,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",879705,NA,"C","T",225,NA,FALSE,NA,NA,11,0.640271,NA,NA,NA,0.916482,-0.670168,0,NA,NA,"1",1,"0,0,7,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",887754,NA,"C","T",225,NA,FALSE,NA,NA,11,0.175997,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",888061,NA,"C","T",67,NA,FALSE,NA,NA,3,0.327477,NA,NA,NA,1,-0.511536,0,NA,NA,"1",1,"0,0,2,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","97,0","1","T" +"SRR2584866","CP000819.1",890217,NA,"C","T",69,NA,FALSE,NA,NA,3,0.5,NA,NA,NA,1,-0.511536,0,NA,NA,"1",1,"0,0,2,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","99,0","1","T" +"SRR2584866","CP000819.1",894198,NA,"C","T",178,NA,FALSE,NA,NA,10,0.711173,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"0,0,7,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","208,0","1","T" +"SRR2584866","CP000819.1",901822,NA,"TGGGGGG","TGGGGGGGG",170,NA,TRUE,10,0.909091,11,0.602366,NA,NA,NA,0.964642,-0.670168,0,NA,NA,"1",1,"0,1,4,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","197,0","1","TGGGGGGGG" +"SRR2584866","CP000819.1",904606,NA,"C","T",152,NA,FALSE,NA,NA,7,0.228666,NA,NA,NA,0.861511,-0.616816,0,NA,NA,"1",1,"0,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","182,0","1","T" +"SRR2584866","CP000819.1",906793,NA,"C","T",186,NA,FALSE,NA,NA,8,0.179363,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","216,0","1","T" 
+"SRR2584866","CP000819.1",914244,NA,"A","G",60,NA,FALSE,NA,NA,4,0.917785,NA,NA,NA,NA,-0.511536,0,NA,NA,"1",1,"0,0,0,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","90,0","1","G" +"SRR2584866","CP000819.1",917183,NA,"AGGGGGG","AGGGGGGG",54,NA,TRUE,9,0.9,10,0.783268,NA,NA,NA,1.00775,-0.651104,0,NA,NA,"1",1,"0,2,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","81,0","1","AGGGGGGG" +"SRR2584866","CP000819.1",918446,NA,"A","G",120,NA,FALSE,NA,NA,5,0.077463,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,3,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","150,0","1","G" +"SRR2584866","CP000819.1",920378,NA,"T","C",225,NA,FALSE,NA,NA,18,0.357684,NA,NA,NA,1,-0.689466,0,NA,NA,"1",1,"0,0,6,10",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",934305,NA,"T","C",225,NA,FALSE,NA,NA,13,0.840969,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,0,3,9",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",942702,NA,"ACCCCCCC","ACCCCCCCC",7.90677,NA,TRUE,6,0.857143,7,0.874061,NA,NA,NA,1,-0.590765,0.142857,NA,NA,"1",1,"0,2,1,4",20,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","34,0","1","ACCCCCCCC" +"SRR2584866","CP000819.1",943475,NA,"C","T",223,NA,FALSE,NA,NA,11,0.194649,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","253,0","1","T" +"SRR2584866","CP000819.1",946481,NA,"C","T",189,NA,FALSE,NA,NA,9,0.455044,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","219,0","1","T" +"SRR2584866","CP000819.1",976463,NA,"C","T",225,NA,FALSE,NA,NA,11,0.249967,NA,NA,NA,0.964642,-0.676189,0,NA,NA,"1",1,"0,0,7,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" 
+"SRR2584866","CP000819.1",981097,NA,"A","G",195,NA,FALSE,NA,NA,10,0.0893344,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","225,0","1","G" +"SRR2584866","CP000819.1",983049,NA,"T","C",225,NA,FALSE,NA,NA,10,0.971996,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",983164,NA,"G","A",225,NA,FALSE,NA,NA,19,0.226036,NA,NA,NA,1,-0.689466,0,NA,NA,"1",1,"0,0,11,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",993982,NA,"C","T",225,NA,FALSE,NA,NA,13,0.0170963,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,9,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",997800,NA,"C","T",172,NA,FALSE,NA,NA,7,0.243804,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,5,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","202,0","1","T" +"SRR2584866","CP000819.1",998936,NA,"CGGGGGG","CGGGGGGG",86,NA,TRUE,13,1,13,0.664011,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,1,8,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","113,0","1","CGGGGGGG" +"SRR2584866","CP000819.1",1001295,NA,"A","G",225,NA,FALSE,NA,NA,15,0.481347,NA,NA,NA,0.966012,-0.686358,0,NA,NA,"1",1,"0,0,7,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",1004451,NA,"A","G",170,NA,FALSE,NA,NA,8,0.798293,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,5,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","200,0","1","G" +"SRR2584866","CP000819.1",1007295,NA,"G","A",194,NA,FALSE,NA,NA,9,0.757552,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,5,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","224,0","1","A" 
+"SRR2584866","CP000819.1",1012717,NA,"C","T",151,NA,FALSE,NA,NA,7,0.311141,NA,NA,NA,0.861511,-0.616816,0,NA,NA,"1",1,"0,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","181,0","1","T" +"SRR2584866","CP000819.1",1018106,NA,"GAAAAAAAA","GAAAAAAA",54,NA,TRUE,14,1,14,0.834076,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,1,8,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","81,0","1","GAAAAAAA" +"SRR2584866","CP000819.1",1019279,NA,"A","G",225,NA,FALSE,NA,NA,12,0.242386,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,9,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",1019308,NA,"AGGGGGGG","AGGGGGG",35.1457,NA,TRUE,10,0.909091,11,0.119856,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"1,1,7,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","62,0","1","AGGGGGG" +"SRR2584866","CP000819.1",1021771,NA,"GTTTTTTTT","GTTTTTTTTTT",51,NA,TRUE,5,0.833333,6,0.810983,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,1,1,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","85,7","1","GTTTTTTTTTT" +"SRR2584866","CP000819.1",1026088,NA,"A","G",156,NA,FALSE,NA,NA,6,0.530999,NA,NA,NA,0.861511,-0.616816,0,NA,NA,"1",1,"0,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","186,0","1","G" +"SRR2584866","CP000819.1",1032324,NA,"A","G",99,NA,FALSE,NA,NA,4,0.765925,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,1,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","129,0","1","G" +"SRR2584866","CP000819.1",1040240,NA,"C","T",176,NA,FALSE,NA,NA,7,0.576666,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,5,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","206,0","1","T" +"SRR2584866","CP000819.1",1060067,NA,"G","T",181,NA,FALSE,NA,NA,8,0.452509,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","211,0","1","T" 
+"SRR2584866","CP000819.1",1062285,NA,"GCCCCCCC","GCCCCCCCC",22.1209,NA,TRUE,12,0.923077,13,0.545162,NA,NA,NA,1,-0.670168,0,NA,NA,"1",1,"1,2,9,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","49,0","1","GCCCCCCCC" +"SRR2584866","CP000819.1",1062372,NA,"C","T",225,NA,FALSE,NA,NA,17,0.345999,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,0,9,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1063259,NA,"A","G",206,NA,FALSE,NA,NA,9,0.630032,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,3,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","236,0","1","G" +"SRR2584866","CP000819.1",1065382,NA,"G","T",225,NA,FALSE,NA,NA,11,0.349492,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1074239,NA,"T","C",225,NA,FALSE,NA,NA,12,0.195999,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",1082646,NA,"A","G",122,NA,FALSE,NA,NA,7,0.598008,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,3,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","152,0","1","G" +"SRR2584866","CP000819.1",1082799,NA,"A","G",102,NA,FALSE,NA,NA,5,0.583998,NA,NA,NA,NA,-0.590765,0,NA,NA,"1",1,"0,0,5,0",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","132,0","1","G" +"SRR2584866","CP000819.1",1085185,NA,"C","T",197,NA,FALSE,NA,NA,9,0.681298,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","227,0","1","T" +"SRR2584866","CP000819.1",1096751,NA,"C","T",155,NA,FALSE,NA,NA,6,0.817916,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,2,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","185,0","1","T" 
+"SRR2584866","CP000819.1",1102367,NA,"C","T",225,NA,FALSE,NA,NA,15,0.453607,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,0,9,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1107611,NA,"C","T",212,NA,FALSE,NA,NA,9,0.179101,NA,NA,NA,0.992367,-0.651104,0,NA,NA,"1",1,"0,0,4,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","242,0","1","T" +"SRR2584866","CP000819.1",1110744,NA,"G","A",58,NA,FALSE,NA,NA,8,0.522837,0.868321,0.124046,0.372138,0.124046,-0.556411,0.5,NA,NA,"1",1,"4,0,0,4",20,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","85,0","1","A" +"SRR2584866","CP000819.1",1112067,NA,"TCCCCCCC","TCCCCCCCCC",46.4187,NA,TRUE,4,1,4,0.27062,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,3,1",49,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","78,2","1","TCCCCCCCCC" +"SRR2584866","CP000819.1",1113198,NA,"A","G",225,NA,FALSE,NA,NA,14,0.908875,NA,NA,NA,0.950952,-0.676189,0,NA,NA,"1",1,"0,0,6,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",1115003,NA,"C","G",175,NA,FALSE,NA,NA,7,0.104392,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,3,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","205,0","1","G" +"SRR2584866","CP000819.1",1115942,NA,"C","T",188,NA,FALSE,NA,NA,8,0.618523,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,5,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","218,0","1","T" +"SRR2584866","CP000819.1",1115970,NA,"G","A",117,NA,FALSE,NA,NA,7,0.0846985,NA,NA,NA,NA,-0.616816,0,NA,NA,"1",1,"0,0,6,0",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","147,0","1","A" +"SRR2584866","CP000819.1",1116271,NA,"A","G",194,NA,FALSE,NA,NA,8,0.146847,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,3,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","224,0","1","G" 
+"SRR2584866","CP000819.1",1118258,NA,"T","C",29.4193,NA,FALSE,NA,NA,2,0.86,NA,NA,NA,NA,-0.453602,0,NA,NA,"1",1,"0,0,2,0",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","59,0","1","C" +"SRR2584866","CP000819.1",1122642,NA,"A","G",183,NA,FALSE,NA,NA,8,0.653125,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","213,0","1","G" +"SRR2584866","CP000819.1",1123337,NA,"G","A",225,NA,FALSE,NA,NA,12,0.352739,NA,NA,NA,0.950952,-0.676189,0,NA,NA,"1",1,"0,0,5,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",1133570,NA,"C","T",130,NA,FALSE,NA,NA,5,0.361837,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,1,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","160,0","1","T" +"SRR2584866","CP000819.1",1141641,NA,"A","G",202,NA,FALSE,NA,NA,10,0.982834,NA,NA,NA,0.916482,-0.670168,0,NA,NA,"1",1,"0,0,7,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","232,0","1","G" +"SRR2584866","CP000819.1",1148998,NA,"A","G",181,NA,FALSE,NA,NA,8,0.262259,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","211,0","1","G" +"SRR2584866","CP000819.1",1149234,NA,"C","T",225,NA,FALSE,NA,NA,16,0.441271,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,0,5,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1149407,NA,"A","G",171,NA,FALSE,NA,NA,7,0.335973,NA,NA,NA,0.861511,-0.616816,0,NA,NA,"1",1,"0,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","201,0","1","G" +"SRR2584866","CP000819.1",1149750,NA,"C","T",225,NA,FALSE,NA,NA,14,0.338137,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,0,8,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" 
+"SRR2584866","CP000819.1",1152346,NA,"C","T",136,NA,FALSE,NA,NA,7,0.361542,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,2,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","166,0","1","T" +"SRR2584866","CP000819.1",1152376,NA,"C","T",207,NA,FALSE,NA,NA,9,0.669846,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","237,0","1","T" +"SRR2584866","CP000819.1",1157397,NA,"A","G",74,NA,FALSE,NA,NA,4,0.360777,NA,NA,NA,NA,-0.556411,0,NA,NA,"1",1,"0,0,0,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","104,0","1","G" +"SRR2584866","CP000819.1",1166055,NA,"TGGGGGG","TGGGGGGG",99,NA,TRUE,13,0.928571,14,0.2505,NA,NA,NA,0.966012,-0.683931,0,NA,NA,"1",1,"0,1,7,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","126,0","1","TGGGGGGG" +"SRR2584866","CP000819.1",1166938,NA,"C","T",225,NA,FALSE,NA,NA,10,0.998483,NA,NA,NA,0.952347,-0.670168,0,NA,NA,"1",1,"0,0,5,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1169251,NA,"A","G",94,NA,FALSE,NA,NA,4,0.211317,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,2,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","124,0","1","G" +"SRR2584866","CP000819.1",1170638,NA,"A","G",137,NA,FALSE,NA,NA,6,0.679639,NA,NA,NA,0.861511,-0.616816,0,NA,NA,"1",1,"0,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","167,0","1","G" +"SRR2584866","CP000819.1",1171113,NA,"C","T",225,NA,FALSE,NA,NA,13,0.759427,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,8,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1172377,NA,"TCCCCCC","TCCCCCCCCC",225,NA,TRUE,8,1,8,0.308129,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","TCCCCCCCCC" 
+"SRR2584866","CP000819.1",1174705,NA,"G","A",225,NA,FALSE,NA,NA,14,0.593621,NA,NA,NA,0.961166,-0.683931,0,NA,NA,"1",1,"0,0,6,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",1186699,NA,"C","T",225,NA,FALSE,NA,NA,16,0.464601,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"0,0,5,9",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1187366,NA,"C","T",146,NA,FALSE,NA,NA,6,0.01209,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,3,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","176,0","1","T" +"SRR2584866","CP000819.1",1192913,NA,"T","C",176,NA,FALSE,NA,NA,8,0.351291,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","206,0","1","C" +"SRR2584866","CP000819.1",1193219,NA,"A","G",195,NA,FALSE,NA,NA,9,0.447761,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,5,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","225,0","1","G" +"SRR2584866","CP000819.1",1218136,NA,"A","G",96,NA,FALSE,NA,NA,4,0.535497,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,3,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","126,0","1","G" +"SRR2584866","CP000819.1",1227598,NA,"C","T",205,NA,FALSE,NA,NA,9,0.579489,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","235,0","1","T" +"SRR2584866","CP000819.1",1238938,NA,"A","G",225,NA,FALSE,NA,NA,16,0.528561,NA,NA,NA,1,-0.688148,0,NA,NA,"1",1,"0,0,6,9",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",1248603,NA,"C","T",135,NA,FALSE,NA,NA,6,0.387746,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,1,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","165,0","1","T" 
+"SRR2584866","CP000819.1",1256283,NA,"TCCCCC","TCCCCCC",57,NA,TRUE,8,1,8,0.812989,NA,NA,NA,0.992367,-0.616816,0,NA,NA,"1",1,"0,2,4,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","84,0","1","TCCCCCC" +"SRR2584866","CP000819.1",1272099,NA,"A","G",137,NA,FALSE,NA,NA,6,0.943071,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,1,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","167,0","1","G" +"SRR2584866","CP000819.1",1276684,NA,"A","G",213,NA,FALSE,NA,NA,11,0.190636,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","243,0","1","G" +"SRR2584866","CP000819.1",1294860,NA,"A","G",225,NA,FALSE,NA,NA,11,0.189309,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",1295877,NA,"A","G",151,NA,FALSE,NA,NA,6,0.0530329,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,4,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","181,0","1","G" +"SRR2584866","CP000819.1",1305452,NA,"C","T",221,NA,FALSE,NA,NA,11,0.70846,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,2,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","251,0","1","T" +"SRR2584866","CP000819.1",1307171,NA,"T","C",171,NA,FALSE,NA,NA,7,0.841706,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,3,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","201,0","1","C" +"SRR2584866","CP000819.1",1322001,NA,"A","G",186,NA,FALSE,NA,NA,11,0.513454,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"0,0,7,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","216,0","1","G" +"SRR2584866","CP000819.1",1324501,NA,"C","T",225,NA,FALSE,NA,NA,12,0.0688607,NA,NA,NA,0.952347,-0.670168,0,NA,NA,"1",1,"0,0,5,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" 
+"SRR2584866","CP000819.1",1328687,NA,"C","T",101,NA,FALSE,NA,NA,5,0.732182,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,4,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","131,0","1","T" +"SRR2584866","CP000819.1",1329354,NA,"G","A",225,NA,FALSE,NA,NA,14,0.945862,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,0,5,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",1330360,NA,"C","T",169,NA,FALSE,NA,NA,9,0.630851,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,1,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","199,0","1","T" +"SRR2584866","CP000819.1",1331794,NA,"C","A",102,NA,FALSE,NA,NA,5,0.968032,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,4,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","132,0","1","A" +"SRR2584866","CP000819.1",1336067,NA,"G","A",203,NA,FALSE,NA,NA,8,0.499424,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","233,0","1","A" +"SRR2584866","CP000819.1",1339424,NA,"A","G",225,NA,FALSE,NA,NA,10,0.0827521,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",1357824,NA,"C","T",225,NA,FALSE,NA,NA,18,0.922214,NA,NA,NA,1,-0.689466,0,NA,NA,"1",1,"0,0,10,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1364103,NA,"C","T",182,NA,FALSE,NA,NA,10,0.213741,NA,NA,NA,0.992367,-0.651104,0,NA,NA,"1",1,"0,0,4,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","212,0","1","T" +"SRR2584866","CP000819.1",1373806,NA,"A","G",225,NA,FALSE,NA,NA,11,0.765169,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" 
+"SRR2584866","CP000819.1",1374685,NA,"G","A",136,NA,FALSE,NA,NA,7,0.095534,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,6,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","166,0","1","A" +"SRR2584866","CP000819.1",1392780,NA,"G","A",94,NA,FALSE,NA,NA,5,0.978429,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,4,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","124,0","1","A" +"SRR2584866","CP000819.1",1404452,NA,"A","G",225,NA,FALSE,NA,NA,15,0.100991,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,9,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",1407377,NA,"C","T",186,NA,FALSE,NA,NA,9,0.755557,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,1,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","216,0","1","T" +"SRR2584866","CP000819.1",1414361,NA,"C","T",182,NA,FALSE,NA,NA,9,0.925988,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","212,0","1","T" +"SRR2584866","CP000819.1",1424975,NA,"GCCCCCC","GCCCCCCC",7.45233,NA,TRUE,8,0.615385,13,0.0990048,NA,NA,NA,0.531709,-0.651104,0.307692,NA,NA,"1",1,"2,3,5,3",24,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","53,19","1","GCCCCCCC" +"SRR2584866","CP000819.1",1425273,NA,"T","C",63,NA,FALSE,NA,NA,6,0.503448,NA,NA,NA,1,-0.616816,0.333333,NA,NA,"1",1,"0,0,1,5",18,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","93,0","1","C" +"SRR2584866","CP000819.1",1426142,NA,"A","G",228,NA,FALSE,NA,NA,17,0.425715,1,1,1,0.997028,-0.680642,0.0588235,NA,NA,"1",1,"0,1,9,3",39,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",1426401,NA,"ATTTTTTTTT","ATTTTTTTT",29.1247,NA,TRUE,13,0.928571,14,0.897932,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"1,2,7,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","56,0","1","ATTTTTTTT" 
+"SRR2584866","CP000819.1",1428828,NA,"CTTTTTTTTT","CTTTTTTTT",17.4922,NA,TRUE,8,1,8,0.0092202,NA,NA,NA,1,-0.651104,0.125,NA,NA,"1",1,"0,0,7,1",52,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","47,0","1","CTTTTTTTT" +"SRR2584866","CP000819.1",1439576,NA,"AGGGGG","AGGGGGG",126,NA,TRUE,14,1,14,0.838356,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"0,0,8,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","156,0","1","AGGGGGG" +"SRR2584866","CP000819.1",1465314,NA,"G","A",225,NA,FALSE,NA,NA,10,0.791325,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",1470490,NA,"ATTTTTTTT","ATTTTTTT",48.3091,NA,TRUE,14,1,14,0.351645,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,2,3,9",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","75,0","1","ATTTTTTT" +"SRR2584866","CP000819.1",1494411,NA,"T","C",225,NA,FALSE,NA,NA,14,0.577963,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,0,5,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",1499286,NA,"C","T",215,NA,FALSE,NA,NA,10,0.405153,NA,NA,NA,0.577865,-0.662043,0,NA,NA,"1",1,"0,0,3,6",47,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","245,0","1","T" +"SRR2584866","CP000819.1",1503556,NA,"A","G",109,NA,FALSE,NA,NA,23,0.934468,0.947033,0.22313,0.804377,0.096972,-0.686358,0.565217,NA,NA,"1",1,"1,5,11,3",14,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","136,0","1","G" +"SRR2584866","CP000819.1",1506624,NA,"C","T",228,NA,FALSE,NA,NA,27,0.759894,0.804348,0,0.978261,0.99963,-0.692717,0.0740741,NA,NA,"1",1,"1,1,12,11",36,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" 
+"SRR2584866","CP000819.1",1511178,NA,"C","T",225,NA,FALSE,NA,NA,29,0.383059,NA,NA,NA,1,-0.692831,0,NA,NA,"1",1,"0,0,16,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1514507,NA,"G","A",225,NA,FALSE,NA,NA,17,0.0227527,NA,NA,NA,1,-0.689466,0,NA,NA,"1",1,"0,0,9,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",1522006,NA,"C","T",225,NA,FALSE,NA,NA,25,0.622972,NA,NA,NA,1,-0.692717,0,NA,NA,"1",1,"0,0,12,11",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1532269,NA,"C","T",225,NA,FALSE,NA,NA,17,0.865974,NA,NA,NA,1,-0.689466,0,NA,NA,"1",1,"0,0,8,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1533135,NA,"A","G",225,NA,FALSE,NA,NA,16,0.118094,NA,NA,NA,1,-0.689466,0,NA,NA,"1",1,"0,0,12,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",1538688,NA,"G","A",225,NA,FALSE,NA,NA,17,0.817546,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"0,0,5,9",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",1545254,NA,"C","T",174,NA,FALSE,NA,NA,9,0.310568,NA,NA,NA,0.89338,-0.662043,0,NA,NA,"1",1,"0,0,5,4",47,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","204,0","1","T" +"SRR2584866","CP000819.1",1554433,NA,"A","G",200,NA,FALSE,NA,NA,9,0.600144,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","230,0","1","G" +"SRR2584866","CP000819.1",1561513,NA,"A","G",165,NA,FALSE,NA,NA,10,0.0735697,NA,NA,NA,1,-0.670168,0,NA,NA,"1",1,"0,0,9,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","195,0","1","G" 
+"SRR2584866","CP000819.1",1566509,NA,"C","T",80,NA,FALSE,NA,NA,4,0.913383,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,2,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","110,0","1","T" +"SRR2584866","CP000819.1",1583771,NA,"T","C",170,NA,FALSE,NA,NA,7,0.878339,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,5,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","200,0","1","C" +"SRR2584866","CP000819.1",1587505,NA,"C","T",225,NA,FALSE,NA,NA,18,0.873176,NA,NA,NA,1,-0.689466,0,NA,NA,"1",1,"0,0,12,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1588307,NA,"G","A",225,NA,FALSE,NA,NA,17,0.987083,NA,NA,NA,1,-0.689466,0,NA,NA,"1",1,"0,0,9,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",1588640,NA,"C","T",225,NA,FALSE,NA,NA,19,0.521456,NA,NA,NA,1,-0.69168,0,NA,NA,"1",1,"0,0,8,11",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1591928,NA,"G","A",225,NA,FALSE,NA,NA,11,0.117011,NA,NA,NA,0.916482,-0.670168,0,NA,NA,"1",1,"0,0,7,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",1592323,NA,"G","A",62,NA,FALSE,NA,NA,3,0.318564,NA,NA,NA,1,-0.511536,0,NA,NA,"1",1,"0,0,2,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","92,0","1","A" +"SRR2584866","CP000819.1",1596202,NA,"C","T",225,NA,FALSE,NA,NA,20,0.988742,NA,NA,NA,1,-0.69168,0,NA,NA,"1",1,"0,0,11,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1606062,NA,"GCCCCCC","GCCCCCCC",37.1452,NA,TRUE,7,0.875,8,0.957667,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"1,0,6,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","64,0","1","GCCCCCCC" 
+"SRR2584866","CP000819.1",1607915,NA,"ATT","AT",106,NA,TRUE,7,1,7,0.565335,NA,NA,NA,NA,-0.636426,0,NA,NA,"1",1,"0,0,0,7",50,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","136,0","1","AT" +"SRR2584866","CP000819.1",1608116,NA,"G","A",108,NA,FALSE,NA,NA,6,0.96032,NA,NA,NA,NA,-0.616816,0,NA,NA,"1",1,"0,0,0,6",40,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","138,0","1","A" +"SRR2584866","CP000819.1",1616943,NA,"C","T",225,NA,FALSE,NA,NA,14,0.432226,NA,NA,NA,0.961166,-0.683931,0,NA,NA,"1",1,"0,0,7,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1619798,NA,"G","A",178,NA,FALSE,NA,NA,8,0.385012,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,2,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","208,0","1","A" +"SRR2584866","CP000819.1",1621317,NA,"C","T",146,NA,FALSE,NA,NA,6,0.910445,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,3,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","176,0","1","T" +"SRR2584866","CP000819.1",1633588,NA,"T","C",207,NA,FALSE,NA,NA,11,0.447786,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,7,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","237,0","1","C" +"SRR2584866","CP000819.1",1644789,NA,"C","T",225,NA,FALSE,NA,NA,14,0.177965,NA,NA,NA,0.950952,-0.676189,0,NA,NA,"1",1,"0,0,6,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1651844,NA,"TCCCCCC","TCCCCCCC",79,NA,TRUE,9,1,9,0.251355,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,3,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","109,0","1","TCCCCCCC" +"SRR2584866","CP000819.1",1653956,NA,"C","T",106,NA,FALSE,NA,NA,5,0.725151,NA,NA,NA,NA,-0.556411,0,NA,NA,"1",1,"0,0,4,0",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","136,0","1","T" 
+"SRR2584866","CP000819.1",1655572,NA,"G","A",163,NA,FALSE,NA,NA,8,0.715177,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,6,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","193,0","1","A" +"SRR2584866","CP000819.1",1678399,NA,"C","T",225,NA,FALSE,NA,NA,17,0.000629224,NA,NA,NA,1,-0.689466,0,NA,NA,"1",1,"0,0,4,12",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1684715,NA,"G","A",225,NA,FALSE,NA,NA,13,0.431004,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,0,4,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",1690648,NA,"T","C",224,NA,FALSE,NA,NA,14,0.470998,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,0,11,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","254,0","1","C" +"SRR2584866","CP000819.1",1695028,NA,"C","T",225,NA,FALSE,NA,NA,12,0.188603,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,8,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1695658,NA,"C","T",225,NA,FALSE,NA,NA,14,0.613531,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,0,9,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1699806,NA,"CATGATGATGATGAT","CATGATGATGAT",153,NA,TRUE,6,0.666667,9,0.495904,NA,NA,NA,0.974597,-0.616816,0,NA,NA,"1",1,"1,2,4,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,75","1","CATGATGATGAT" +"SRR2584866","CP000819.1",1700108,NA,"G","A",225,NA,FALSE,NA,NA,11,0.482741,NA,NA,NA,0.916482,-0.670168,0,NA,NA,"1",1,"0,0,7,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",1704682,NA,"CGGGGGG","CGGGGGGG",97,NA,TRUE,15,0.9375,16,0.254916,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"1,1,4,10",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","124,0","1","CGGGGGGG" 
+"SRR2584866","CP000819.1",1706889,NA,"G","A",149,NA,FALSE,NA,NA,7,0.181258,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","179,0","1","A" +"SRR2584866","CP000819.1",1707832,NA,"G","A",143,NA,FALSE,NA,NA,7,0.202927,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,2,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","173,0","1","A" +"SRR2584866","CP000819.1",1707992,NA,"T","C",60,NA,FALSE,NA,NA,3,0.354794,NA,NA,NA,1,-0.511536,0,NA,NA,"1",1,"0,0,1,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","90,0","1","C" +"SRR2584866","CP000819.1",1714708,NA,"T","C",188,NA,FALSE,NA,NA,11,0.226103,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,3,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","218,0","1","C" +"SRR2584866","CP000819.1",1717659,NA,"C","A",195,NA,FALSE,NA,NA,8,0.119488,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","225,0","1","A" +"SRR2584866","CP000819.1",1719003,NA,"TAAAAAA","TAAAAA",46.9699,NA,TRUE,10,0.833333,12,0.431004,NA,NA,NA,0.982603,-0.662043,0,NA,NA,"1",1,"0,3,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","76,2","1","TAAAAA" +"SRR2584866","CP000819.1",1722253,NA,"T","C",225,NA,FALSE,NA,NA,13,0.918712,NA,NA,NA,0.961166,-0.683931,0,NA,NA,"1",1,"0,0,7,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",1728925,NA,"G","A",225,NA,FALSE,NA,NA,10,0.691871,NA,NA,NA,0.992367,-0.651104,0,NA,NA,"1",1,"0,0,4,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",1733343,NA,"G","A",218,NA,FALSE,NA,NA,9,0.95072,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,3,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","248,0","1","A" 
+"SRR2584866","CP000819.1",1733601,NA,"G","A",206,NA,FALSE,NA,NA,9,0.617041,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","236,0","1","A" +"SRR2584866","CP000819.1",1735484,NA,"G","A",162,NA,FALSE,NA,NA,7,0.326789,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","192,0","1","A" +"SRR2584866","CP000819.1",1736604,NA,"G","A",225,NA,FALSE,NA,NA,23,0.74467,NA,NA,NA,1,-0.692352,0,NA,NA,"1",1,"0,0,13,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",1742530,NA,"G","A",157,NA,FALSE,NA,NA,7,0.622148,NA,NA,NA,0.861511,-0.616816,0,NA,NA,"1",1,"0,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","187,0","1","A" +"SRR2584866","CP000819.1",1746738,NA,"TGGGGGGG","TGGGGGGGG",8.37835,NA,TRUE,9,0.692308,13,0.926609,NA,NA,NA,0.961166,-0.651104,0,NA,NA,"1",1,"1,4,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","46,11","1","TGGGGGGGG" +"SRR2584866","CP000819.1",1748882,NA,"G","A",225,NA,FALSE,NA,NA,13,0.881738,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,0,4,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",1750039,NA,"T","C",225,NA,FALSE,NA,NA,12,0.251186,NA,NA,NA,1.00775,-0.670168,0,NA,NA,"1",1,"0,0,6,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",1750149,NA,"C","T",225,NA,FALSE,NA,NA,14,0.142559,NA,NA,NA,1,-0.670168,0,NA,NA,"1",1,"0,0,8,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1766706,NA,"G","A",162,NA,FALSE,NA,NA,7,0.230436,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","192,0","1","A" 
+"SRR2584866","CP000819.1",1780813,NA,"A","G",205,NA,FALSE,NA,NA,12,0.983481,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,10,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","235,0","1","G" +"SRR2584866","CP000819.1",1791609,NA,"ACCCC","ACCCCC",45.4146,NA,TRUE,4,1,4,0.131345,NA,NA,NA,NA,-0.556411,0,NA,NA,"1",1,"0,0,0,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","75,0","1","ACCCCC" +"SRR2584866","CP000819.1",1798872,NA,"G","A",83,NA,FALSE,NA,NA,4,0.325692,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,2,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","113,0","1","A" +"SRR2584866","CP000819.1",1805276,NA,"C","T",225,NA,FALSE,NA,NA,14,0.33436,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"0,0,6,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1817707,NA,"C","T",64,NA,FALSE,NA,NA,4,0.409395,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,3,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","94,0","1","T" +"SRR2584866","CP000819.1",1818501,NA,"G","A",195,NA,FALSE,NA,NA,9,0.872557,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,3,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","225,0","1","A" +"SRR2584866","CP000819.1",1822147,NA,"CAAAAAAAA","CAAAAAAA",34.4159,NA,TRUE,7,1,7,0.0302289,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,2,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","64,0","1","CAAAAAAA" +"SRR2584866","CP000819.1",1822930,NA,"A","G",225,NA,FALSE,NA,NA,15,0.737707,NA,NA,NA,0.966012,-0.686358,0,NA,NA,"1",1,"0,0,7,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",1838393,NA,"A","G",225,NA,FALSE,NA,NA,16,0.739233,NA,NA,NA,1,-0.689466,0,NA,NA,"1",1,"0,0,6,10",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" 
+"SRR2584866","CP000819.1",1851417,NA,"GCCCCC","GCCCCCC",72,NA,TRUE,10,0.909091,11,0.900033,NA,NA,NA,0.964642,-0.662043,0,NA,NA,"1",1,"1,1,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","99,0","1","GCCCCCC" +"SRR2584866","CP000819.1",1854798,NA,"C","T",225,NA,FALSE,NA,NA,14,0.76967,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"0,0,6,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",1866593,NA,"CAAAAAAAA","CAAAAAAA",27.4221,NA,TRUE,11,1,11,0.10604,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,10,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","57,0","1","CAAAAAAA" +"SRR2584866","CP000819.1",1876656,NA,"C","G",225,NA,FALSE,NA,NA,10,0.268796,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",1882713,NA,"G","A",171,NA,FALSE,NA,NA,8,0.669559,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,6,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","201,0","1","A" +"SRR2584866","CP000819.1",1885487,NA,"G","A",217,NA,FALSE,NA,NA,12,0.600819,NA,NA,NA,0.964642,-0.676189,0,NA,NA,"1",1,"0,0,7,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","247,0","1","A" +"SRR2584866","CP000819.1",1885847,NA,"G","A",174,NA,FALSE,NA,NA,8,0.0348597,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,3,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","204,0","1","A" +"SRR2584866","CP000819.1",1886285,NA,"CTTTTTTT","CTTTTTT",77,NA,TRUE,14,1,14,0.17868,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,1,5,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","104,0","1","CTTTTTT" +"SRR2584866","CP000819.1",1889404,NA,"G","A",153,NA,FALSE,NA,NA,6,0.114818,NA,NA,NA,0.861511,-0.616816,0,NA,NA,"1",1,"0,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","183,0","1","A" 
+"SRR2584866","CP000819.1",1895196,NA,"T","C",225,NA,FALSE,NA,NA,15,0.223642,NA,NA,NA,0.95494,-0.680642,0,NA,NA,"1",1,"0,0,5,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",1900282,NA,"T","C",180,NA,FALSE,NA,NA,10,0.373295,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"0,0,8,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","210,0","1","C" +"SRR2584866","CP000819.1",1902731,NA,"A","G",90,NA,FALSE,NA,NA,5,0.397076,NA,NA,NA,NA,-0.556411,0,NA,NA,"1",1,"0,0,4,0",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","120,0","1","G" +"SRR2584866","CP000819.1",1910193,NA,"G","A",201,NA,FALSE,NA,NA,9,0.931218,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,3,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","231,0","1","A" +"SRR2584866","CP000819.1",1917089,NA,"T","C",148,NA,FALSE,NA,NA,6,0.747124,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,2,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","178,0","1","C" +"SRR2584866","CP000819.1",1936045,NA,"T","C",225,NA,FALSE,NA,NA,11,0.998189,NA,NA,NA,0.964642,-0.676189,0,NA,NA,"1",1,"0,0,4,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",1937932,NA,"GCCCCCC","GCCCCCCC",85,NA,TRUE,15,0.882353,17,0.969597,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"1,2,8,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","112,0","1","GCCCCCCC" +"SRR2584866","CP000819.1",1947570,NA,"T","C",214,NA,FALSE,NA,NA,12,0.617041,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"0,0,8,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","244,0","1","C" +"SRR2584866","CP000819.1",1948732,NA,"A","G",155,NA,FALSE,NA,NA,10,0.00514093,NA,NA,NA,NA,-0.670168,0,NA,NA,"1",1,"0,0,10,0",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","185,0","1","G" 
+"SRR2584866","CP000819.1",1953772,NA,"G","A",191,NA,FALSE,NA,NA,8,0.757552,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","221,0","1","A" +"SRR2584866","CP000819.1",1953970,NA,"G","A",160,NA,FALSE,NA,NA,8,0.47588,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,4,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","190,0","1","A" +"SRR2584866","CP000819.1",1960058,NA,"C","A",145,NA,FALSE,NA,NA,6,0.0630446,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,2,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","175,0","1","A" +"SRR2584866","CP000819.1",1960359,NA,"G","A",225,NA,FALSE,NA,NA,16,0.095834,NA,NA,NA,0.95494,-0.680642,0,NA,NA,"1",1,"0,0,5,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",1969741,NA,"CAAAAAAA","CAAAAAAAA",57,NA,TRUE,9,0.818182,11,0.3409,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,8,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","87,0","1","CAAAAAAAA" +"SRR2584866","CP000819.1",1976788,NA,"G","A",198,NA,FALSE,NA,NA,14,0.999713,NA,NA,NA,0.982603,-0.680642,0,NA,NA,"1",1,"0,0,6,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","228,0","1","A" +"SRR2584866","CP000819.1",1993260,NA,"TCCCCCC","TCCCCCCC",82,NA,TRUE,17,0.944444,18,0.472178,NA,NA,NA,1,-0.688148,0,NA,NA,"1",1,"0,3,11,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","109,0","1","TCCCCCCC" +"SRR2584866","CP000819.1",2007560,NA,"GCCCCCCC","GCCCCCC",17.8695,NA,TRUE,8,0.727273,11,0.751877,NA,NA,NA,0.950952,-0.636426,0,NA,NA,"1",1,"2,2,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","49,4","1","GCCCCCC" +"SRR2584866","CP000819.1",2007865,NA,"T","C",127,NA,FALSE,NA,NA,6,0.958818,NA,NA,NA,0.861511,-0.616816,0,NA,NA,"1",1,"0,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","157,0","1","C" 
+"SRR2584866","CP000819.1",2009215,NA,"A","G",225,NA,FALSE,NA,NA,19,0.295676,NA,NA,NA,1,-0.691153,0,NA,NA,"1",1,"0,0,6,12",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",2012737,NA,"T","C",225,NA,FALSE,NA,NA,15,0.814382,NA,NA,NA,0.961166,-0.683931,0,NA,NA,"1",1,"0,0,6,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",2017723,NA,"TGGGGGGG","TGGGGGGGG",47.1446,NA,TRUE,9,0.818182,11,0.91834,NA,NA,NA,0.964642,-0.662043,0,NA,NA,"1",1,"0,2,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","74,0","1","TGGGGGGGG" +"SRR2584866","CP000819.1",2018664,NA,"G","A",225,NA,FALSE,NA,NA,14,0.864226,NA,NA,NA,0.95494,-0.680642,0,NA,NA,"1",1,"0,0,7,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2021697,NA,"T","C",224,NA,FALSE,NA,NA,11,0.000538717,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","254,0","1","C" +"SRR2584866","CP000819.1",2055662,NA,"G","A",8.13869,NA,FALSE,NA,NA,2,0.66,NA,NA,NA,NA,-0.453602,0,NA,NA,"1",1,"0,0,2,0",19,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","37,0","1","A" +"SRR2584866","CP000819.1",2055692,NA,"A","G",5.75677,NA,FALSE,NA,NA,2,0.26,NA,NA,NA,NA,-0.453602,0,NA,NA,"1",1,"0,0,2,0",19,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","34,0","1","G" +"SRR2584866","CP000819.1",2055833,NA,"A","G",4.38466,NA,FALSE,NA,NA,3,0.509904,NA,NA,NA,NA,-0.511536,0,NA,NA,"1",1,"0,0,3,0",18,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","32,0","1","G" +"SRR2584866","CP000819.1",2075053,NA,"G","A",152,NA,FALSE,NA,NA,8,0.844031,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,5,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","182,0","1","A" 
+"SRR2584866","CP000819.1",2084671,NA,"C","T",225,NA,FALSE,NA,NA,12,0.280204,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",2131457,NA,"CAAAAAAAAA","CAAAAAAAA",10.7919,NA,TRUE,4,1,4,0.0797974,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,2,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","40,0","1","CAAAAAAAA" +"SRR2584866","CP000819.1",2131844,NA,"C","A",225,NA,FALSE,NA,NA,14,0.331028,NA,NA,NA,0.952347,-0.670168,0,NA,NA,"1",1,"0,0,5,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2132237,NA,"C","T",225,NA,FALSE,NA,NA,15,0.998989,NA,NA,NA,0.953497,-0.683931,0,NA,NA,"1",1,"0,0,9,4",58,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",2135132,NA,"T","C",225,NA,FALSE,NA,NA,12,0.218005,NA,NA,NA,0.916482,-0.670168,0,NA,NA,"1",1,"0,0,7,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",2162780,NA,"T","C",225,NA,FALSE,NA,NA,12,0.649418,NA,NA,NA,0.964642,-0.676189,0,NA,NA,"1",1,"0,0,7,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",2166182,NA,"G","A",225,NA,FALSE,NA,NA,14,0.231779,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,0,9,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2167011,NA,"T","C",119,NA,FALSE,NA,NA,5,0.677613,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,2,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","149,0","1","C" +"SRR2584866","CP000819.1",2188152,NA,"G","A",225,NA,FALSE,NA,NA,17,0.877825,NA,NA,NA,1,-0.689466,0,NA,NA,"1",1,"0,0,11,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" 
+"SRR2584866","CP000819.1",2190388,NA,"TCCCCC","TCCCCCC",95,NA,TRUE,9,0.818182,11,0.849243,NA,NA,NA,0.964642,-0.670168,0,NA,NA,"1",1,"0,1,4,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","122,0","1","TCCCCCC" +"SRR2584866","CP000819.1",2191094,NA,"G","A",225,NA,FALSE,NA,NA,19,0.654918,NA,NA,NA,1,-0.689466,0,NA,NA,"1",1,"0,0,10,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2196365,NA,"T","C",225,NA,FALSE,NA,NA,18,0.256043,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"0,0,9,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",2197386,NA,"C","T",176,NA,FALSE,NA,NA,7,0.17561,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","206,0","1","T" +"SRR2584866","CP000819.1",2198299,NA,"A","G",193,NA,FALSE,NA,NA,9,0.959258,NA,NA,NA,0.992367,-0.651104,0,NA,NA,"1",1,"0,0,4,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","223,0","1","G" +"SRR2584866","CP000819.1",2200070,NA,"G","A",225,NA,FALSE,NA,NA,13,0.824093,NA,NA,NA,0.95494,-0.680642,0,NA,NA,"1",1,"0,0,5,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2207851,NA,"G","A",225,NA,FALSE,NA,NA,12,0.193383,NA,NA,NA,0.916482,-0.670168,0,NA,NA,"1",1,"0,0,3,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2216447,NA,"C","T",154,NA,FALSE,NA,NA,8,0.480966,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,1,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","184,0","1","T" +"SRR2584866","CP000819.1",2221597,NA,"G","A",128,NA,FALSE,NA,NA,5,0.950351,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,2,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","158,0","1","A" 
+"SRR2584866","CP000819.1",2222769,NA,"G","A",225,NA,FALSE,NA,NA,15,0.476193,NA,NA,NA,1,-0.688148,0,NA,NA,"1",1,"0,0,7,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2230442,NA,"G","A",172,NA,FALSE,NA,NA,7,0.041536,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,3,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","202,0","1","A" +"SRR2584866","CP000819.1",2235030,NA,"C","T",225,NA,FALSE,NA,NA,11,0.314028,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",2247096,NA,"TCCCCCCC","TCCCCCCCCCC",17.8332,NA,TRUE,2,0.666667,3,0.327477,NA,NA,NA,1,-0.453602,0,NA,NA,"1",1,"1,0,0,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","70,25","1","TCCCCCCCCCC" +"SRR2584866","CP000819.1",2248449,NA,"G","A",225,NA,FALSE,NA,NA,11,0.227094,NA,NA,NA,0.89338,-0.662043,0,NA,NA,"1",1,"0,0,4,5",58,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2265986,NA,"G","A",131,NA,FALSE,NA,NA,6,0.730264,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,1,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","161,0","1","A" +"SRR2584866","CP000819.1",2269443,NA,"TAAAA","TAAA",57,NA,TRUE,9,0.75,12,0.44694,NA,NA,NA,0.95494,-0.662043,0,NA,NA,"1",1,"0,3,7,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","114,30","1","TAAA" +"SRR2584866","CP000819.1",2271283,NA,"A","G",131,NA,FALSE,NA,NA,6,0.537869,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,5,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","161,0","1","G" +"SRR2584866","CP000819.1",2275388,NA,"G","A",173,NA,FALSE,NA,NA,9,0.657519,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","203,0","1","A" 
+"SRR2584866","CP000819.1",2277269,NA,"C","T",212,NA,FALSE,NA,NA,11,0.306317,NA,NA,NA,1.00775,-0.670168,0,NA,NA,"1",1,"0,0,6,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","242,0","1","T" +"SRR2584866","CP000819.1",2283856,NA,"A","G",193,NA,FALSE,NA,NA,9,0.538025,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","223,0","1","G" +"SRR2584866","CP000819.1",2288549,NA,"A","G",207,NA,FALSE,NA,NA,9,0.946069,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","237,0","1","G" +"SRR2584866","CP000819.1",2288867,NA,"GCCCCC","GCCCCCC",53,NA,TRUE,6,1,6,0.0930359,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,1,2,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","80,0","1","GCCCCCC" +"SRR2584866","CP000819.1",2290361,NA,"C","T",225,NA,FALSE,NA,NA,15,0.18012,NA,NA,NA,0.95494,-0.680642,0,NA,NA,"1",1,"0,0,7,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",2293568,NA,"G","A",181,NA,FALSE,NA,NA,8,0.123545,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,2,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","211,0","1","A" +"SRR2584866","CP000819.1",2299913,NA,"GTT","GTTT",228,NA,TRUE,15,1,15,0.84892,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,2,7,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","GTTT" +"SRR2584866","CP000819.1",2304987,NA,"G","T",143,NA,FALSE,NA,NA,7,0.44936,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,4,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","173,0","1","T" +"SRR2584866","CP000819.1",2316485,NA,"C","T",225,NA,FALSE,NA,NA,11,0.205475,NA,NA,NA,0.916482,-0.670168,0,NA,NA,"1",1,"0,0,3,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" 
+"SRR2584866","CP000819.1",2319260,NA,"G","A",225,NA,FALSE,NA,NA,14,0.834894,NA,NA,NA,1.00775,-0.670168,0,NA,NA,"1",1,"0,0,4,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2321769,NA,"G","A",225,NA,FALSE,NA,NA,11,0.930012,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2333538,NA,"AT","ATT",182,NA,TRUE,7,1,7,0.610245,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,5,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","212,0","1","ATT" +"SRR2584866","CP000819.1",2346387,NA,"C","T",225,NA,FALSE,NA,NA,13,0.00873016,NA,NA,NA,0.95494,-0.680642,0,NA,NA,"1",1,"0,0,7,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",2348395,NA,"G","A",223,NA,FALSE,NA,NA,12,0.513454,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","253,0","1","A" +"SRR2584866","CP000819.1",2351655,NA,"T","C",69,NA,FALSE,NA,NA,3,0.0349627,NA,NA,NA,1,-0.511536,0,NA,NA,"1",1,"0,0,2,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","99,0","1","C" +"SRR2584866","CP000819.1",2358817,NA,"G","A",163,NA,FALSE,NA,NA,10,0.353119,NA,NA,NA,NA,-0.662043,0,NA,NA,"1",1,"0,0,9,0",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","193,0","1","A" +"SRR2584866","CP000819.1",2363952,NA,"G","A",225,NA,FALSE,NA,NA,12,0.0307945,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,8,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2373908,NA,"CAAAAAA","CAAAAAAA",69,NA,TRUE,11,0.916667,12,0.404704,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"1,0,8,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","96,0","1","CAAAAAAA" 
+"SRR2584866","CP000819.1",2374708,NA,"G","A",225,NA,FALSE,NA,NA,19,0.00202511,NA,NA,NA,1,-0.689466,0,NA,NA,"1",1,"0,0,8,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2379466,NA,"T","C",225,NA,FALSE,NA,NA,12,0.775348,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,3,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",2389645,NA,"G","A",114,NA,FALSE,NA,NA,7,0.084975,NA,NA,NA,NA,-0.636426,0,NA,NA,"1",1,"0,0,7,0",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","144,0","1","A" +"SRR2584866","CP000819.1",2389878,NA,"G","T",148,NA,FALSE,NA,NA,6,0.679639,NA,NA,NA,0.861511,-0.616816,0,NA,NA,"1",1,"0,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","178,0","1","T" +"SRR2584866","CP000819.1",2393070,NA,"A","G",145,NA,FALSE,NA,NA,6,0.688859,NA,NA,NA,0.861511,-0.616816,0,NA,NA,"1",1,"0,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","175,0","1","G" +"SRR2584866","CP000819.1",2401038,NA,"GTTTTTT","GTTTTTTT",65,NA,TRUE,9,1,9,0.642877,NA,NA,NA,0.974597,-0.651104,0,NA,NA,"1",1,"0,1,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","92,0","1","GTTTTTTT" +"SRR2584866","CP000819.1",2406239,NA,"G","A",173,NA,FALSE,NA,NA,7,0.944437,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,3,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","203,0","1","A" +"SRR2584866","CP000819.1",2406546,NA,"ACCCCCCC","ACCCCCCCC",20.4535,NA,TRUE,6,1,6,0.106656,NA,NA,NA,NA,-0.616816,0,NA,NA,"1",1,"0,0,0,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","50,0","1","ACCCCCCCC" +"SRR2584866","CP000819.1",2407948,NA,"A","T",32.1242,NA,FALSE,NA,NA,7,0.62656,0.574341,0.28717,0.861511,0.875,-0.511536,0.428571,NA,NA,"1",1,"2,1,2,1",18,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","59,0","1","T" 
+"SRR2584866","CP000819.1",2412327,NA,"C","T",55,NA,FALSE,NA,NA,3,0.56,NA,NA,NA,NA,-0.453602,0,NA,NA,"1",1,"0,0,2,0",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","85,0","1","T" +"SRR2584866","CP000819.1",2412783,NA,"T","C",184,NA,FALSE,NA,NA,9,0.419626,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","214,0","1","C" +"SRR2584866","CP000819.1",2424819,NA,"A","G",114,NA,FALSE,NA,NA,5,0.549396,NA,NA,NA,1,-0.511536,0,NA,NA,"1",1,"0,0,2,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","144,0","1","G" +"SRR2584866","CP000819.1",2427998,NA,"T","C",225,NA,FALSE,NA,NA,10,0.907304,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",2430565,NA,"G","A",225,NA,FALSE,NA,NA,20,0.943823,NA,NA,NA,1,-0.69168,0,NA,NA,"1",1,"0,0,11,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2433061,NA,"G","A",209,NA,FALSE,NA,NA,9,0.889084,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","239,0","1","A" +"SRR2584866","CP000819.1",2440555,NA,"A","G",123,NA,FALSE,NA,NA,5,0.728443,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,2,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","153,0","1","G" +"SRR2584866","CP000819.1",2445090,NA,"GCCCCCCC","GCCCCCC",67,NA,TRUE,14,1,14,0.209247,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,2,6,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","94,0","1","GCCCCCC" +"SRR2584866","CP000819.1",2446984,NA,"A","C",222,NA,FALSE,NA,NA,12,0.240286,NA,NA,NA,0.916482,-0.670168,0,NA,NA,"1",1,"0,0,3,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","252,0","1","C" 
+"SRR2584866","CP000819.1",2450517,NA,"G","A",225,NA,FALSE,NA,NA,12,0.134903,NA,NA,NA,1.00775,-0.670168,0,NA,NA,"1",1,"0,0,6,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2452375,NA,"G","A",80,NA,FALSE,NA,NA,4,0.0506481,NA,NA,NA,1,-0.511536,0,NA,NA,"1",1,"0,0,2,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","110,0","1","A" +"SRR2584866","CP000819.1",2457654,NA,"G","A",225,NA,FALSE,NA,NA,11,0.570414,NA,NA,NA,0.952347,-0.670168,0,NA,NA,"1",1,"0,0,5,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2483043,NA,"G","A",195,NA,FALSE,NA,NA,9,0.156836,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","225,0","1","A" +"SRR2584866","CP000819.1",2484358,NA,"T","C",126,NA,FALSE,NA,NA,5,0.320361,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,2,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","156,0","1","C" +"SRR2584866","CP000819.1",2484978,NA,"G","A",123,NA,FALSE,NA,NA,5,0.60156,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,2,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","153,0","1","A" +"SRR2584866","CP000819.1",2491484,NA,"G","A",190,NA,FALSE,NA,NA,14,0.996044,NA,NA,NA,NA,-0.680642,0,NA,NA,"1",1,"0,0,12,0",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","220,0","1","A" +"SRR2584866","CP000819.1",2493004,NA,"G","A",214,NA,FALSE,NA,NA,10,0.00783921,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","244,0","1","A" +"SRR2584866","CP000819.1",2493247,NA,"GCCCC","GCCCCC",95,NA,TRUE,8,1,8,0.674881,NA,NA,NA,0.992367,-0.636426,0,NA,NA,"1",1,"0,1,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","122,0","1","GCCCCC" 
+"SRR2584866","CP000819.1",2493463,NA,"G","A",167,NA,FALSE,NA,NA,8,0.938542,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,6,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","197,0","1","A" +"SRR2584866","CP000819.1",2499317,NA,"C","T",203,NA,FALSE,NA,NA,9,0.471462,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","233,0","1","T" +"SRR2584866","CP000819.1",2517303,NA,"G","A",225,NA,FALSE,NA,NA,16,0.254982,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,0,10,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2523314,NA,"G","A",160,NA,FALSE,NA,NA,7,0.479322,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,4,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","190,0","1","A" +"SRR2584866","CP000819.1",2524481,NA,"G","A",225,NA,FALSE,NA,NA,16,0.760546,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"0,0,8,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2540524,NA,"T","C",178,NA,FALSE,NA,NA,9,0.598255,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"0,0,8,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","208,0","1","C" +"SRR2584866","CP000819.1",2542070,NA,"G","A",180,NA,FALSE,NA,NA,9,0.587934,1,1,1,0.900802,-0.636426,0,NA,NA,"1",1,"0,1,3,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","213,6","1","A" +"SRR2584866","CP000819.1",2545772,NA,"T","C",87,NA,FALSE,NA,NA,4,0.725151,NA,NA,NA,NA,-0.556411,0,NA,NA,"1",1,"0,0,0,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","117,0","1","C" +"SRR2584866","CP000819.1",2558409,NA,"G","A",225,NA,FALSE,NA,NA,17,0.0878282,NA,NA,NA,1,-0.688148,0,NA,NA,"1",1,"0,0,6,9",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" 
+"SRR2584866","CP000819.1",2567577,NA,"A","G",225,NA,FALSE,NA,NA,16,0.593461,NA,NA,NA,1,-0.688148,0,NA,NA,"1",1,"0,0,6,9",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",2569859,NA,"A","G",189,NA,FALSE,NA,NA,11,0.387294,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,7,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","219,0","1","G" +"SRR2584866","CP000819.1",2570083,NA,"G","A",173,NA,FALSE,NA,NA,7,0.863306,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,2,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","203,0","1","A" +"SRR2584866","CP000819.1",2576255,NA,"G","A",205,NA,FALSE,NA,NA,10,0.131006,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","235,0","1","A" +"SRR2584866","CP000819.1",2578814,NA,"G","A",200,NA,FALSE,NA,NA,9,0.175997,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","230,0","1","A" +"SRR2584866","CP000819.1",2583071,NA,"G","A",207,NA,FALSE,NA,NA,18,0.493819,0.031048,0.00673795,0.965874,0.0134778,-0.680642,0.222222,NA,NA,"1",1,"5,0,4,8",35,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","234,0","1","A" +"SRR2584866","CP000819.1",2585563,NA,"G","A",167,NA,FALSE,NA,NA,7,0.489658,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,5,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","197,0","1","A" +"SRR2584866","CP000819.1",2590715,NA,"G","A",149,NA,FALSE,NA,NA,7,0.814142,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,1,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","179,0","1","A" +"SRR2584866","CP000819.1",2594448,NA,"T","C",111,NA,FALSE,NA,NA,4,0.939765,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,2,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","141,0","1","C" 
+"SRR2584866","CP000819.1",2597401,NA,"G","T",207,NA,FALSE,NA,NA,10,0.453107,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"0,0,7,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","237,0","1","T" +"SRR2584866","CP000819.1",2598446,NA,"G","A",165,NA,FALSE,NA,NA,8,0.0679081,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","195,0","1","A" +"SRR2584866","CP000819.1",2598869,NA,"G","A",225,NA,FALSE,NA,NA,21,0.888928,NA,NA,NA,1,-0.690438,0,NA,NA,"1",1,"0,0,12,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2603140,NA,"G","A",152,NA,FALSE,NA,NA,8,0.824688,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,2,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","182,0","1","A" +"SRR2584866","CP000819.1",2607090,NA,"G","A",187,NA,FALSE,NA,NA,12,0.76965,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"0,0,2,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","217,0","1","A" +"SRR2584866","CP000819.1",2611312,NA,"CT","C",204,NA,TRUE,9,1,9,0.42038,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","234,0","1","C" +"SRR2584866","CP000819.1",2616295,NA,"AGGGGGGG","AGGGGGGGG",46.4146,NA,TRUE,8,0.888889,9,0.0496803,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"0,0,7,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","76,0","1","AGGGGGGGG" +"SRR2584866","CP000819.1",2616662,NA,"A","G",225,NA,FALSE,NA,NA,14,0.686723,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,0,8,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",2616778,NA,"G","A",225,NA,FALSE,NA,NA,15,0.159996,NA,NA,NA,0.961166,-0.683931,0,NA,NA,"1",1,"0,0,7,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" 
+"SRR2584866","CP000819.1",2618472,NA,"G","T",65,NA,FALSE,NA,NA,11,0.0797974,0.335918,0.0559863,0.503877,0.916482,-0.556411,0.545455,NA,NA,"1",1,"4,2,3,1",16,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","92,0","1","T" +"SRR2584866","CP000819.1",2629794,NA,"T","C",225,NA,FALSE,NA,NA,15,0.988006,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"0,0,10,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",2639449,NA,"G","A",225,NA,FALSE,NA,NA,20,0.810509,NA,NA,NA,1,-0.690438,0,NA,NA,"1",1,"0,0,13,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2647160,NA,"GCCCCC","GCCCCCC",89,NA,TRUE,11,1,11,0.571148,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,9,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","119,0","1","GCCCCCC" +"SRR2584866","CP000819.1",2654896,NA,"G","A",201,NA,FALSE,NA,NA,9,0.982401,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","231,0","1","A" +"SRR2584866","CP000819.1",2655111,NA,"T","C",225,NA,FALSE,NA,NA,11,0.6952,NA,NA,NA,1,-0.670168,0,NA,NA,"1",1,"0,0,8,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",2658918,NA,"T","C",188,NA,FALSE,NA,NA,9,0.632198,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,2,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","218,0","1","C" +"SRR2584866","CP000819.1",2659827,NA,"T","C",225,NA,FALSE,NA,NA,12,0.314843,NA,NA,NA,0.950952,-0.676189,0,NA,NA,"1",1,"0,0,6,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",2663511,NA,"T","C",173,NA,FALSE,NA,NA,7,0.291224,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","203,0","1","C" 
+"SRR2584866","CP000819.1",2664660,NA,"A","C",193,NA,FALSE,NA,NA,9,0.685625,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,3,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","223,0","1","C" +"SRR2584866","CP000819.1",2665639,NA,"A","T",225,NA,FALSE,NA,NA,13,0.95217,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,0,10,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",2667428,NA,"A","T",184,NA,FALSE,NA,NA,7,0.915697,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,3,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","214,0","1","T" +"SRR2584866","CP000819.1",2674244,NA,"G","A",149,NA,FALSE,NA,NA,7,0.59895,NA,NA,NA,0.861511,-0.616816,0,NA,NA,"1",1,"0,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","179,0","1","A" +"SRR2584866","CP000819.1",2693113,NA,"GTTCTTCTTCTTC","GTTCTTCTTC",213,NA,TRUE,7,0.875,8,0.13615,NA,NA,NA,0.992367,-0.636426,0,NA,NA,"1",1,"0,1,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,15","1","GTTCTTCTTC" +"SRR2584866","CP000819.1",2697224,NA,"T","C",225,NA,FALSE,NA,NA,13,0.143101,NA,NA,NA,0.95494,-0.680642,0,NA,NA,"1",1,"0,0,7,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",2701601,NA,"C","T",65,NA,FALSE,NA,NA,3,0.66,NA,NA,NA,1,-0.453602,0,NA,NA,"1",1,"0,0,1,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","95,0","1","T" +"SRR2584866","CP000819.1",2703592,NA,"G","A",225,NA,FALSE,NA,NA,16,0.986714,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"0,0,8,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2706032,NA,"G","A",141,NA,FALSE,NA,NA,6,0.773942,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,2,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","171,0","1","A" 
+"SRR2584866","CP000819.1",2709839,NA,"T","C",174,NA,FALSE,NA,NA,7,0.348672,NA,NA,NA,0.861511,-0.616816,0,NA,NA,"1",1,"0,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","204,0","1","C" +"SRR2584866","CP000819.1",2716283,NA,"G","C",141,NA,FALSE,NA,NA,6,0.0328229,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,2,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","171,0","1","C" +"SRR2584866","CP000819.1",2723460,NA,"C","T",215,NA,FALSE,NA,NA,11,0.724387,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","245,0","1","T" +"SRR2584866","CP000819.1",2724782,NA,"T","C",216,NA,FALSE,NA,NA,13,0.781363,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","246,0","1","C" +"SRR2584866","CP000819.1",2726933,NA,"G","A",224,NA,FALSE,NA,NA,11,0.0677205,NA,NA,NA,1.00775,-0.670168,0,NA,NA,"1",1,"0,0,4,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","254,0","1","A" +"SRR2584866","CP000819.1",2729399,NA,"G","A",112,NA,FALSE,NA,NA,6,0.57958,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,4,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","142,0","1","A" +"SRR2584866","CP000819.1",2740297,NA,"G","A",225,NA,FALSE,NA,NA,17,0.991771,NA,NA,NA,1,-0.689466,0,NA,NA,"1",1,"0,0,12,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2747391,NA,"G","A",225,NA,FALSE,NA,NA,11,0.952197,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2749219,NA,"G","A",225,NA,FALSE,NA,NA,10,0.158012,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" 
+"SRR2584866","CP000819.1",2753451,NA,"G","A",225,NA,FALSE,NA,NA,16,0.253219,NA,NA,NA,0.961166,-0.683931,0,NA,NA,"1",1,"0,0,6,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2753768,NA,"C","T",220,NA,FALSE,NA,NA,10,0.187989,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,3,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","250,0","1","T" +"SRR2584866","CP000819.1",2769294,NA,"T","C",146,NA,FALSE,NA,NA,6,0.92054,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,5,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","176,0","1","C" +"SRR2584866","CP000819.1",2769581,NA,"T","C",225,NA,FALSE,NA,NA,10,0.443437,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,3,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",2771282,NA,"G","A",225,NA,FALSE,NA,NA,12,0.431004,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,0,8,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2780858,NA,"A","G",180,NA,FALSE,NA,NA,8,0.918657,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","210,0","1","G" +"SRR2584866","CP000819.1",2781436,NA,"G","A",225,NA,FALSE,NA,NA,12,0.505656,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2781587,NA,"T","C",225,NA,FALSE,NA,NA,15,0.338575,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,0,10,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",2787379,NA,"T","C",110,NA,FALSE,NA,NA,5,0.681661,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,3,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","140,0","1","C" 
+"SRR2584866","CP000819.1",2788797,NA,"CAAAAAAAAA","CAAAAAAAAAA",59,NA,TRUE,14,1,14,0.460669,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"0,0,6,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","89,0","1","CAAAAAAAAAA" +"SRR2584866","CP000819.1",2789588,NA,"A","G",174,NA,FALSE,NA,NA,7,0.607474,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,2,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","204,0","1","G" +"SRR2584866","CP000819.1",2790151,NA,"A","G",225,NA,FALSE,NA,NA,16,0.307251,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,0,9,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",2790343,NA,"G","A",206,NA,FALSE,NA,NA,11,0.812497,NA,NA,NA,0.916482,-0.670168,0,NA,NA,"1",1,"0,0,3,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","236,0","1","A" +"SRR2584866","CP000819.1",2812552,NA,"G","A",186,NA,FALSE,NA,NA,9,0.585123,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,2,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","216,0","1","A" +"SRR2584866","CP000819.1",2812878,NA,"C","T",181,NA,FALSE,NA,NA,8,0.682525,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,5,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","211,0","1","T" +"SRR2584866","CP000819.1",2820506,NA,"A","G",225,NA,FALSE,NA,NA,13,0.920644,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,0,8,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",2824735,NA,"TCCCCCC","TCCCCCCC",86,NA,TRUE,11,0.916667,12,0.554125,NA,NA,NA,0.982603,-0.676189,0,NA,NA,"1",1,"0,1,6,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","113,0","1","TCCCCCCC" +"SRR2584866","CP000819.1",2828591,NA,"T","C",225,NA,FALSE,NA,NA,12,0.432988,NA,NA,NA,1,-0.670168,0,NA,NA,"1",1,"0,0,8,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" 
+"SRR2584866","CP000819.1",2830053,NA,"C","T",183,NA,FALSE,NA,NA,9,0.377852,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,5,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","213,0","1","T" +"SRR2584866","CP000819.1",2843036,NA,"G","A",143,NA,FALSE,NA,NA,6,0.939067,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,4,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","173,0","1","A" +"SRR2584866","CP000819.1",2853806,NA,"A","G",199,NA,FALSE,NA,NA,10,0.923913,NA,NA,NA,0.992367,-0.651104,0,NA,NA,"1",1,"0,0,4,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","229,0","1","G" +"SRR2584866","CP000819.1",2867523,NA,"GCCCCCC","GCCCCCCC",29.0521,NA,TRUE,5,0.833333,6,0.387746,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"1,0,3,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","56,0","1","GCCCCCCC" +"SRR2584866","CP000819.1",2870212,NA,"T","C",131,NA,FALSE,NA,NA,7,0.59074,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,1,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","161,0","1","C" +"SRR2584866","CP000819.1",2923043,NA,"G","A",166,NA,FALSE,NA,NA,7,0.11383,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","196,0","1","A" +"SRR2584866","CP000819.1",2932058,NA,"C","T",225,NA,FALSE,NA,NA,15,0.80896,NA,NA,NA,0.961166,-0.683931,0,NA,NA,"1",1,"0,0,7,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",2935321,NA,"T","C",181,NA,FALSE,NA,NA,9,0.888218,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","211,0","1","C" +"SRR2584866","CP000819.1",2937844,NA,"G","A",225,NA,FALSE,NA,NA,15,0.754408,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,9,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" 
+"SRR2584866","CP000819.1",2937968,NA,"T","C",161,NA,FALSE,NA,NA,8,0.0283055,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","191,0","1","C" +"SRR2584866","CP000819.1",2938800,NA,"A","G",130,NA,FALSE,NA,NA,6,0.252876,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,2,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","160,0","1","G" +"SRR2584866","CP000819.1",2957538,NA,"G","A",66,NA,FALSE,NA,NA,4,0.970411,NA,NA,NA,1,-0.511536,0,NA,NA,"1",1,"0,0,1,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","96,0","1","A" +"SRR2584866","CP000819.1",2959717,NA,"G","A",225,NA,FALSE,NA,NA,10,0.0654841,NA,NA,NA,0.952347,-0.670168,0,NA,NA,"1",1,"0,0,5,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2977018,NA,"G","A",225,NA,FALSE,NA,NA,12,0.219224,NA,NA,NA,0.950952,-0.676189,0,NA,NA,"1",1,"0,0,6,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",2989622,NA,"GAA","GA",113,NA,TRUE,7,1,7,0.0256122,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,1,5,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","140,0","1","GA" +"SRR2584866","CP000819.1",2999330,NA,"G","A",225,NA,FALSE,NA,NA,16,0.260041,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"0,0,10,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",3001543,NA,"T","C",200,NA,FALSE,NA,NA,11,0.905545,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,8,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","230,0","1","C" +"SRR2584866","CP000819.1",3003744,NA,"C","T",221,NA,FALSE,NA,NA,11,0.717752,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,3,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","251,0","1","T" 
+"SRR2584866","CP000819.1",3036875,NA,"G","A",225,NA,FALSE,NA,NA,17,0.111591,NA,NA,NA,1,-0.688148,0,NA,NA,"1",1,"0,0,5,10",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",3037295,NA,"A","G",225,NA,FALSE,NA,NA,13,0.494762,NA,NA,NA,0.950952,-0.676189,0,NA,NA,"1",1,"0,0,6,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",3045831,NA,"G","A",81,NA,FALSE,NA,NA,7,0.467314,1,1,1,0.861511,-0.590765,0,NA,NA,"1",1,"1,0,2,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","146,38","1","A" +"SRR2584866","CP000819.1",3050407,NA,"G","A",184,NA,FALSE,NA,NA,9,0.666245,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,2,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","214,0","1","A" +"SRR2584866","CP000819.1",3051321,NA,"G","A",214,NA,FALSE,NA,NA,9,0.814588,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","244,0","1","A" +"SRR2584866","CP000819.1",3052305,NA,"AGCGCGCGCGCG","AGCGCGCGCG",228,NA,TRUE,9,1,9,0.0941653,NA,NA,NA,0.924584,-0.651104,0,NA,NA,"1",1,"0,1,3,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","AGCGCGCGCG" +"SRR2584866","CP000819.1",3054123,NA,"A","G",199,NA,FALSE,NA,NA,10,0.519034,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,3,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","229,0","1","G" +"SRR2584866","CP000819.1",3055043,NA,"T","C",225,NA,FALSE,NA,NA,11,0.603917,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",3058629,NA,"T","C",225,NA,FALSE,NA,NA,12,0.0200047,NA,NA,NA,0.964642,-0.676189,0,NA,NA,"1",1,"0,0,7,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" 
+"SRR2584866","CP000819.1",3061822,NA,"T","C",198,NA,FALSE,NA,NA,11,0.358597,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,5,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","228,0","1","C" +"SRR2584866","CP000819.1",3061923,NA,"T","C",211,NA,FALSE,NA,NA,11,0.317029,NA,NA,NA,0.952347,-0.670168,0,NA,NA,"1",1,"0,0,5,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","241,0","1","C" +"SRR2584866","CP000819.1",3069644,NA,"T","C",183,NA,FALSE,NA,NA,9,0.143887,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,4,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","213,0","1","C" +"SRR2584866","CP000819.1",3072845,NA,"T","C",225,NA,FALSE,NA,NA,15,0.213727,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,0,9,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",3079331,NA,"G","A",189,NA,FALSE,NA,NA,8,0.536793,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","219,0","1","A" +"SRR2584866","CP000819.1",3088148,NA,"G","A",206,NA,FALSE,NA,NA,11,0.440607,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,8,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","236,0","1","A" +"SRR2584866","CP000819.1",3088253,NA,"T","C",183,NA,FALSE,NA,NA,8,0.594311,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,2,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","213,0","1","C" +"SRR2584866","CP000819.1",3088370,NA,"TAAAAAAAAA","TAAAAAAAA",38.2233,NA,TRUE,10,0.909091,11,0.0419106,NA,NA,NA,0.950952,-0.670168,0,NA,NA,"1",1,"1,0,5,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","65,0","1","TAAAAAAAA" +"SRR2584866","CP000819.1",3091899,NA,"A","G",225,NA,FALSE,NA,NA,11,0.912212,NA,NA,NA,0.916482,-0.670168,0,NA,NA,"1",1,"0,0,7,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" 
+"SRR2584866","CP000819.1",3098931,NA,"C","A",79,NA,FALSE,NA,NA,4,0.851686,NA,NA,NA,NA,-0.556411,0,NA,NA,"1",1,"0,0,0,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","109,0","1","A" +"SRR2584866","CP000819.1",3104029,NA,"G","A",211,NA,FALSE,NA,NA,11,0.84126,NA,NA,NA,1,-0.670168,0,NA,NA,"1",1,"0,0,8,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","241,0","1","A" +"SRR2584866","CP000819.1",3105821,NA,"G","A",220,NA,FALSE,NA,NA,11,0.626335,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","250,0","1","A" +"SRR2584866","CP000819.1",3111675,NA,"GCCCCCC","GCCCCCCC",73,NA,TRUE,9,1,9,0.395707,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","103,0","1","GCCCCCCC" +"SRR2584866","CP000819.1",3112821,NA,"G","A",146,NA,FALSE,NA,NA,7,0.355905,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,6,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","176,0","1","A" +"SRR2584866","CP000819.1",3115240,NA,"A","G",225,NA,FALSE,NA,NA,11,0.0588422,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,3,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",3115913,NA,"G","A",225,NA,FALSE,NA,NA,19,0.0317393,NA,NA,NA,1,-0.688148,0,NA,NA,"1",1,"0,0,10,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",3117097,NA,"G","A",138,NA,FALSE,NA,NA,7,0.0511907,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,5,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","168,0","1","A" +"SRR2584866","CP000819.1",3117473,NA,"T","A",223,NA,FALSE,NA,NA,9,0.0288971,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,3,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","253,0","1","A" 
+"SRR2584866","CP000819.1",3127663,NA,"C","T",192,NA,FALSE,NA,NA,8,0.441597,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,4,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","222,0","1","T" +"SRR2584866","CP000819.1",3130508,NA,"G","A",193,NA,FALSE,NA,NA,8,0.947685,NA,NA,NA,0.992367,-0.651104,0,NA,NA,"1",1,"0,0,4,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","223,0","1","A" +"SRR2584866","CP000819.1",3132910,NA,"G","A",198,NA,FALSE,NA,NA,9,0.584965,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,3,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","228,0","1","A" +"SRR2584866","CP000819.1",3135301,NA,"T","C",166,NA,FALSE,NA,NA,10,0.156654,NA,NA,NA,1,-0.670168,0,NA,NA,"1",1,"0,0,1,9",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","196,0","1","C" +"SRR2584866","CP000819.1",3136451,NA,"A","G",225,NA,FALSE,NA,NA,15,0.0901113,NA,NA,NA,0.950952,-0.676189,0,NA,NA,"1",1,"0,0,6,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",3139252,NA,"C","A",150,NA,FALSE,NA,NA,6,0.101965,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,1,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","180,0","1","A" +"SRR2584866","CP000819.1",3145072,NA,"T","C",177,NA,FALSE,NA,NA,11,0.248437,NA,NA,NA,1,-0.670168,0,NA,NA,"1",1,"0,0,1,9",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","207,0","1","C" +"SRR2584866","CP000819.1",3150088,NA,"A","G",199,NA,FALSE,NA,NA,9,0.358583,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","229,0","1","G" +"SRR2584866","CP000819.1",3153250,NA,"G","A",152,NA,FALSE,NA,NA,10,0.642048,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,3,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","182,0","1","A" 
+"SRR2584866","CP000819.1",3153751,NA,"G","A",225,NA,FALSE,NA,NA,11,0.792725,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",3153996,NA,"T","C",199,NA,FALSE,NA,NA,8,0.525314,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,5,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","229,0","1","C" +"SRR2584866","CP000819.1",3174372,NA,"G","A",225,NA,FALSE,NA,NA,11,0.110731,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",3199162,NA,"T","C",225,NA,FALSE,NA,NA,10,0.439579,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",3206530,NA,"T","C",199,NA,FALSE,NA,NA,8,0.732068,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","229,0","1","C" +"SRR2584866","CP000819.1",3215725,NA,"A","G",69,NA,FALSE,NA,NA,3,0.102722,NA,NA,NA,1,-0.511536,0,NA,NA,"1",1,"0,0,2,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","99,0","1","G" +"SRR2584866","CP000819.1",3216436,NA,"T","C",121,NA,FALSE,NA,NA,5,0.303109,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,2,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","151,0","1","C" +"SRR2584866","CP000819.1",3240003,NA,"T","C",225,NA,FALSE,NA,NA,17,0.197212,NA,NA,NA,1,-0.689466,0,NA,NA,"1",1,"0,0,7,9",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",3240040,NA,"GCCCCCCC","GCCCCCC",81,NA,TRUE,15,1,15,0.00227358,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"1,0,6,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","108,0","1","GCCCCCC" 
+"SRR2584866","CP000819.1",3247026,NA,"T","C",225,NA,FALSE,NA,NA,22,0.272148,NA,NA,NA,1,-0.69168,0,NA,NA,"1",1,"0,0,10,9",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",3248428,NA,"CAAAAAA","CAAAAA",68,NA,TRUE,8,1,8,0.674881,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","98,0","1","CAAAAA" +"SRR2584866","CP000819.1",3259432,NA,"A","G",153,NA,FALSE,NA,NA,7,0.175312,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,5,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","183,0","1","G" +"SRR2584866","CP000819.1",3260564,NA,"GTTTCGCTTTCGCT","GTTTCGCT",225,NA,TRUE,11,1,11,0.881302,NA,NA,NA,0.950952,-0.676189,0,NA,NA,"1",1,"0,0,5,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","GTTTCGCT" +"SRR2584866","CP000819.1",3264188,NA,"C","T",225,NA,FALSE,NA,NA,11,0.800627,NA,NA,NA,1.00775,-0.670168,0,NA,NA,"1",1,"0,0,6,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",3267036,NA,"T","C",219,NA,FALSE,NA,NA,10,0.671615,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","249,0","1","C" +"SRR2584866","CP000819.1",3269750,NA,"A","G",198,NA,FALSE,NA,NA,8,0.194874,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,3,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","228,0","1","G" +"SRR2584866","CP000819.1",3272579,NA,"G","A",205,NA,FALSE,NA,NA,8,0.838194,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,3,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","235,0","1","A" +"SRR2584866","CP000819.1",3272938,NA,"T","C",225,NA,FALSE,NA,NA,16,0.328077,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"0,0,10,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" 
+"SRR2584866","CP000819.1",3282260,NA,"T","C",225,NA,FALSE,NA,NA,23,0.0577088,NA,NA,NA,1,-0.692067,0,NA,NA,"1",1,"0,0,9,11",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",3286879,NA,"T","C",225,NA,FALSE,NA,NA,14,0.527393,NA,NA,NA,0.896474,-0.680642,0,NA,NA,"1",1,"0,0,5,7",59,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",3288025,NA,"T","A",48.4146,NA,FALSE,NA,NA,3,0.440791,NA,NA,NA,NA,-0.511536,0,NA,NA,"1",1,"0,0,0,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","78,0","1","A" +"SRR2584866","CP000819.1",3289930,NA,"T","C",207,NA,FALSE,NA,NA,10,0.89913,NA,NA,NA,1.00775,-0.670168,0,NA,NA,"1",1,"0,0,4,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","237,0","1","C" +"SRR2584866","CP000819.1",3290802,NA,"T","C",225,NA,FALSE,NA,NA,12,0.407615,NA,NA,NA,1.00775,-0.670168,0,NA,NA,"1",1,"0,0,4,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",3297368,NA,"C","T",218,NA,FALSE,NA,NA,10,0.610755,NA,NA,NA,0.916482,-0.670168,0,NA,NA,"1",1,"0,0,3,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","248,0","1","T" +"SRR2584866","CP000819.1",3298558,NA,"C","T",190,NA,FALSE,NA,NA,8,0.988655,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,3,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","220,0","1","T" +"SRR2584866","CP000819.1",3313854,NA,"A","G",129,NA,FALSE,NA,NA,5,0.692431,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,2,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","159,0","1","G" +"SRR2584866","CP000819.1",3316561,NA,"T","C",225,NA,FALSE,NA,NA,14,0.118261,NA,NA,NA,0.964642,-0.676189,0,NA,NA,"1",1,"0,0,7,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" 
+"SRR2584866","CP000819.1",3317082,NA,"G","A",126,NA,FALSE,NA,NA,7,0.908252,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,2,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","156,0","1","A" +"SRR2584866","CP000819.1",3322942,NA,"T","C",220,NA,FALSE,NA,NA,9,0.85857,NA,NA,NA,0.992367,-0.651104,0,NA,NA,"1",1,"0,0,4,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","250,0","1","C" +"SRR2584866","CP000819.1",3333272,NA,"A","G",225,NA,FALSE,NA,NA,19,0.89105,NA,NA,NA,1,-0.689466,0,NA,NA,"1",1,"0,0,10,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",3339313,NA,"A","C",225,NA,FALSE,NA,NA,12,0.662672,NA,NA,NA,1.00775,-0.670168,0,NA,NA,"1",1,"0,0,4,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",3339654,NA,"C","T",225,NA,FALSE,NA,NA,12,0.675773,NA,NA,NA,0.950952,-0.676189,0,NA,NA,"1",1,"0,0,5,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",3344290,NA,"T","C",225,NA,FALSE,NA,NA,11,0.0288954,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",3345160,NA,"A","G",225,NA,FALSE,NA,NA,13,0.617745,NA,NA,NA,0.950952,-0.676189,0,NA,NA,"1",1,"0,0,6,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",3351577,NA,"A","G",225,NA,FALSE,NA,NA,13,0.339431,NA,NA,NA,0.200967,-0.676189,0,NA,NA,"1",1,"0,0,7,4",55,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",3360621,NA,"T","C",175,NA,FALSE,NA,NA,8,0.528538,NA,NA,NA,0.992367,-0.651104,0,NA,NA,"1",1,"0,0,4,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","205,0","1","C" 
+"SRR2584866","CP000819.1",3360893,NA,"GCCCCCC","GCCCCCCC",87,NA,TRUE,12,1,12,0.0013348,NA,NA,NA,0.982603,-0.680642,0,NA,NA,"1",1,"0,0,6,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","117,0","1","GCCCCCCC" +"SRR2584866","CP000819.1",3373583,NA,"C","T",225,NA,FALSE,NA,NA,15,0.441075,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"0,0,9,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",3379886,NA,"G","A",163,NA,FALSE,NA,NA,7,0.946037,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,3,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","193,0","1","A" +"SRR2584866","CP000819.1",3384702,NA,"GCCCCCC","GCCCCCCCC",46.2613,NA,TRUE,4,0.571429,7,0.962524,NA,NA,NA,1.01283,-0.616816,0,NA,NA,"1",1,"1,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","153,80","1","GCCCCCCCC" +"SRR2584866","CP000819.1",3387944,NA,"G","A",195,NA,FALSE,NA,NA,10,0.726018,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"0,0,2,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","225,0","1","A" +"SRR2584866","CP000819.1",3395533,NA,"A","G",224,NA,FALSE,NA,NA,11,0.74989,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","254,0","1","G" +"SRR2584866","CP000819.1",3401754,NA,"C","A",192,NA,FALSE,NA,NA,10,0.0980063,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,6,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","222,0","1","A" +"SRR2584866","CP000819.1",3402257,NA,"G","A",225,NA,FALSE,NA,NA,16,0.0157933,NA,NA,NA,1,-0.688148,0,NA,NA,"1",1,"0,0,7,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",3404346,NA,"GAAAAAAAA","GAAAAAAA",34.1781,NA,TRUE,8,0.888889,9,0.622626,NA,NA,NA,0.974597,-0.651104,0,NA,NA,"1",1,"0,1,4,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","61,0","1","GAAAAAAA" 
+"SRR2584866","CP000819.1",3410836,NA,"G","A",213,NA,FALSE,NA,NA,10,0.590673,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","243,0","1","A" +"SRR2584866","CP000819.1",3411327,NA,"G","A",220,NA,FALSE,NA,NA,9,0.414968,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,3,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","250,0","1","A" +"SRR2584866","CP000819.1",3411987,NA,"TCCCCCC","TCCCCCCC",48.4146,NA,TRUE,7,1,7,0.170074,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,6,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","78,0","1","TCCCCCCC" +"SRR2584866","CP000819.1",3430620,NA,"C","T",225,NA,FALSE,NA,NA,12,0.927598,NA,NA,NA,0.950952,-0.676189,0,NA,NA,"1",1,"0,0,6,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",3431058,NA,"C","T",225,NA,FALSE,NA,NA,15,0.0907446,NA,NA,NA,0.966012,-0.686358,0,NA,NA,"1",1,"0,0,7,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",3436387,NA,"T","C",225,NA,FALSE,NA,NA,15,0.739018,NA,NA,NA,0.964642,-0.676189,0,NA,NA,"1",1,"0,0,7,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",3441172,NA,"A","G",189,NA,FALSE,NA,NA,8,0.452509,NA,NA,NA,0.992367,-0.651104,0,NA,NA,"1",1,"0,0,4,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","219,0","1","G" +"SRR2584866","CP000819.1",3442743,NA,"G","A",225,NA,FALSE,NA,NA,13,0.628403,NA,NA,NA,0.950952,-0.676189,0,NA,NA,"1",1,"0,0,5,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",3453758,NA,"G","A",215,NA,FALSE,NA,NA,12,0.908559,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,0,2,10",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","245,0","1","A" 
+"SRR2584866","CP000819.1",3459613,NA,"A","T",225,NA,FALSE,NA,NA,16,0.566701,NA,NA,NA,0.961166,-0.683931,0,NA,NA,"1",1,"0,0,6,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",3463900,NA,"C","T",198,NA,FALSE,NA,NA,12,0.409198,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,10,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","228,0","1","T" +"SRR2584866","CP000819.1",3465205,NA,"G","T",225,NA,FALSE,NA,NA,14,0.123939,NA,NA,NA,0.966012,-0.686358,0,NA,NA,"1",1,"0,0,7,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",3474014,NA,"T","C",203,NA,FALSE,NA,NA,11,0.0611834,NA,NA,NA,0.916482,-0.670168,0,NA,NA,"1",1,"0,0,3,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","233,0","1","C" +"SRR2584866","CP000819.1",3480466,NA,"C","T",193,NA,FALSE,NA,NA,9,0.0795038,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","223,0","1","T" +"SRR2584866","CP000819.1",3481820,NA,"A","G",141,NA,FALSE,NA,NA,6,0.963757,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,3,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","171,0","1","G" +"SRR2584866","CP000819.1",3483302,NA,"G","A",151,NA,FALSE,NA,NA,7,0.031576,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,2,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","181,0","1","A" +"SRR2584866","CP000819.1",3484278,NA,"C","A",197,NA,FALSE,NA,NA,10,0.279539,NA,NA,NA,1,-0.670168,0,NA,NA,"1",1,"0,0,8,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","227,0","1","A" +"SRR2584866","CP000819.1",3485103,NA,"G","A",223,NA,FALSE,NA,NA,11,0.375147,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","253,0","1","A" 
+"SRR2584866","CP000819.1",3488669,NA,"A","C",221,NA,FALSE,NA,NA,10,0.260793,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","251,0","1","C" +"SRR2584866","CP000819.1",3491881,NA,"G","A",225,NA,FALSE,NA,NA,10,0.00724193,NA,NA,NA,0.952347,-0.670168,0,NA,NA,"1",1,"0,0,5,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",3499794,NA,"A","G",225,NA,FALSE,NA,NA,12,0.762155,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",3500884,NA,"A","G",225,NA,FALSE,NA,NA,16,0.0418809,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,0,11,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",3507565,NA,"T","C",194,NA,FALSE,NA,NA,8,0.933109,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,5,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","224,0","1","C" +"SRR2584866","CP000819.1",3516348,NA,"C","T",218,NA,FALSE,NA,NA,10,0.0738748,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","248,0","1","T" +"SRR2584866","CP000819.1",3523524,NA,"G","A",179,NA,FALSE,NA,NA,18,0.941972,0.368713,1,0.641825,1,-0.683931,0,NA,NA,"1",1,"2,1,7,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,49","1","A" +"SRR2584866","CP000819.1",3536603,NA,"G","A",225,NA,FALSE,NA,NA,29,0.297498,NA,NA,NA,1,-0.692976,0,NA,NA,"1",1,"0,0,13,13",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",3538863,NA,"G","A",4.94782,NA,FALSE,NA,NA,15,0.753018,0.925999,0.966012,0.714506,0.966012,-0.636426,0,NA,NA,"1",1,"4,3,3,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","180,149","1","A" 
+"SRR2584866","CP000819.1",3541351,NA,"T","C",225,NA,FALSE,NA,NA,12,0.338575,NA,NA,NA,0.95494,-0.680642,0,NA,NA,"1",1,"0,0,7,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",3549957,NA,"A","G",137,NA,FALSE,NA,NA,9,0.347684,NA,NA,NA,NA,-0.662043,0,NA,NA,"1",1,"0,0,0,9",49,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","167,0","1","G" +"SRR2584866","CP000819.1",3549960,NA,"T","C",133,NA,FALSE,NA,NA,9,0.347684,NA,NA,NA,NA,-0.662043,0,NA,NA,"1",1,"0,0,0,9",49,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","163,0","1","C" +"SRR2584866","CP000819.1",3550005,NA,"G","A",42.3274,NA,FALSE,NA,NA,5,0.045681,1,1,1,1,-0.511536,0.2,NA,NA,"1",1,"1,0,0,3",22,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","69,0","1","A" +"SRR2584866","CP000819.1",3550032,NA,"G","C",41.3264,NA,FALSE,NA,NA,4,0.045681,1,1,1,NA,-0.511536,0.25,NA,NA,"1",1,"0,1,0,3",22,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","68,0","1","C" +"SRR2584866","CP000819.1",3550069,NA,"A","C",5.04598,NA,FALSE,NA,NA,2,0.8,NA,NA,NA,NA,-0.453602,0,NA,NA,"1",1,"0,0,0,2",18,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","33,0","1","C" +"SRR2584866","CP000819.1",3550071,NA,"C","T",5.04598,NA,FALSE,NA,NA,3,0.84,NA,NA,NA,NA,-0.453602,0,NA,NA,"1",1,"0,0,0,2",18,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","33,0","1","T" +"SRR2584866","CP000819.1",3550082,NA,"C","G",34.4159,NA,FALSE,NA,NA,3,0.045681,NA,NA,NA,NA,-0.511536,0,NA,NA,"1",1,"0,0,0,3",30,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","64,0","1","G" +"SRR2584866","CP000819.1",3553108,NA,"A","G",75,NA,FALSE,NA,NA,4,0.585652,NA,NA,NA,NA,-0.556411,0,NA,NA,"1",1,"0,0,4,0",37,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","105,0","1","G" 
+"SRR2584866","CP000819.1",3553110,NA,"C","G",80,NA,FALSE,NA,NA,4,0.573211,NA,NA,NA,NA,-0.556411,0,NA,NA,"1",1,"0,0,4,0",37,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","110,0","1","G" +"SRR2584866","CP000819.1",3557410,NA,"T","C",72,NA,FALSE,NA,NA,21,0.0288131,0.864302,1,0.466776,1,-0.680642,0,NA,NA,"1",1,"3,5,6,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,156","1","C" +"SRR2584866","CP000819.1",3559835,NA,"A","G",225,NA,FALSE,NA,NA,26,0.269062,NA,NA,NA,1,-0.692831,0,NA,NA,"1",1,"0,0,9,15",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",3573006,NA,"G","A",225,NA,FALSE,NA,NA,41,0.157371,NA,NA,NA,1,-0.693127,0,NA,NA,"1",1,"0,0,18,15",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",3573640,NA,"G","A",225,NA,FALSE,NA,NA,12,0.218041,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,3,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",3575602,NA,"G","A",225,NA,FALSE,NA,NA,23,0.195752,NA,NA,NA,1,-0.69168,0,NA,NA,"1",1,"0,0,14,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",3602703,NA,"G","A",225,NA,FALSE,NA,NA,22,0.898472,NA,NA,NA,1,-0.69168,0,NA,NA,"1",1,"0,0,8,11",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",3605955,NA,"T","C",46.49,NA,FALSE,NA,NA,19,0.201452,0.105399,1,0.957526,1,-0.662043,0,NA,NA,"1",1,"3,3,3,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","231,157","1","C" +"SRR2584866","CP000819.1",3612959,NA,"C","T",225,NA,FALSE,NA,NA,25,0.537902,NA,NA,NA,1,-0.692717,0,NA,NA,"1",1,"0,0,13,10",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" 
+"SRR2584866","CP000819.1",3613010,NA,"G","A",225,NA,FALSE,NA,NA,27,0.702889,NA,NA,NA,1,-0.692717,0,NA,NA,"1",1,"0,0,16,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",3616490,NA,"C","T",225,NA,FALSE,NA,NA,16,0.179528,NA,NA,NA,1,-0.688148,0,NA,NA,"1",1,"0,0,11,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",3635762,NA,"T","C",225,NA,FALSE,NA,NA,16,0.752305,NA,NA,NA,0.961166,-0.683931,0,NA,NA,"1",1,"0,0,6,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",3651129,NA,"G","A",211,NA,FALSE,NA,NA,46,0.659505,0.380612,1.71141e-05,0.942684,0.99961,-0.670168,0.652174,NA,NA,"1",1,"19,11,6,4",14,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","238,0","1","A" +"SRR2584866","CP000819.1",3660304,NA,"T","C",225,NA,FALSE,NA,NA,10,0.48031,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",3675292,NA,"G","A",225,NA,FALSE,NA,NA,16,0.993147,NA,NA,NA,1,-0.688148,0,NA,NA,"1",1,"0,0,5,10",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",3686883,NA,"A","G",207,NA,FALSE,NA,NA,9,0.803731,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","237,0","1","G" +"SRR2584866","CP000819.1",3688320,NA,"G","A",205,NA,FALSE,NA,NA,9,0.100842,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","235,0","1","A" +"SRR2584866","CP000819.1",3693509,NA,"T","C",163,NA,FALSE,NA,NA,8,0.00154198,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","193,0","1","C" 
+"SRR2584866","CP000819.1",3715587,NA,"TGGGGGG","TGGGGGGG",81,NA,TRUE,8,0.888889,9,0.980421,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","111,0","1","TGGGGGGG" +"SRR2584866","CP000819.1",3754155,NA,"CAAAAAAAA","CAAAAAAA",38.3376,NA,TRUE,9,1,9,0.472523,NA,NA,NA,0.924584,-0.651104,0,NA,NA,"1",1,"0,1,3,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","65,0","1","CAAAAAAA" +"SRR2584866","CP000819.1",3758443,NA,"C","T",225,NA,FALSE,NA,NA,16,0.155841,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,0,10,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",3771026,NA,"T","C",177,NA,FALSE,NA,NA,9,0.3669,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","207,0","1","C" +"SRR2584866","CP000819.1",3790391,NA,"T","C",225,NA,FALSE,NA,NA,15,0.35295,NA,NA,NA,0.961166,-0.683931,0,NA,NA,"1",1,"0,0,7,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",3793199,NA,"C","T",225,NA,FALSE,NA,NA,13,0.00162196,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,3,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",3800073,NA,"A","G",225,NA,FALSE,NA,NA,12,0.413877,NA,NA,NA,0.964642,-0.676189,0,NA,NA,"1",1,"0,0,7,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",3814755,NA,"G","A",108,NA,FALSE,NA,NA,6,0.0445763,NA,NA,NA,NA,-0.590765,0,NA,NA,"1",1,"0,0,5,0",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","138,0","1","A" +"SRR2584866","CP000819.1",3817441,NA,"G","A",117,NA,FALSE,NA,NA,7,0.413797,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,3,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","147,0","1","A" 
+"SRR2584866","CP000819.1",3818588,NA,"C","T",225,NA,FALSE,NA,NA,14,0.639611,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,0,3,10",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",3819257,NA,"G","A",185,NA,FALSE,NA,NA,10,0.461975,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","215,0","1","A" +"SRR2584866","CP000819.1",3827368,NA,"A","G",225,NA,FALSE,NA,NA,14,0.792953,NA,NA,NA,0.964642,-0.676189,0,NA,NA,"1",1,"0,0,7,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",3837268,NA,"G","A",51,NA,FALSE,NA,NA,2,0.66,NA,NA,NA,1,-0.453602,0,NA,NA,"1",1,"0,0,1,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","81,0","1","A" +"SRR2584866","CP000819.1",3841457,NA,"G","A",221,NA,FALSE,NA,NA,10,0.637028,NA,NA,NA,1.00775,-0.670168,0,NA,NA,"1",1,"0,0,4,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","251,0","1","A" +"SRR2584866","CP000819.1",3845230,NA,"C","T",178,NA,FALSE,NA,NA,8,0.578041,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","208,0","1","T" +"SRR2584866","CP000819.1",3850177,NA,"GA","G",222,NA,TRUE,12,1,12,0.217648,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,1,8,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","249,0","1","G" +"SRR2584866","CP000819.1",3862134,NA,"G","A",219,NA,FALSE,NA,NA,10,0.396463,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","249,0","1","A" +"SRR2584866","CP000819.1",3864940,NA,"G","A",157,NA,FALSE,NA,NA,7,0.898035,NA,NA,NA,0.861511,-0.616816,0,NA,NA,"1",1,"0,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","187,0","1","A" 
+"SRR2584866","CP000819.1",3865084,NA,"G","A",218,NA,FALSE,NA,NA,11,0.435867,NA,NA,NA,0.950952,-0.676189,0,NA,NA,"1",1,"0,0,6,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","248,0","1","A" +"SRR2584866","CP000819.1",3866570,NA,"T","C",225,NA,FALSE,NA,NA,13,0.967056,NA,NA,NA,0.916482,-0.670168,0,NA,NA,"1",1,"0,0,7,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",3867544,NA,"A","G",214,NA,FALSE,NA,NA,10,0.920203,NA,NA,NA,0.952347,-0.670168,0,NA,NA,"1",1,"0,0,5,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","244,0","1","G" +"SRR2584866","CP000819.1",3869813,NA,"G","A",169,NA,FALSE,NA,NA,9,0.414654,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","199,0","1","A" +"SRR2584866","CP000819.1",3872302,NA,"G","A",225,NA,FALSE,NA,NA,12,0.0177502,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",3874844,NA,"G","A",225,NA,FALSE,NA,NA,23,0.996664,NA,NA,NA,1,-0.692352,0,NA,NA,"1",1,"0,0,11,10",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",3876298,NA,"T","C",150,NA,FALSE,NA,NA,6,0.551573,NA,NA,NA,0.861511,-0.616816,0,NA,NA,"1",1,"0,0,3,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","180,0","1","C" +"SRR2584866","CP000819.1",3876316,NA,"T","C",118,NA,FALSE,NA,NA,6,0.728443,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,3,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","148,0","1","C" +"SRR2584866","CP000819.1",3881184,NA,"T","C",225,NA,FALSE,NA,NA,14,0.582439,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"0,0,3,11",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" 
+"SRR2584866","CP000819.1",3882675,NA,"G","C",206,NA,FALSE,NA,NA,9,0.0352033,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","236,0","1","C" +"SRR2584866","CP000819.1",3887330,NA,"C","T",203,NA,FALSE,NA,NA,10,0.646494,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","233,0","1","T" +"SRR2584866","CP000819.1",3887630,NA,"C","T",225,NA,FALSE,NA,NA,11,0.286863,NA,NA,NA,1.00775,-0.670168,0,NA,NA,"1",1,"0,0,4,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",3889768,NA,"A","G",109,NA,FALSE,NA,NA,6,0.170249,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,3,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","139,0","1","G" +"SRR2584866","CP000819.1",3891537,NA,"AGGGGGGG","AGGGGG",87,NA,TRUE,9,0.75,12,0.666096,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"1,2,3,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","152,38","1","AGGGGG" +"SRR2584866","CP000819.1",3892335,NA,"C","T",225,NA,FALSE,NA,NA,15,0.754234,NA,NA,NA,1,-0.688148,0,NA,NA,"1",1,"0,0,4,11",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",3893548,NA,"CCA","C",225,NA,TRUE,11,1,11,0.264184,NA,NA,NA,0.716531,-0.676189,0,NA,NA,"1",1,"0,0,3,8",55,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",3903461,NA,"G","A",59,NA,FALSE,NA,NA,3,0.102722,NA,NA,NA,NA,-0.511536,0,NA,NA,"1",1,"0,0,0,3",34,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","89,0","1","A" +"SRR2584866","CP000819.1",3903693,NA,"G","A",22.1294,NA,FALSE,NA,NA,7,0.943964,0.861511,0.28717,0.861511,0.25,-0.511536,0.428571,NA,NA,"1",1,"2,1,0,3",12,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","49,0","1","A" 
+"SRR2584866","CP000819.1",3903704,NA,"C","T",22.1294,NA,FALSE,NA,NA,7,0.943964,0.861511,0.28717,0.861511,0.25,-0.511536,0.428571,NA,NA,"1",1,"2,1,0,3",12,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","49,0","1","T" +"SRR2584866","CP000819.1",3909807,NA,"G","T",225,NA,FALSE,NA,NA,15,0.612432,NA,NA,NA,0.961166,-0.683931,0,NA,NA,"1",1,"0,0,7,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",3920984,NA,"C","T",225,NA,FALSE,NA,NA,13,0.0288079,NA,NA,NA,0.961166,-0.683931,0,NA,NA,"1",1,"0,0,7,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",3945856,NA,"C","T",105,NA,FALSE,NA,NA,4,0.106137,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,1,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","135,0","1","T" +"SRR2584866","CP000819.1",3947416,NA,"A","G",225,NA,FALSE,NA,NA,12,0.884681,NA,NA,NA,0.952347,-0.670168,0,NA,NA,"1",1,"0,0,5,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",3948303,NA,"CG","C",225,NA,TRUE,12,1,12,0.775148,NA,NA,NA,0.95494,-0.680642,0,NA,NA,"1",1,"0,0,7,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",3952766,NA,"T","C",134,NA,FALSE,NA,NA,9,0.15244,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,6,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","164,0","1","C" +"SRR2584866","CP000819.1",4004862,NA,"A","G",216,NA,FALSE,NA,NA,10,0.598931,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","246,0","1","G" +"SRR2584866","CP000819.1",4022380,NA,"GTTTTTTTT","GTTTTTTTTT",11.1284,NA,TRUE,9,0.818182,11,0.860217,NA,NA,NA,0.950952,-0.636426,0,NA,NA,"1",1,"2,2,3,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","39,1","1","GTTTTTTTTT" 
+"SRR2584866","CP000819.1",4038529,NA,"GTTTTTTTTT","GTTTTTTTT",15.0048,NA,TRUE,20,0.952381,21,0.978787,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"1,6,8,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","42,0","1","GTTTTTTTT" +"SRR2584866","CP000819.1",4046586,NA,"T","A",225,NA,FALSE,NA,NA,15,0.465148,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,0,8,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",4048980,NA,"C","T",225,NA,FALSE,NA,NA,15,0.974316,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,0,10,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",4100183,NA,"A","G",215,NA,FALSE,NA,NA,11,0.258129,NA,NA,NA,0.916482,-0.670168,0,NA,NA,"1",1,"0,0,3,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","245,0","1","G" +"SRR2584866","CP000819.1",4107018,NA,"T","A",169,NA,FALSE,NA,NA,7,0.0619854,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","199,0","1","A" +"SRR2584866","CP000819.1",4134227,NA,"C","T",225,NA,FALSE,NA,NA,15,0.0975645,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,0,10,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",4154522,NA,"A","G",157,NA,FALSE,NA,NA,6,0.752618,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,2,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","187,0","1","G" +"SRR2584866","CP000819.1",4163775,NA,"C","T",225,NA,FALSE,NA,NA,12,0.993137,NA,NA,NA,0.950952,-0.676189,0,NA,NA,"1",1,"0,0,5,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",4164379,NA,"A","G",169,NA,FALSE,NA,NA,7,0.264054,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,4,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","199,0","1","G" 
+"SRR2584866","CP000819.1",4165619,NA,"G","A",213,NA,FALSE,NA,NA,11,0.346875,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","243,0","1","A" +"SRR2584866","CP000819.1",4180200,NA,"T","G",142,NA,FALSE,NA,NA,7,0.81988,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,6,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","172,0","1","G" +"SRR2584866","CP000819.1",4196682,NA,"C","A",225,NA,FALSE,NA,NA,16,0.361518,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,0,10,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",4199326,NA,"A","G",161,NA,FALSE,NA,NA,7,0.931218,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,2,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","191,0","1","G" +"SRR2584866","CP000819.1",4201958,NA,"A","C",191,NA,FALSE,NA,NA,8,0.479284,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,5,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","221,0","1","C" +"SRR2584866","CP000819.1",4207843,NA,"G","A",225,NA,FALSE,NA,NA,16,0.267024,NA,NA,NA,0.982603,-0.680642,0,NA,NA,"1",1,"0,0,6,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",4211840,NA,"C","A",155,NA,FALSE,NA,NA,8,0.645707,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,4,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","185,0","1","A" +"SRR2584866","CP000819.1",4212514,NA,"T","C",225,NA,FALSE,NA,NA,14,0.539469,NA,NA,NA,0.961166,-0.683931,0,NA,NA,"1",1,"0,0,7,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",4212600,NA,"G","A",123,NA,FALSE,NA,NA,9,0.149847,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,5,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","153,0","1","A" 
+"SRR2584866","CP000819.1",4216922,NA,"G","T",194,NA,FALSE,NA,NA,8,0.536793,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,5,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","224,0","1","T" +"SRR2584866","CP000819.1",4221712,NA,"G","A",162,NA,FALSE,NA,NA,8,0.424295,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","192,0","1","A" +"SRR2584866","CP000819.1",4227623,NA,"A","G",171,NA,FALSE,NA,NA,8,0.516921,NA,NA,NA,1,-0.590765,0,NA,NA,"1",1,"0,0,3,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","201,0","1","G" +"SRR2584866","CP000819.1",4234851,NA,"T","C",201,NA,FALSE,NA,NA,8,0.083858,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,3,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","231,0","1","C" +"SRR2584866","CP000819.1",4245184,NA,"C","T",223,NA,FALSE,NA,NA,11,0.0933471,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"0,0,7,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","253,0","1","T" +"SRR2584866","CP000819.1",4248049,NA,"T","C",206,NA,FALSE,NA,NA,9,0.633718,NA,NA,NA,0.924584,-0.662043,0,NA,NA,"1",1,"0,0,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","236,0","1","C" +"SRR2584866","CP000819.1",4250292,NA,"C","T",225,NA,FALSE,NA,NA,13,0.00888554,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,8,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",4265633,NA,"G","A",225,NA,FALSE,NA,NA,12,0.708194,NA,NA,NA,0.8618,-0.676189,0,NA,NA,"1",1,"0,0,6,5",59,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",4286515,NA,"C","T",190,NA,FALSE,NA,NA,9,0.267186,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,5,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","220,0","1","T" 
+"SRR2584866","CP000819.1",4342010,NA,"A","G",225,NA,FALSE,NA,NA,13,0.406086,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,10,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",4347725,NA,"C","T",116,NA,FALSE,NA,NA,5,0.348933,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,1,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","146,0","1","T" +"SRR2584866","CP000819.1",4347803,NA,"A","G",143,NA,FALSE,NA,NA,6,0.368038,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,2,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","173,0","1","G" +"SRR2584866","CP000819.1",4349281,NA,"ATTTTTTTT","ATTTTTTT",44.2229,NA,TRUE,9,0.818182,11,0.95493,NA,NA,NA,1,-0.670168,0,NA,NA,"1",1,"0,1,3,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","71,0","1","ATTTTTTT" +"SRR2584866","CP000819.1",4356652,NA,"G","A",217,NA,FALSE,NA,NA,9,0.162453,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,3,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","247,0","1","A" +"SRR2584866","CP000819.1",4363338,NA,"C","A",176,NA,FALSE,NA,NA,10,0.399237,NA,NA,NA,1,-0.670168,0,NA,NA,"1",1,"0,0,1,9",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","206,0","1","A" +"SRR2584866","CP000819.1",4377265,NA,"A","G",225,NA,FALSE,NA,NA,16,0.921692,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,0,4,9",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","G" +"SRR2584866","CP000819.1",4380317,NA,"A","G",181,NA,FALSE,NA,NA,7,0.228249,NA,NA,NA,1.01283,-0.636426,0,NA,NA,"1",1,"0,0,3,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","211,0","1","G" +"SRR2584866","CP000819.1",4380609,NA,"C","T",225,NA,FALSE,NA,NA,12,0.568173,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,6,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" 
+"SRR2584866","CP000819.1",4382294,NA,"C","T",212,NA,FALSE,NA,NA,9,0.598931,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,3,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","242,0","1","T" +"SRR2584866","CP000819.1",4384556,NA,"C","T",225,NA,FALSE,NA,NA,13,0.691465,NA,NA,NA,0.950952,-0.676189,0,NA,NA,"1",1,"0,0,5,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",4392683,NA,"ACCCC","ACCCCC",155,NA,TRUE,16,0.888889,18,0.989366,NA,NA,NA,1,-0.688148,0,NA,NA,"1",1,"0,3,8,7",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","182,0","1","ACCCCC" +"SRR2584866","CP000819.1",4423016,NA,"C","T",225,NA,FALSE,NA,NA,12,0.789033,NA,NA,NA,0.982603,-0.680642,0,NA,NA,"1",1,"0,0,6,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",4423593,NA,"G","T",225,NA,FALSE,NA,NA,13,0.815149,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,0,10,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",4424649,NA,"T","C",153,NA,FALSE,NA,NA,8,0.493668,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,1,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","183,0","1","C" +"SRR2584866","CP000819.1",4429654,NA,"G","A",149,NA,FALSE,NA,NA,7,0.561812,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,1,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","179,0","1","A" +"SRR2584866","CP000819.1",4431393,NA,"TGG","T",228,NA,TRUE,11,1,11,0.138406,NA,NA,NA,0.950952,-0.662043,0,NA,NA,"1",1,"0,2,6,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",4433347,NA,"A","G",149,NA,FALSE,NA,NA,7,0.427991,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,5,1",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","179,0","1","G" 
+"SRR2584866","CP000819.1",4439519,NA,"A","G",162,NA,FALSE,NA,NA,7,0.810254,NA,NA,NA,1,-0.636426,0,NA,NA,"1",1,"0,0,2,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","192,0","1","G" +"SRR2584866","CP000819.1",4461084,NA,"G","A",186,NA,FALSE,NA,NA,9,0.651328,NA,NA,NA,1,-0.616816,0,NA,NA,"1",1,"0,0,4,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","216,0","1","A" +"SRR2584866","CP000819.1",4462040,NA,"C","T",103,NA,FALSE,NA,NA,5,0.510154,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,1,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","133,0","1","T" +"SRR2584866","CP000819.1",4472216,NA,"C","T",190,NA,FALSE,NA,NA,9,0.295648,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,5,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","220,0","1","T" +"SRR2584866","CP000819.1",4486277,NA,"C","T",212,NA,FALSE,NA,NA,9,0.887079,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","242,0","1","T" +"SRR2584866","CP000819.1",4491013,NA,"A","G",96,NA,FALSE,NA,NA,5,0.0456508,NA,NA,NA,NA,-0.590765,0,NA,NA,"1",1,"0,0,0,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","126,0","1","G" +"SRR2584866","CP000819.1",4498157,NA,"G","A",40.4148,NA,FALSE,NA,NA,2,0.26,NA,NA,NA,NA,-0.453602,0,NA,NA,"1",1,"0,0,0,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","70,0","1","A" +"SRR2584866","CP000819.1",4498856,NA,"T","C",225,NA,FALSE,NA,NA,12,0.945914,NA,NA,NA,1,-0.676189,0,NA,NA,"1",1,"0,0,8,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","C" +"SRR2584866","CP000819.1",4503582,NA,"T","C",111,NA,FALSE,NA,NA,6,0.446842,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,2,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","141,0","1","C" 
+"SRR2584866","CP000819.1",4509290,NA,"C","T",225,NA,FALSE,NA,NA,15,0.254982,NA,NA,NA,0.95494,-0.680642,0,NA,NA,"1",1,"0,0,7,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",4588672,NA,"A","G",118,NA,FALSE,NA,NA,5,0.459448,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,1,3",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","148,0","1","G" +"SRR2584866","CP000819.1",4593186,NA,"G","A",225,NA,FALSE,NA,NA,18,0.269766,NA,NA,NA,1,-0.686358,0,NA,NA,"1",1,"0,0,9,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" +"SRR2584866","CP000819.1",4603391,NA,"C","T",208,NA,FALSE,NA,NA,12,0.206708,NA,NA,NA,1,-0.670168,0,NA,NA,"1",1,"0,0,2,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","238,0","1","T" +"SRR2584866","CP000819.1",4603678,NA,"A","G",57,NA,FALSE,NA,NA,4,0.0298006,NA,NA,NA,NA,-0.556411,0,NA,NA,"1",1,"0,0,4,0",43,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","87,0","1","G" +"SRR2584866","CP000819.1",4603681,NA,"C","T",58,NA,FALSE,NA,NA,4,0.0298006,NA,NA,NA,NA,-0.556411,0,NA,NA,"1",1,"0,0,4,0",43,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","88,0","1","T" +"SRR2584866","CP000819.1",4603730,NA,"C","T",28.4205,NA,FALSE,NA,NA,3,0.121629,NA,NA,NA,1,-0.511536,0.333333,NA,NA,"1",1,"0,0,1,2",29,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","58,0","1","T" +"SRR2584866","CP000819.1",4603737,NA,"T","C",82,NA,FALSE,NA,NA,7,0.102049,NA,NA,NA,0.666667,-0.590765,0.142857,NA,NA,"1",1,"0,0,2,3",41,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","112,0","1","C" +"SRR2584866","CP000819.1",4604220,NA,"G","A",225,NA,FALSE,NA,NA,12,0.874935,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,0,8,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","A" 
+"SRR2584866","CP000819.1",4604577,NA,"G","C",218,NA,FALSE,NA,NA,11,0.774083,NA,NA,NA,1,-0.662043,0,NA,NA,"1",1,"0,0,7,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","248,0","1","C" +"SRR2584866","CP000819.1",4613668,NA,"C","T",196,NA,FALSE,NA,NA,9,0.671615,NA,NA,NA,0.974597,-0.662043,0,NA,NA,"1",1,"0,0,4,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","226,0","1","T" +"SRR2584866","CP000819.1",4616538,NA,"A","C",92,NA,FALSE,NA,NA,5,0.950351,NA,NA,NA,NA,-0.556411,0,NA,NA,"1",1,"0,0,4,0",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","122,0","1","C" +"SRR2584866","CP000819.1",4626487,NA,"C","T",225,NA,FALSE,NA,NA,13,0.608538,NA,NA,NA,0.950952,-0.676189,0,NA,NA,"1",1,"0,0,6,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2584866","CP000819.1",4629225,NA,"C","T",225,NA,FALSE,NA,NA,10,0.89913,NA,NA,NA,1.00775,-0.670168,0,NA,NA,"1",1,"0,0,6,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2584866.aligned.sorted.bam","255,0","1","T" +"SRR2589044","CP000819.1",648692,NA,"C","T",225,NA,FALSE,NA,NA,13,0.523489,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,0,9,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2589044.aligned.sorted.bam","255,0","1","T" +"SRR2589044","CP000819.1",1331794,NA,"C","A",169,NA,FALSE,NA,NA,8,0.22253,NA,NA,NA,1,-0.651104,0,NA,NA,"1",1,"0,0,6,2",60,"/home/dcuser/dc_workshop/results/bam/SRR2589044.aligned.sorted.bam","199,0","1","A" +"SRR2589044","CP000819.1",1733343,NA,"G","A",225,NA,FALSE,NA,NA,16,0.388723,NA,NA,NA,1,-0.683931,0,NA,NA,"1",1,"0,0,5,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2589044.aligned.sorted.bam","255,0","1","A" 
+"SRR2589044","CP000819.1",2103887,NA,"ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG","ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG",225,NA,TRUE,10,1,10,0.134903,NA,NA,NA,1,-0.670168,0,NA,NA,"1",1,"0,0,1,9",60,"/home/dcuser/dc_workshop/results/bam/SRR2589044.aligned.sorted.bam","255,0","1","ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG" +"SRR2589044","CP000819.1",3444175,NA,"G","T",184,NA,FALSE,NA,NA,9,0.471462,NA,NA,NA,0.992367,-0.651104,0,NA,NA,"1",1,"0,0,4,4",60,"/home/dcuser/dc_workshop/results/bam/SRR2589044.aligned.sorted.bam","214,0","1","T" +"SRR2589044","CP000819.1",3481820,NA,"A","G",225,NA,FALSE,NA,NA,12,0.870724,NA,NA,NA,1,-0.680642,0,NA,NA,"1",1,"0,0,4,8",60,"/home/dcuser/dc_workshop/results/bam/SRR2589044.aligned.sorted.bam","255,0","1","G" +"SRR2589044","CP000819.1",3893550,NA,"AG","AGG",101,NA,TRUE,4,1,4,0.918297,NA,NA,NA,1,-0.556411,0,NA,NA,"1",1,"0,0,3,1",52,"/home/dcuser/dc_workshop/results/bam/SRR2589044.aligned.sorted.bam","131,0","1","AGG" +"SRR2589044","CP000819.1",3901455,NA,"A","AC",70,NA,TRUE,3,1,3,0.0221621,NA,NA,NA,NA,-0.511536,0,NA,NA,"1",1,"0,0,3,0",60,"/home/dcuser/dc_workshop/results/bam/SRR2589044.aligned.sorted.bam","100,0","1","AC" +"SRR2589044","CP000819.1",4100183,NA,"A","G",177,NA,FALSE,NA,NA,8,0.92727,NA,NA,NA,0.900802,-0.651104,0,NA,NA,"1",1,"0,0,3,5",60,"/home/dcuser/dc_workshop/results/bam/SRR2589044.aligned.sorted.bam","207,0","1","G" +"SRR2589044","CP000819.1",4431393,NA,"TGG","T",225,NA,TRUE,10,1,10,0.748814,NA,NA,NA,1.00775,-0.670168,0,NA,NA,"1",1,"0,0,4,6",60,"/home/dcuser/dc_workshop/results/bam/SRR2589044.aligned.sorted.bam","255,0","1","T" diff --git a/depth.pdf b/depth.pdf new file mode 100644 index 000000000..a15f23a48 Binary files /dev/null and b/depth.pdf differ diff --git a/discuss.md b/discuss.md new file mode 100644 index 000000000..3a59d6830 --- /dev/null +++ b/discuss.md @@ -0,0 +1,7 @@ +--- +title: Discussion +--- + +FIXME + + diff --git a/exercise_solution.csv b/exercise_solution.csv new file mode 100644 index 000000000..87ffb246d --- 
/dev/null +++ b/exercise_solution.csv @@ -0,0 +1,31 @@ +"","sample_id","generation","clade","strain","cit","run","genome_size","genome_size_bp" +"1","REL606",0,"NA","REL606","unknown",NA,4.62,4620000 +"2","REL1166A",2000,"unknown","REL606","unknown","SRR098028",4.63,4630000 +"3","ZDB409",5000,"unknown","REL606","unknown","SRR098281",4.6,4600000 +"4","ZDB429",10000,"UC","REL606","unknown","SRR098282",4.59,4590000 +"5","ZDB446",15000,"UC","REL606","unknown","SRR098283",4.66,4660000 +"6","ZDB458",20000,"(C1,C2)","REL606","unknown","SRR098284",4.63,4630000 +"7","ZDB464*",20000,"(C1,C2)","REL606","unknown","SRR098285",4.62,4620000 +"8","ZDB467",20000,"(C1,C2)","REL606","unknown","SRR098286",4.61,4610000 +"9","ZDB477",25000,"C1","REL606","unknown","SRR098287",4.65,4650000 +"10","ZDB483",25000,"C3","REL606","unknown","SRR098288",4.59,4590000 +"11","ZDB16",30000,"C1","REL606","unknown","SRR098031",4.61,4610000 +"12","ZDB357",30000,"C2","REL606","unknown","SRR098280",4.62,4620000 +"13","ZDB199*",31500,"C1","REL606","minus","SRR098044",4.62,4620000 +"14","ZDB200",31500,"C2","REL606","minus","SRR098279",4.63,4630000 +"15","ZDB564",31500,"Cit+","REL606","plus","SRR098289",4.74,4740000 +"16","ZDB30*",32000,"C3","REL606","minus","SRR098032",4.61,4610000 +"17","ZDB172",32000,"Cit+","REL606","plus","SRR098042",4.77,4770000 +"18","ZDB158",32500,"C2","REL606","minus","SRR098041",4.63,4630000 +"19","ZDB143",32500,"Cit+","REL606","plus","SRR098040",4.79,4790000 +"20","CZB199",33000,"C1","REL606","minus","SRR098027",4.59,4590000 +"21","CZB152",33000,"Cit+","REL606","plus","SRR097977",4.8,4800000 +"22","CZB154",33000,"Cit+","REL606","plus","SRR098026",4.76,4760000 +"23","ZDB83",34000,"Cit+","REL606","minus","SRR098034",4.6,4600000 +"24","ZDB87",34000,"C2","REL606","plus","SRR098035",4.75,4750000 +"25","ZDB96",36000,"Cit+","REL606","plus","SRR098036",4.74,4740000 +"26","ZDB99",36000,"C1","REL606","minus","SRR098037",4.61,4610000 
+"27","ZDB107",38000,"Cit+","REL606","plus","SRR098038",4.79,4790000 +"28","ZDB111",38000,"C2","REL606","minus","SRR098039",4.62,4620000 +"29","REL10979",40000,"Cit+","REL606","plus","SRR098029",4.78,4780000 +"30","REL10988",40000,"C2","REL606","minus","SRR098030",4.62,4620000 diff --git a/fig/03-basics-factors-dataframes-rendered-unnamed-chunk-10-1.png b/fig/03-basics-factors-dataframes-rendered-unnamed-chunk-10-1.png new file mode 100644 index 000000000..a9cacb295 Binary files /dev/null and b/fig/03-basics-factors-dataframes-rendered-unnamed-chunk-10-1.png differ diff --git a/fig/03-basics-factors-dataframes-rendered-unnamed-chunk-14-1.png b/fig/03-basics-factors-dataframes-rendered-unnamed-chunk-14-1.png new file mode 100644 index 000000000..99e25c6db Binary files /dev/null and b/fig/03-basics-factors-dataframes-rendered-unnamed-chunk-14-1.png differ diff --git a/fig/03-basics-factors-dataframes-rendered-unnamed-chunk-16-1.png b/fig/03-basics-factors-dataframes-rendered-unnamed-chunk-16-1.png new file mode 100644 index 000000000..a8a85851a Binary files /dev/null and b/fig/03-basics-factors-dataframes-rendered-unnamed-chunk-16-1.png differ diff --git a/fig/06-data-visualization-rendered-add-axis-labels-1.png b/fig/06-data-visualization-rendered-add-axis-labels-1.png new file mode 100644 index 000000000..35961aef3 Binary files /dev/null and b/fig/06-data-visualization-rendered-add-axis-labels-1.png differ diff --git a/fig/06-data-visualization-rendered-add-main-title-1.png b/fig/06-data-visualization-rendered-add-main-title-1.png new file mode 100644 index 000000000..fd9b7371a Binary files /dev/null and b/fig/06-data-visualization-rendered-add-main-title-1.png differ diff --git a/fig/06-data-visualization-rendered-adding-colors-1.png b/fig/06-data-visualization-rendered-adding-colors-1.png new file mode 100644 index 000000000..4fd003da0 Binary files /dev/null and b/fig/06-data-visualization-rendered-adding-colors-1.png differ diff --git 
a/fig/06-data-visualization-rendered-adding-transparency-1.png b/fig/06-data-visualization-rendered-adding-transparency-1.png new file mode 100644 index 000000000..65fd82cc0 Binary files /dev/null and b/fig/06-data-visualization-rendered-adding-transparency-1.png differ diff --git a/fig/06-data-visualization-rendered-barplot-1.png b/fig/06-data-visualization-rendered-barplot-1.png new file mode 100644 index 000000000..d82c201c7 Binary files /dev/null and b/fig/06-data-visualization-rendered-barplot-1.png differ diff --git a/fig/06-data-visualization-rendered-barplot-challenge-1.png b/fig/06-data-visualization-rendered-barplot-challenge-1.png new file mode 100644 index 000000000..1eb3c7368 Binary files /dev/null and b/fig/06-data-visualization-rendered-barplot-challenge-1.png differ diff --git a/fig/06-data-visualization-rendered-change-font-family-1.png b/fig/06-data-visualization-rendered-change-font-family-1.png new file mode 100644 index 000000000..d0e547b55 Binary files /dev/null and b/fig/06-data-visualization-rendered-change-font-family-1.png differ diff --git a/fig/06-data-visualization-rendered-color-by-sample-1-1.png b/fig/06-data-visualization-rendered-color-by-sample-1-1.png new file mode 100644 index 000000000..b52aab00f Binary files /dev/null and b/fig/06-data-visualization-rendered-color-by-sample-1-1.png differ diff --git a/fig/06-data-visualization-rendered-color-by-sample-2-1.png b/fig/06-data-visualization-rendered-color-by-sample-2-1.png new file mode 100644 index 000000000..2e2ad6e77 Binary files /dev/null and b/fig/06-data-visualization-rendered-color-by-sample-2-1.png differ diff --git a/fig/06-data-visualization-rendered-create-ggplot-object-1.png b/fig/06-data-visualization-rendered-create-ggplot-object-1.png new file mode 100644 index 000000000..7cea1f270 Binary files /dev/null and b/fig/06-data-visualization-rendered-create-ggplot-object-1.png differ diff --git a/fig/06-data-visualization-rendered-density-1.png 
b/fig/06-data-visualization-rendered-density-1.png new file mode 100644 index 000000000..fffbd1471 Binary files /dev/null and b/fig/06-data-visualization-rendered-density-1.png differ diff --git a/fig/06-data-visualization-rendered-density-challenge-1.png b/fig/06-data-visualization-rendered-density-challenge-1.png new file mode 100644 index 000000000..7bb70c131 Binary files /dev/null and b/fig/06-data-visualization-rendered-density-challenge-1.png differ diff --git a/fig/06-data-visualization-rendered-facet-plot-white-bg-1.png b/fig/06-data-visualization-rendered-facet-plot-white-bg-1.png new file mode 100644 index 000000000..c0d43d702 Binary files /dev/null and b/fig/06-data-visualization-rendered-facet-plot-white-bg-1.png differ diff --git a/fig/06-data-visualization-rendered-first-facet-1.png b/fig/06-data-visualization-rendered-first-facet-1.png new file mode 100644 index 000000000..d07aa6439 Binary files /dev/null and b/fig/06-data-visualization-rendered-first-facet-1.png differ diff --git a/fig/06-data-visualization-rendered-first-ggplot-1.png b/fig/06-data-visualization-rendered-first-ggplot-1.png new file mode 100644 index 000000000..7cea1f270 Binary files /dev/null and b/fig/06-data-visualization-rendered-first-ggplot-1.png differ diff --git a/fig/06-data-visualization-rendered-scatter-challenge-1.png b/fig/06-data-visualization-rendered-scatter-challenge-1.png new file mode 100644 index 000000000..403970193 Binary files /dev/null and b/fig/06-data-visualization-rendered-scatter-challenge-1.png differ diff --git a/fig/06-data-visualization-rendered-scatter-challenge-2-1.png b/fig/06-data-visualization-rendered-scatter-challenge-2-1.png new file mode 100644 index 000000000..217f19b5d Binary files /dev/null and b/fig/06-data-visualization-rendered-scatter-challenge-2-1.png differ diff --git a/fig/06-data-visualization-rendered-second-facet-1.png b/fig/06-data-visualization-rendered-second-facet-1.png new file mode 100644 index 000000000..38a2982d3 Binary 
files /dev/null and b/fig/06-data-visualization-rendered-second-facet-1.png differ diff --git a/fig/New_R_Markdown.png b/fig/New_R_Markdown.png new file mode 100644 index 000000000..8f071502e Binary files /dev/null and b/fig/New_R_Markdown.png differ diff --git a/fig/bioconductor_search.jpg b/fig/bioconductor_search.jpg new file mode 100644 index 000000000..e556d126a Binary files /dev/null and b/fig/bioconductor_search.jpg differ diff --git a/fig/bioconductor_website_screenshot.jpg b/fig/bioconductor_website_screenshot.jpg new file mode 100644 index 000000000..ccda4fbad Binary files /dev/null and b/fig/bioconductor_website_screenshot.jpg differ diff --git a/fig/new_project_window.png b/fig/new_project_window.png new file mode 100644 index 000000000..814d94e45 Binary files /dev/null and b/fig/new_project_window.png differ diff --git a/fig/oreilly_book_covers.png b/fig/oreilly_book_covers.png new file mode 100644 index 000000000..c2b86677b Binary files /dev/null and b/fig/oreilly_book_covers.png differ diff --git a/fig/rmd-03-unnamed-chunk-12-1.png b/fig/rmd-03-unnamed-chunk-12-1.png new file mode 100644 index 000000000..454a5828a Binary files /dev/null and b/fig/rmd-03-unnamed-chunk-12-1.png differ diff --git a/fig/rmd-03-unnamed-chunk-15-1.png b/fig/rmd-03-unnamed-chunk-15-1.png new file mode 100644 index 000000000..3a751768b Binary files /dev/null and b/fig/rmd-03-unnamed-chunk-15-1.png differ diff --git a/fig/rmd-03-unnamed-chunk-9-1.png b/fig/rmd-03-unnamed-chunk-9-1.png new file mode 100644 index 000000000..454a5828a Binary files /dev/null and b/fig/rmd-03-unnamed-chunk-9-1.png differ diff --git a/fig/rstudio_dataframeview.png b/fig/rstudio_dataframeview.png new file mode 100644 index 000000000..17ba8921b Binary files /dev/null and b/fig/rstudio_dataframeview.png differ diff --git a/fig/rstudio_import_menu.png b/fig/rstudio_import_menu.png new file mode 100644 index 000000000..b47bebe0b Binary files /dev/null and b/fig/rstudio_import_menu.png differ diff --git 
a/fig/rstudio_import_screen.png b/fig/rstudio_import_screen.png new file mode 100644 index 000000000..b061f85ec Binary files /dev/null and b/fig/rstudio_import_screen.png differ diff --git a/fig/rstudio_login_screen.png b/fig/rstudio_login_screen.png new file mode 100644 index 000000000..6c5be5ca1 Binary files /dev/null and b/fig/rstudio_login_screen.png differ diff --git a/fig/rstudio_script_warning.png b/fig/rstudio_script_warning.png new file mode 100644 index 000000000..d95d6bb2e Binary files /dev/null and b/fig/rstudio_script_warning.png differ diff --git a/fig/rstudio_session_4pane_layout.png b/fig/rstudio_session_4pane_layout.png new file mode 100644 index 000000000..09fc9a81a Binary files /dev/null and b/fig/rstudio_session_4pane_layout.png differ diff --git a/fig/rstudio_session_default.png b/fig/rstudio_session_default.png new file mode 100644 index 000000000..a1d52b974 Binary files /dev/null and b/fig/rstudio_session_default.png differ diff --git a/fig/split_apply_combine.png b/fig/split_apply_combine.png new file mode 100644 index 000000000..43ca75d12 Binary files /dev/null and b/fig/split_apply_combine.png differ diff --git a/fig/studio_contexthelp1.png b/fig/studio_contexthelp1.png new file mode 100644 index 000000000..b5fda68c1 Binary files /dev/null and b/fig/studio_contexthelp1.png differ diff --git a/fig/studio_contexthelp2.png b/fig/studio_contexthelp2.png new file mode 100644 index 000000000..be087e6fa Binary files /dev/null and b/fig/studio_contexthelp2.png differ diff --git a/fig/variant_calling_workflow.png b/fig/variant_calling_workflow.png new file mode 100644 index 000000000..812d7b6f7 Binary files /dev/null and b/fig/variant_calling_workflow.png differ diff --git a/index.md b/index.md new file mode 100644 index 000000000..004f6596a --- /dev/null +++ b/index.md @@ -0,0 +1,62 @@ +--- +permalink: index.html +site: sandpaper::sandpaper_site +--- + +**Welcome to R!** Working with a programming language (especially if it's your +first time) 
often feels intimidating, but the rewards outweigh any frustrations. +An important secret of coding is that even experienced programmers find it +difficult and frustrating at times – so if even the best feel that way, why let +intimidation stop you? Given time and practice\* you will soon find it easier +and easier to accomplish what you want. + +Why learn to code? Bioinformatics – like biology – is messy. Different +organisms, different systems, and different conditions all behave differently. +Experiments at the bench require a variety of approaches – from tested protocols +to trial and error. Bioinformatics is also an experimental science; otherwise we +could use the same software and same parameters for every genome assembly. +Learning to code opens up the full possibilities of computing, especially given +that most bioinformatics tools exist only at the command line. Think of it this +way: if you could only do molecular biology using a kit, you could probably +accomplish a fair amount. However, if you don't understand the biochemistry of +the kit, how would you troubleshoot? How would you do experiments for which +there are no kits? + +R is one of the most widely used and powerful programming languages in +bioinformatics. R especially shines where a variety of statistical tools are +required (e.g. RNA-Seq, population genomics, etc.) and in the generation of +publication-quality graphs and figures. Rather than get into an R vs. Python +debate (both are useful), keep in mind that many of the concepts you will learn +apply to Python and other programming languages. + +Finally, we won't lie; R is not the easiest-to-learn programming language ever +created. So, don't get discouraged! The truth is that even with the modest +amount of R we will cover today, you can start using some sophisticated R +software packages, and have a general sense of how to interpret an R script. +Get through these lessons, and you are on your way to being an accomplished R +user!
+ +\* We very intentionally used the word practice. One of the other "secrets" of +programming is that you can only learn so much by reading about it. Do the +exercises in class, re-do them on your own, and then work on your own problems. + +:::::::::::::::::::::::::::::::::::::::::: prereq + +## Prerequisites + +- **Experimenter's Mindset**: We define the "Experimenter's mindset" as an + approach to bioinformatics that treats it like any other experiment. There + are probably a variety of metaphors we could employ (data are our + reagents, scripts are our protocols, etc.), but the most important idea of + the mindset is to remind you that as a researcher, you need to bring all + of your training at the bench or in the field to your analyses. Evaluate + results critically, and don't expect that things will always work the first + time, or that they will always work in the same way. +- **Genomics Data Carpentry Instance**: This lesson assumes you are using a + Genomics Data Carpentry instance as described on the + [Genomics Workshop setup page](https://www.datacarpentry.org/genomics-workshop/setup.html). + + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/instructor-notes.md b/instructor-notes.md new file mode 100644 index 000000000..eff248601 --- /dev/null +++ b/instructor-notes.md @@ -0,0 +1,7 @@ +--- +title: Instructor Notes +--- + +FIXME + + diff --git a/learner-profiles.md b/learner-profiles.md new file mode 100644 index 000000000..434e335aa --- /dev/null +++ b/learner-profiles.md @@ -0,0 +1,5 @@ +--- +title: FIXME +--- + +This is a placeholder file. Please add content here.
diff --git a/md5sum.txt b/md5sum.txt new file mode 100644 index 000000000..349fd4c8a --- /dev/null +++ b/md5sum.txt @@ -0,0 +1,19 @@ +"file" "checksum" "built" "date" +"CODE_OF_CONDUCT.md" "c93c83c630db2fe2462240bf72552548" "site/built/CODE_OF_CONDUCT.md" "2024-01-19" +"LICENSE.md" "b24ebbb41b14ca25cf6b8216dda83e5f" "site/built/LICENSE.md" "2024-01-19" +"config.yaml" "b91cd97fa3b408bd1ac0a00e67ab3219" "site/built/config.yaml" "2024-01-19" +"index.md" "7f9c30e6487338a0c3f8ecc4018873ab" "site/built/index.md" "2024-01-19" +"episodes/00-introduction.Rmd" "0703d3a4fb9cd0dfc4d54c78dea6fee6" "site/built/00-introduction.md" "2024-01-19" +"episodes/01-r-basics.Rmd" "cf0ee85e99104da0316a085faf32ae21" "site/built/01-r-basics.md" "2024-01-19" +"episodes/02-data-prelude.Rmd" "ab2b1fd3cdaae919f9e409f713a0a8ad" "site/built/02-data-prelude.md" "2024-01-19" +"episodes/03-basics-factors-dataframes.Rmd" "cab7ab3fe53143558e6af3eee5774d35" "site/built/03-basics-factors-dataframes.md" "2024-01-19" +"episodes/04-bioconductor-vcfr.Rmd" "10eb69b4697d7ecb9695d36c0d974208" "site/built/04-bioconductor-vcfr.md" "2024-01-19" +"episodes/05-dplyr.Rmd" "f74055bd8677338a213e0a0c6c430119" "site/built/05-dplyr.md" "2024-01-19" +"episodes/06-data-visualization.Rmd" "0b45534421bad05f040b24c40b6da71b" "site/built/06-data-visualization.md" "2024-01-19" +"episodes/07-r-help.Rmd" "1a7610b0efbaebfdd03ff4540125a790" "site/built/07-r-help.md" "2024-01-19" +"instructors/instructor-notes.md" "78f6fe6109a0eb19a16ec6663941da7f" "site/built/instructor-notes.md" "2024-01-19" +"learners/discuss.md" "522bcb192adf6702a2e3cb2f0d1412b5" "site/built/discuss.md" "2024-01-19" +"learners/reference.md" "4e0dcbc7892af6f9610d44d356e66617" "site/built/reference.md" "2024-01-19" +"learners/setup.md" "8e109a52a3f92f4e454de0d71bf4df11" "site/built/setup.md" "2024-01-19" +"profiles/learner-profiles.md" "60b93493cf1da06dfd63255d73854461" "site/built/learner-profiles.md" "2024-01-19" +"renv/profiles/lesson-requirements/renv.lock" 
"a532408eb00cf170b70e5e809fe496fa" "site/built/renv.lock" "2024-01-19" diff --git a/reference.md b/reference.md new file mode 100644 index 000000000..456f94091 --- /dev/null +++ b/reference.md @@ -0,0 +1,9 @@ +--- +title: 'Glossary' +--- + +## Glossary + +FIXME + + diff --git a/renv.lock b/renv.lock new file mode 100644 index 000000000..87e575d02 --- /dev/null +++ b/renv.lock @@ -0,0 +1,986 @@ +{ + "R": { + "Version": "4.3.2", + "Repositories": [ + { + "Name": "carpentries", + "URL": "https://carpentries.r-universe.dev" + }, + { + "Name": "carpentries_archive", + "URL": "https://carpentries.github.io/drat" + }, + { + "Name": "CRAN", + "URL": "https://cran.rstudio.com" + } + ] + }, + "Packages": { + "MASS": { + "Package": "MASS", + "Version": "7.3-60", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "grDevices", + "graphics", + "methods", + "stats", + "utils" + ], + "Hash": "a56a6365b3fa73293ea8d084be0d9bb0" + }, + "Matrix": { + "Package": "Matrix", + "Version": "1.6-4", + "Source": "Repository", + "Repository": "RSPM", + "Requirements": [ + "R", + "grDevices", + "graphics", + "grid", + "lattice", + "methods", + "stats", + "utils" + ], + "Hash": "d9c655b30a2edc6bb2244c1d1e8d549d" + }, + "R6": { + "Package": "R6", + "Version": "2.5.1", + "Source": "Repository", + "Repository": "RSPM", + "Requirements": [ + "R" + ], + "Hash": "470851b6d5d0ac559e9d01bb352b4021" + }, + "RColorBrewer": { + "Package": "RColorBrewer", + "Version": "1.1-3", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "45f0398006e83a5b10b72a90663d8d8c" + }, + "base64enc": { + "Package": "base64enc", + "Version": "0.1-3", + "Source": "Repository", + "Repository": "RSPM", + "Requirements": [ + "R" + ], + "Hash": "543776ae6848fde2f48ff3816d0628bc" + }, + "bit": { + "Package": "bit", + "Version": "4.0.5", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "d242abec29412ce988848d0294b208fd" + }, + 
"bit64": { + "Package": "bit64", + "Version": "4.0.5", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "bit", + "methods", + "stats", + "utils" + ], + "Hash": "9fe98599ca456d6552421db0d6772d8f" + }, + "bslib": { + "Package": "bslib", + "Version": "0.6.1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "base64enc", + "cachem", + "grDevices", + "htmltools", + "jquerylib", + "jsonlite", + "lifecycle", + "memoise", + "mime", + "rlang", + "sass" + ], + "Hash": "c0d8599494bc7fb408cd206bbdd9cab0" + }, + "cachem": { + "Package": "cachem", + "Version": "1.0.8", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "fastmap", + "rlang" + ], + "Hash": "c35768291560ce302c0a6589f92e837d" + }, + "cellranger": { + "Package": "cellranger", + "Version": "1.1.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "rematch", + "tibble" + ], + "Hash": "f61dbaec772ccd2e17705c1e872e9e7c" + }, + "cli": { + "Package": "cli", + "Version": "3.6.2", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "utils" + ], + "Hash": "1216ac65ac55ec0058a6f75d7ca0fd52" + }, + "clipr": { + "Package": "clipr", + "Version": "0.8.0", + "Source": "Repository", + "Repository": "RSPM", + "Requirements": [ + "utils" + ], + "Hash": "3f038e5ac7f41d4ac41ce658c85e3042" + }, + "colorspace": { + "Package": "colorspace", + "Version": "2.1-0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "grDevices", + "graphics", + "methods", + "stats" + ], + "Hash": "f20c47fd52fae58b4e377c37bb8c335b" + }, + "cpp11": { + "Package": "cpp11", + "Version": "0.4.7", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "5a295d7d963cc5035284dcdbaf334f4e" + }, + "crayon": { + "Package": "crayon", + "Version": "1.5.2", + "Source": "Repository", + "Repository": "RSPM", + "Requirements": [ + "grDevices", + "methods", + "utils" + ], + 
"Hash": "e8a1e41acf02548751f45c718d55aa6a" + }, + "digest": { + "Package": "digest", + "Version": "0.6.33", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "utils" + ], + "Hash": "b18a9cf3c003977b0cc49d5e76ebe48d" + }, + "dplyr": { + "Package": "dplyr", + "Version": "1.1.4", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "R6", + "cli", + "generics", + "glue", + "lifecycle", + "magrittr", + "methods", + "pillar", + "rlang", + "tibble", + "tidyselect", + "utils", + "vctrs" + ], + "Hash": "fedd9d00c2944ff00a0e2696ccf048ec" + }, + "ellipsis": { + "Package": "ellipsis", + "Version": "0.3.2", + "Source": "Repository", + "Repository": "RSPM", + "Requirements": [ + "R", + "rlang" + ], + "Hash": "bb0eec2fe32e88d9e2836c2f73ea2077" + }, + "evaluate": { + "Package": "evaluate", + "Version": "0.23", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "methods" + ], + "Hash": "daf4a1246be12c1fa8c7705a0935c1a0" + }, + "fansi": { + "Package": "fansi", + "Version": "1.0.6", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "grDevices", + "utils" + ], + "Hash": "962174cf2aeb5b9eea581522286a911f" + }, + "farver": { + "Package": "farver", + "Version": "2.1.1", + "Source": "Repository", + "Repository": "CRAN", + "Hash": "8106d78941f34855c440ddb946b8f7a5" + }, + "fastmap": { + "Package": "fastmap", + "Version": "1.1.1", + "Source": "Repository", + "Repository": "CRAN", + "Hash": "f7736a18de97dea803bde0a2daaafb27" + }, + "fontawesome": { + "Package": "fontawesome", + "Version": "0.5.2", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "htmltools", + "rlang" + ], + "Hash": "c2efdd5f0bcd1ea861c2d4e2a883a67d" + }, + "fs": { + "Package": "fs", + "Version": "1.6.3", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "methods" + ], + "Hash": "47b5f30c720c23999b913a1a635cf0bb" + }, + "generics": { + "Package": 
"generics", + "Version": "0.1.3", + "Source": "Repository", + "Repository": "RSPM", + "Requirements": [ + "R", + "methods" + ], + "Hash": "15e9634c0fcd294799e9b2e929ed1b86" + }, + "ggplot2": { + "Package": "ggplot2", + "Version": "3.4.4", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "MASS", + "R", + "cli", + "glue", + "grDevices", + "grid", + "gtable", + "isoband", + "lifecycle", + "mgcv", + "rlang", + "scales", + "stats", + "tibble", + "vctrs", + "withr" + ], + "Hash": "313d31eff2274ecf4c1d3581db7241f9" + }, + "glue": { + "Package": "glue", + "Version": "1.6.2", + "Source": "Repository", + "Repository": "RSPM", + "Requirements": [ + "R", + "methods" + ], + "Hash": "4f2596dfb05dac67b9dc558e5c6fba2e" + }, + "gtable": { + "Package": "gtable", + "Version": "0.3.4", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "cli", + "glue", + "grid", + "lifecycle", + "rlang" + ], + "Hash": "b29cf3031f49b04ab9c852c912547eef" + }, + "highr": { + "Package": "highr", + "Version": "0.10", + "Source": "Repository", + "Repository": "RSPM", + "Requirements": [ + "R", + "xfun" + ], + "Hash": "06230136b2d2b9ba5805e1963fa6e890" + }, + "hms": { + "Package": "hms", + "Version": "1.1.3", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "lifecycle", + "methods", + "pkgconfig", + "rlang", + "vctrs" + ], + "Hash": "b59377caa7ed00fa41808342002138f9" + }, + "htmltools": { + "Package": "htmltools", + "Version": "0.5.7", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "base64enc", + "digest", + "ellipsis", + "fastmap", + "grDevices", + "rlang", + "utils" + ], + "Hash": "2d7b3857980e0e0d0a1fd6f11928ab0f" + }, + "isoband": { + "Package": "isoband", + "Version": "0.2.7", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "grid", + "utils" + ], + "Hash": "0080607b4a1a7b28979aecef976d8bc2" + }, + "jquerylib": { + "Package": "jquerylib", + "Version": "0.1.4", + "Source": 
"Repository", + "Repository": "RSPM", + "Requirements": [ + "htmltools" + ], + "Hash": "5aab57a3bd297eee1c1d862735972182" + }, + "jsonlite": { + "Package": "jsonlite", + "Version": "1.8.8", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "methods" + ], + "Hash": "e1b9c55281c5adc4dd113652d9e26768" + }, + "knitr": { + "Package": "knitr", + "Version": "1.45", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "evaluate", + "highr", + "methods", + "tools", + "xfun", + "yaml" + ], + "Hash": "1ec462871063897135c1bcbe0fc8f07d" + }, + "labeling": { + "Package": "labeling", + "Version": "0.4.3", + "Source": "Repository", + "Repository": "RSPM", + "Requirements": [ + "graphics", + "stats" + ], + "Hash": "b64ec208ac5bc1852b285f665d6368b3" + }, + "lattice": { + "Package": "lattice", + "Version": "0.22-5", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "grDevices", + "graphics", + "grid", + "stats", + "utils" + ], + "Hash": "7c5e89f04e72d6611c77451f6331a091" + }, + "lifecycle": { + "Package": "lifecycle", + "Version": "1.0.4", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "cli", + "glue", + "rlang" + ], + "Hash": "b8552d117e1b808b09a832f589b79035" + }, + "magrittr": { + "Package": "magrittr", + "Version": "2.0.3", + "Source": "Repository", + "Repository": "RSPM", + "Requirements": [ + "R" + ], + "Hash": "7ce2733a9826b3aeb1775d56fd305472" + }, + "memoise": { + "Package": "memoise", + "Version": "2.0.1", + "Source": "Repository", + "Repository": "RSPM", + "Requirements": [ + "cachem", + "rlang" + ], + "Hash": "e2817ccf4a065c5d9d7f2cfbe7c1d78c" + }, + "mgcv": { + "Package": "mgcv", + "Version": "1.9-1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "Matrix", + "R", + "graphics", + "methods", + "nlme", + "splines", + "stats", + "utils" + ], + "Hash": "110ee9d83b496279960e162ac97764ce" + }, + "mime": { + "Package": "mime", + "Version": 
"0.12", + "Source": "Repository", + "Repository": "RSPM", + "Requirements": [ + "tools" + ], + "Hash": "18e9c28c1d3ca1560ce30658b22ce104" + }, + "munsell": { + "Package": "munsell", + "Version": "0.5.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "colorspace", + "methods" + ], + "Hash": "6dfe8bf774944bd5595785e3229d8771" + }, + "nlme": { + "Package": "nlme", + "Version": "3.1-164", + "Source": "Repository", + "Repository": "RSPM", + "Requirements": [ + "R", + "graphics", + "lattice", + "stats", + "utils" + ], + "Hash": "a623a2239e642806158bc4dc3f51565d" + }, + "pillar": { + "Package": "pillar", + "Version": "1.9.0", + "Source": "Repository", + "Repository": "RSPM", + "Requirements": [ + "cli", + "fansi", + "glue", + "lifecycle", + "rlang", + "utf8", + "utils", + "vctrs" + ], + "Hash": "15da5a8412f317beeee6175fbc76f4bb" + }, + "pkgconfig": { + "Package": "pkgconfig", + "Version": "2.0.3", + "Source": "Repository", + "Repository": "RSPM", + "Requirements": [ + "utils" + ], + "Hash": "01f28d4278f15c76cddbea05899c5d6f" + }, + "prettyunits": { + "Package": "prettyunits", + "Version": "1.2.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "6b01fc98b1e86c4f705ce9dcfd2f57c7" + }, + "printr": { + "Package": "printr", + "Version": "0.3", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "knitr" + ], + "Hash": "03e0d4cc8152eed9515f517a8153c085" + }, + "progress": { + "Package": "progress", + "Version": "1.2.3", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "R6", + "crayon", + "hms", + "prettyunits" + ], + "Hash": "f4625e061cb2865f111b47ff163a5ca6" + }, + "purrr": { + "Package": "purrr", + "Version": "1.0.2", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "cli", + "lifecycle", + "magrittr", + "rlang", + "vctrs" + ], + "Hash": "1cba04a4e9414bdefc9dcaa99649a8dc" + }, + "rappdirs": { + "Package": "rappdirs", + "Version": 
"0.3.3", + "Source": "Repository", + "Repository": "RSPM", + "Requirements": [ + "R" + ], + "Hash": "5e3c5dc0b071b21fa128676560dbe94d" + }, + "readr": { + "Package": "readr", + "Version": "2.1.4", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "R6", + "cli", + "clipr", + "cpp11", + "crayon", + "hms", + "lifecycle", + "methods", + "rlang", + "tibble", + "tzdb", + "utils", + "vroom" + ], + "Hash": "b5047343b3825f37ad9d3b5d89aa1078" + }, + "readxl": { + "Package": "readxl", + "Version": "1.4.3", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "cellranger", + "cpp11", + "progress", + "tibble", + "utils" + ], + "Hash": "8cf9c239b96df1bbb133b74aef77ad0a" + }, + "rematch": { + "Package": "rematch", + "Version": "2.0.0", + "Source": "Repository", + "Repository": "RSPM", + "Hash": "cbff1b666c6fa6d21202f07e2318d4f1" + }, + "renv": { + "Package": "renv", + "Version": "1.0.3", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "utils" + ], + "Hash": "41b847654f567341725473431dd0d5ab" + }, + "rlang": { + "Package": "rlang", + "Version": "1.1.2", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "utils" + ], + "Hash": "50a6dbdc522936ca35afc5e2082ea91b" + }, + "rmarkdown": { + "Package": "rmarkdown", + "Version": "2.25", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "bslib", + "evaluate", + "fontawesome", + "htmltools", + "jquerylib", + "jsonlite", + "knitr", + "methods", + "stringr", + "tinytex", + "tools", + "utils", + "xfun", + "yaml" + ], + "Hash": "d65e35823c817f09f4de424fcdfa812a" + }, + "sass": { + "Package": "sass", + "Version": "0.4.8", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R6", + "fs", + "htmltools", + "rappdirs", + "rlang" + ], + "Hash": "168f9353c76d4c4b0a0bbf72e2c2d035" + }, + "scales": { + "Package": "scales", + "Version": "1.3.0", + "Source": "Repository", + "Repository": "CRAN", + 
"Requirements": [ + "R", + "R6", + "RColorBrewer", + "cli", + "farver", + "glue", + "labeling", + "lifecycle", + "munsell", + "rlang", + "viridisLite" + ], + "Hash": "c19df082ba346b0ffa6f833e92de34d1" + }, + "stringi": { + "Package": "stringi", + "Version": "1.8.3", + "Source": "Repository", + "Repository": "https://carpentries.r-universe.dev", + "Requirements": [ + "R", + "stats", + "tools", + "utils" + ], + "Hash": "058aebddea264f4c99401515182e656a" + }, + "stringr": { + "Package": "stringr", + "Version": "1.5.1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "cli", + "glue", + "lifecycle", + "magrittr", + "rlang", + "stringi", + "vctrs" + ], + "Hash": "960e2ae9e09656611e0b8214ad543207" + }, + "tibble": { + "Package": "tibble", + "Version": "3.2.1", + "Source": "Repository", + "Repository": "RSPM", + "Requirements": [ + "R", + "fansi", + "lifecycle", + "magrittr", + "methods", + "pillar", + "pkgconfig", + "rlang", + "utils", + "vctrs" + ], + "Hash": "a84e2cc86d07289b3b6f5069df7a004c" + }, + "tidyr": { + "Package": "tidyr", + "Version": "1.3.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "cli", + "cpp11", + "dplyr", + "glue", + "lifecycle", + "magrittr", + "purrr", + "rlang", + "stringr", + "tibble", + "tidyselect", + "utils", + "vctrs" + ], + "Hash": "e47debdc7ce599b070c8e78e8ac0cfcf" + }, + "tidyselect": { + "Package": "tidyselect", + "Version": "1.2.0", + "Source": "Repository", + "Repository": "RSPM", + "Requirements": [ + "R", + "cli", + "glue", + "lifecycle", + "rlang", + "vctrs", + "withr" + ], + "Hash": "79540e5fcd9e0435af547d885f184fd5" + }, + "tinytex": { + "Package": "tinytex", + "Version": "0.49", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "xfun" + ], + "Hash": "5ac22900ae0f386e54f1c307eca7d843" + }, + "tzdb": { + "Package": "tzdb", + "Version": "0.4.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "cpp11" + ], + "Hash": 
"f561504ec2897f4d46f0c7657e488ae1" + }, + "utf8": { + "Package": "utf8", + "Version": "1.2.4", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "62b65c52671e6665f803ff02954446e9" + }, + "vctrs": { + "Package": "vctrs", + "Version": "0.6.5", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "cli", + "glue", + "lifecycle", + "rlang" + ], + "Hash": "c03fa420630029418f7e6da3667aac4a" + }, + "viridisLite": { + "Package": "viridisLite", + "Version": "0.4.2", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "c826c7c4241b6fc89ff55aaea3fa7491" + }, + "vroom": { + "Package": "vroom", + "Version": "1.6.5", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "bit64", + "cli", + "cpp11", + "crayon", + "glue", + "hms", + "lifecycle", + "methods", + "progress", + "rlang", + "stats", + "tibble", + "tidyselect", + "tzdb", + "vctrs", + "withr" + ], + "Hash": "390f9315bc0025be03012054103d227c" + }, + "withr": { + "Package": "withr", + "Version": "2.5.2", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "grDevices", + "graphics", + "stats" + ], + "Hash": "4b25e70111b7d644322e9513f403a272" + }, + "xfun": { + "Package": "xfun", + "Version": "0.41", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "stats", + "tools" + ], + "Hash": "460a5e0fe46a80ef87424ad216028014" + }, + "yaml": { + "Package": "yaml", + "Version": "2.3.8", + "Source": "Repository", + "Repository": "CRAN", + "Hash": "29240487a071f535f5e5d5a323b7afbd" + } + } +} diff --git a/setup.md b/setup.md new file mode 100644 index 000000000..472d28b16 --- /dev/null +++ b/setup.md @@ -0,0 +1,54 @@ +--- +title: Setup +--- + +This lesson is an additional lesson to [the genomics workshop](https://datacarpentry.org/genomics-workshop/). 
Below are detailed setup instructions for the main workshop, which can also be found on [the main setup page](https://datacarpentry.org/genomics-workshop/setup.html). If you are only here for the Intro to R and RStudio for Genomics lesson and do not wish to work on the cloud, you can choose option B below, for which you only need to [download the data files](https://figshare.com/articles/Data_Carpentry_Genomics_beta_2_0/7726454) to the local working directory in which you will create your R project. + +## R Genomics workshop setup directions + +For general setup for the genomics lessons, see the setup page of [the genomics workshop](https://datacarpentry.org/genomics-workshop/). + + + + +## Installing R + +This workshop is designed to be run on pre-imaged Amazon Web Services (AWS) +instances. With the exception of a spreadsheet program, all of the software and data used in the workshop are hosted on these instances. However, it is beneficial to understand how to install both R and RStudio locally (on your own machine). + +We suggest you install R for your operating system from the Comprehensive R Archive Network (CRAN) at [this page](https://cran.r-project.org/). + + +:::::::::::::::: spoiler + +## Windows + +Follow the CRAN instructions for downloading the [Windows binary](https://cran.r-project.org/bin/windows/base/). + + +::::::::::::::::::::::::: + +:::::::::::::::: spoiler + +## macOS + +Follow the CRAN instructions for downloading the [macOS .pkg file](https://cran.r-project.org/bin/macosx/); ensure that you download the package that corresponds to your Mac CPU configuration (e.g. "For Apple silicon (M1/M2) Macs" or "For older Intel Macs"). + + + +::::::::::::::::::::::::: + +:::::::::::::::: spoiler + +## Linux + +Follow the CRAN instructions for your [Linux distribution](https://cran.r-project.org/bin/linux/); the instructions for Ubuntu are [found here](https://cran.r-project.org/bin/linux/ubuntu/). 
+ + +::::::::::::::::::::::::: + + + +## Installing RStudio + +The RStudio integrated development environment (IDE) is a popular way to use R. We will cover what RStudio is in the lesson. Whatever your operating system, go to the Posit [download page](https://posit.co/download/rstudio-desktop/) to download the appropriate version of RStudio. In most cases, the Posit website will detect your operating system and suggest the correct download. The same page links to alternatives.