-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Assigning sample names, time points, etc. in phyloseq step #1973
Labels
Comments
Typically you won't be parsing the filenames to extract metadata (as is done in this example). Instead you will be reading in the sample metadata from a spread sheet or delimited text file. Do you have such a document already prepared? That is, something with rows for each sample, and columns for the different metadata associated with those samples? |
I couldn't figure out at which point throughout the process to add in the
metadata so I've actually edited the file names to somewhat match the
tutorial.
You would create the samdf object at the same point that you are doing this
string parsing. And doing it from a defined metadata file would be a better
strategy.
I am not sure how to edit the code to show me differences in treatment,
seeing as I have 8 different treatment types. Would I have to adjust the
initial code?
Probably. This goes beyond the dada2 R package though, and we can only
really help getting you to an ASV table, this R package doesn't do
subsequent statistical analysis/visualization.
…On Mon, Jun 17, 2024 at 3:47 PM backwards-charm ***@***.***> wrote:
I couldn't figure out at which point throughout the process to add in the
metadata so I've actually edited the file names to somewhat match the
tutorial.
My file naming scheme is: a letter indicating treatment type (C for
control, A for acids, etc.), a number indicating which replicate for the
specific treatment type (1 or 2), the letter "D", and the day in which the
sample was taken (0, 15, etc). So for example, one of my files is named:
C1D0.
After doing this, I was able to generate some plots where "Day" is the
focus using the following code:
# Create a new variable called samples.out from the rownames of seqtab.nochim
samples.out <- rownames(seqtab.nochim)
# Create a new variable called subject that is the part of samples.out before the first "D"
subject <- sapply(strsplit(samples.out, "D"), `[`, 1)
# Create a new variable called gender that is the first letter of subject
treatment <- substr(subject,1,1)
# Reassign subject to be the part of subject after the first character
subject <- substr(subject,2,999)
# Create a new variable called day that is the part of samples.out after the first "D"
day <- as.integer(sapply(strsplit(samples.out, "D"), `[`, 2))
# Create a new data frame called samdf with columns Subject, Gender and Day
samdf <- data.frame(Subject=subject, Treatment=treatment, Day=day)
# Create a new variable called When that is “Day 0"
samdf$When <- "Day 0"
# Change the value of When to “Day 37" for all rows where Day is greater than 36
samdf$When[samdf$Day>36] <- "Day 37"
# Add the rownmanes of seqtab.nochim to samdf
rownames(samdf) <- samples.out
This looks great for Shannon-Simpson and Bray-Curtis but when I get to the
bar graphs, they turn up looking a bit odd.
Originally I used the code:
# Create Bar plot
top20 <- names(sort(taxa_sums(ps), decreasing=TRUE))[1:20]
ps.top20 <- transform_sample_counts(ps, function(OTU) OTU/sum(OTU))
ps.top20 <- prune_taxa(top20, ps.top20)
plot_bar(ps.top20, x="Day", fill="Class") + facet_wrap(~When, scales="free_x")
and ended up with:
Screenshot.2024-06-17.at.3.42.50.PM.png (view on web)
<https://github.com/benjjneb/dada2/assets/172938055/9a0fdf3d-b5ac-4821-bd34-49e83d88ecb1>
I also edited it to:
# Create Bar plot
top1000 <- names(sort(taxa_sums(ps), decreasing=TRUE))[1:1000]
ps.top1000 <- transform_sample_counts(ps, function(OTU) OTU/sum(OTU))
ps.top1000 <- prune_taxa(top1000, ps.top1000)
plot_bar(ps.top1000, x="Day", fill="Class") + facet_wrap(~When, scales="free_x")
and the bar graphs look completely black rather than colorful:
Screenshot.2024-06-17.at.3.43.53.PM.png (view on web)
<https://github.com/benjjneb/dada2/assets/172938055/0fb5834f-55da-4fd9-b296-244939bad3b7>
I am also more interested in examining the differences between treatment
type, rather than between days.
I am not sure how to edit the code to show me differences in treatment,
seeing as I have 8 different treatment types. Would I have to adjust the
initial code?
—
Reply to this email directly, view it on GitHub
<#1973 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABMHKVHHGLEZXFOOJ7A4CRDZH44MXAVCNFSM6AAAAABJMAGX4GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNZUGI4TOMBUGA>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
I'm super new to R and analyzing some samples using dada2.
I'm having trouble making plots and confused about the following code
I'm not sure how to assign the right names and time points to my samples. I tried running it as is and my plots look like this:
What can I do to fix this situation? I have 16 samples. Among those 16 samples, there were 8 treatment types, with 2 replicates per treatment type.
The text was updated successfully, but these errors were encountered: