Assigning sample names, time points, etc. in phyloseq step #1973

backwards-charm · 2024-06-15T21:50:45Z

I'm super new to R and analyzing some samples using dada2.
I'm having trouble making plots and confused about the following code

samples.out <- rownames(seqtab.nochim)
subject <- sapply(strsplit(samples.out, "D"), `[`, 1)
gender <- substr(subject,1,1)
subject <- substr(subject,2,999)
day <- as.integer(sapply(strsplit(samples.out, "D"), `[`, 2))
samdf <- data.frame(Subject=subject, Gender=gender, Day=day)
samdf$When <- "Early"
samdf$When[samdf$Day>100] <- "Late"
rownames(samdf) <- samples.out

I'm not sure how to assign the right names and time points to my samples. I tried running it as is and my plots look like this:

What can I do to fix this situation? I have 16 samples. Among those 16 samples, there were 8 treatment types, with 2 replicates per treatment type.

The text was updated successfully, but these errors were encountered:

benjjneb · 2024-06-17T17:27:34Z

Typically you won't be parsing the filenames to extract metadata (as is done in this example). Instead you will be reading in the sample metadata from a spread sheet or delimited text file.

Do you have such a document already prepared? That is, something with rows for each sample, and columns for the different metadata associated with those samples?

backwards-charm · 2024-06-17T19:47:00Z

I couldn't figure out at which point throughout the process to add in the metadata so I've actually edited the file names to somewhat match the tutorial.

My file naming scheme is: a letter indicating treatment type (C for control, A for acids, etc.), a number indicating which replicate for the specific treatment type (1 or 2), the letter "D", and the day in which the sample was taken (0, 15, etc). So for example, one of my files is named: C1D0.

After doing this, I was able to generate some plots where "Day" is the focus using the following code:

# Create a new variable called samples.out from the rownames of seqtab.nochim
samples.out <- rownames(seqtab.nochim)
# Create a new variable called subject that is the part of samples.out before the first "D"
subject <- sapply(strsplit(samples.out, "D"), `[`, 1)
# Create a new variable called gender that is the first letter of subject
treatment <- substr(subject,1,1)
# Reassign subject to be the part of subject after the first character
subject <- substr(subject,2,999)
# Create a new variable called day that is the part of samples.out after the first "D"
day <- as.integer(sapply(strsplit(samples.out, "D"), `[`, 2))
# Create a new data frame called samdf with columns Subject, Gender and Day
samdf <- data.frame(Subject=subject, Treatment=treatment, Day=day)
# Create a new variable called When that is “Day 0" 
samdf$When <- "Day 0"
# Change the value of When to “Day 37" for all rows where Day is greater than 36
samdf$When[samdf$Day>36] <- "Day 37"
# Add the rownmanes of seqtab.nochim to samdf
rownames(samdf) <- samples.out

This looks great for Shannon-Simpson and Bray-Curtis but when I get to the bar graphs, they turn up looking a bit odd.

Originally I used the code:

# Create Bar plot
top20 <- names(sort(taxa_sums(ps), decreasing=TRUE))[1:20]
ps.top20 <- transform_sample_counts(ps, function(OTU) OTU/sum(OTU))
ps.top20 <- prune_taxa(top20, ps.top20)
plot_bar(ps.top20, x="Day", fill="Class") + facet_wrap(~When, scales="free_x")

and ended up with:

I also edited it to:

 # Create Bar plot
top1000 <- names(sort(taxa_sums(ps), decreasing=TRUE))[1:1000]
ps.top1000 <- transform_sample_counts(ps, function(OTU) OTU/sum(OTU))
ps.top1000 <- prune_taxa(top1000, ps.top1000)
plot_bar(ps.top1000, x="Day", fill="Class") + facet_wrap(~When, scales="free_x")

and the bar graphs look completely black rather than colorful:

I am also more interested in examining the differences between treatment type, rather than between days.
I am not sure how to edit the code to show me differences in treatment, seeing as I have 8 different treatment types. Would I have to adjust the initial code?

benjjneb · 2024-06-17T22:53:02Z

I couldn't figure out at which point throughout the process to add in the

metadata so I've actually edited the file names to somewhat match the tutorial. You would create the samdf object at the same point that you are doing this string parsing. And doing it from a defined metadata file would be a better strategy.

I am not sure how to edit the code to show me differences in treatment,

seeing as I have 8 different treatment types. Would I have to adjust the initial code? Probably. This goes beyond the dada2 R package though, and we can only really help getting you to an ASV table, this R package doesn't do subsequent statistical analysis/visualization.

…

On Mon, Jun 17, 2024 at 3:47 PM backwards-charm ***@***.***> wrote: I couldn't figure out at which point throughout the process to add in the metadata so I've actually edited the file names to somewhat match the tutorial. My file naming scheme is: a letter indicating treatment type (C for control, A for acids, etc.), a number indicating which replicate for the specific treatment type (1 or 2), the letter "D", and the day in which the sample was taken (0, 15, etc). So for example, one of my files is named: C1D0. After doing this, I was able to generate some plots where "Day" is the focus using the following code: # Create a new variable called samples.out from the rownames of seqtab.nochim samples.out <- rownames(seqtab.nochim) # Create a new variable called subject that is the part of samples.out before the first "D" subject <- sapply(strsplit(samples.out, "D"), `[`, 1) # Create a new variable called gender that is the first letter of subject treatment <- substr(subject,1,1) # Reassign subject to be the part of subject after the first character subject <- substr(subject,2,999) # Create a new variable called day that is the part of samples.out after the first "D" day <- as.integer(sapply(strsplit(samples.out, "D"), `[`, 2)) # Create a new data frame called samdf with columns Subject, Gender and Day samdf <- data.frame(Subject=subject, Treatment=treatment, Day=day) # Create a new variable called When that is “Day 0" samdf$When <- "Day 0" # Change the value of When to “Day 37" for all rows where Day is greater than 36 samdf$When[samdf$Day>36] <- "Day 37" # Add the rownmanes of seqtab.nochim to samdf rownames(samdf) <- samples.out This looks great for Shannon-Simpson and Bray-Curtis but when I get to the bar graphs, they turn up looking a bit odd. Originally I used the code: # Create Bar plot top20 <- names(sort(taxa_sums(ps), decreasing=TRUE))[1:20] ps.top20 <- transform_sample_counts(ps, function(OTU) OTU/sum(OTU)) ps.top20 <- prune_taxa(top20, ps.top20) plot_bar(ps.top20, x="Day", fill="Class") + facet_wrap(~When, scales="free_x") and ended up with: Screenshot.2024-06-17.at.3.42.50.PM.png (view on web) <https://github.com/benjjneb/dada2/assets/172938055/9a0fdf3d-b5ac-4821-bd34-49e83d88ecb1> I also edited it to: # Create Bar plot top1000 <- names(sort(taxa_sums(ps), decreasing=TRUE))[1:1000] ps.top1000 <- transform_sample_counts(ps, function(OTU) OTU/sum(OTU)) ps.top1000 <- prune_taxa(top1000, ps.top1000) plot_bar(ps.top1000, x="Day", fill="Class") + facet_wrap(~When, scales="free_x") and the bar graphs look completely black rather than colorful: Screenshot.2024-06-17.at.3.43.53.PM.png (view on web) <https://github.com/benjjneb/dada2/assets/172938055/0fb5834f-55da-4fd9-b296-244939bad3b7> I am also more interested in examining the differences between treatment type, rather than between days. I am not sure how to edit the code to show me differences in treatment, seeing as I have 8 different treatment types. Would I have to adjust the initial code? — Reply to this email directly, view it on GitHub <#1973 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABMHKVHHGLEZXFOOJ7A4CRDZH44MXAVCNFSM6AAAAABJMAGX4GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNZUGI4TOMBUGA> . You are receiving this because you commented.Message ID: ***@***.***>

benjjneb added the documentation label Jun 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Assigning sample names, time points, etc. in phyloseq step #1973

Assigning sample names, time points, etc. in phyloseq step #1973

backwards-charm commented Jun 15, 2024

benjjneb commented Jun 17, 2024

backwards-charm commented Jun 17, 2024

benjjneb commented Jun 17, 2024 via email

Assigning sample names, time points, etc. in phyloseq step #1973

Assigning sample names, time points, etc. in phyloseq step #1973

Comments

backwards-charm commented Jun 15, 2024

benjjneb commented Jun 17, 2024

backwards-charm commented Jun 17, 2024

benjjneb commented Jun 17, 2024 via email