Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assigning sample names, time points, etc. in phyloseq step #1973

Open
backwards-charm opened this issue Jun 15, 2024 · 3 comments
Open

Assigning sample names, time points, etc. in phyloseq step #1973

backwards-charm opened this issue Jun 15, 2024 · 3 comments

Comments

@backwards-charm
Copy link

I'm super new to R and analyzing some samples using dada2.
I'm having trouble making plots and confused about the following code

samples.out <- rownames(seqtab.nochim)
subject <- sapply(strsplit(samples.out, "D"), `[`, 1)
gender <- substr(subject,1,1)
subject <- substr(subject,2,999)
day <- as.integer(sapply(strsplit(samples.out, "D"), `[`, 2))
samdf <- data.frame(Subject=subject, Gender=gender, Day=day)
samdf$When <- "Early"
samdf$When[samdf$Day>100] <- "Late"
rownames(samdf) <- samples.out

I'm not sure how to assign the right names and time points to my samples. I tried running it as is and my plots look like this:
Screenshot 2024-06-15 at 5 28 16 PM

What can I do to fix this situation? I have 16 samples. Among those 16 samples, there were 8 treatment types, with 2 replicates per treatment type.

@benjjneb
Copy link
Owner

Typically you won't be parsing the filenames to extract metadata (as is done in this example). Instead you will be reading in the sample metadata from a spread sheet or delimited text file.

Do you have such a document already prepared? That is, something with rows for each sample, and columns for the different metadata associated with those samples?

@backwards-charm
Copy link
Author

I couldn't figure out at which point throughout the process to add in the metadata so I've actually edited the file names to somewhat match the tutorial.

My file naming scheme is: a letter indicating treatment type (C for control, A for acids, etc.), a number indicating which replicate for the specific treatment type (1 or 2), the letter "D", and the day in which the sample was taken (0, 15, etc). So for example, one of my files is named: C1D0.

After doing this, I was able to generate some plots where "Day" is the focus using the following code:

# Create a new variable called samples.out from the rownames of seqtab.nochim
samples.out <- rownames(seqtab.nochim)
# Create a new variable called subject that is the part of samples.out before the first "D"
subject <- sapply(strsplit(samples.out, "D"), `[`, 1)
# Create a new variable called gender that is the first letter of subject
treatment <- substr(subject,1,1)
# Reassign subject to be the part of subject after the first character
subject <- substr(subject,2,999)
# Create a new variable called day that is the part of samples.out after the first "D"
day <- as.integer(sapply(strsplit(samples.out, "D"), `[`, 2))
# Create a new data frame called samdf with columns Subject, Gender and Day
samdf <- data.frame(Subject=subject, Treatment=treatment, Day=day)
# Create a new variable called When that is “Day 0" 
samdf$When <- "Day 0"
# Change the value of When to “Day 37" for all rows where Day is greater than 36
samdf$When[samdf$Day>36] <- "Day 37"
# Add the rownmanes of seqtab.nochim to samdf
rownames(samdf) <- samples.out

This looks great for Shannon-Simpson and Bray-Curtis but when I get to the bar graphs, they turn up looking a bit odd.

Originally I used the code:

# Create Bar plot
top20 <- names(sort(taxa_sums(ps), decreasing=TRUE))[1:20]
ps.top20 <- transform_sample_counts(ps, function(OTU) OTU/sum(OTU))
ps.top20 <- prune_taxa(top20, ps.top20)
plot_bar(ps.top20, x="Day", fill="Class") + facet_wrap(~When, scales="free_x")

and ended up with:
Screenshot 2024-06-17 at 3 42 50 PM

I also edited it to:

 # Create Bar plot
top1000 <- names(sort(taxa_sums(ps), decreasing=TRUE))[1:1000]
ps.top1000 <- transform_sample_counts(ps, function(OTU) OTU/sum(OTU))
ps.top1000 <- prune_taxa(top1000, ps.top1000)
plot_bar(ps.top1000, x="Day", fill="Class") + facet_wrap(~When, scales="free_x")

and the bar graphs look completely black rather than colorful:
Screenshot 2024-06-17 at 3 43 53 PM

I am also more interested in examining the differences between treatment type, rather than between days.
I am not sure how to edit the code to show me differences in treatment, seeing as I have 8 different treatment types. Would I have to adjust the initial code?

@benjjneb
Copy link
Owner

benjjneb commented Jun 17, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants