Merging two V4 runs together #1968
Comments
Not part of the development team, but here are my two cents.
You will mess up the denoising process if you merge first, because DADA2 uses the original quality scores to model the sequencing run error profile. But if the merged reads are all you've got, I guess you've gotta do the best you can with them as single-end data.
DADA2 denoising is based on modelling the error profile of each sequencing run, so you should not pool datasets that come from different sequencing runs prior to running the denoising step.
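A minimal sketch of that advice, assuming the dada2 package is installed: each run gets its own error model, and the per-run sequence tables are only combined afterwards. The helper names `denoise_single_end_run` and `combine_runs` and the file path in the usage comment are hypothetical and are not taken from the thread; `seqtab_UMAMI` is the per-run table already shown in the question.

```r
# Sketch only: assumes dada2 is installed; helper names are hypothetical.

# Denoise one run of already-filtered single-end FASTQ files
# (e.g. the pre-merged reads), using an error model learned from that run only.
denoise_single_end_run <- function(filt_files) {
  err <- dada2::learnErrors(filt_files, multithread = TRUE)
  dd  <- dada2::dada(filt_files, err = err, multithread = TRUE)
  dada2::makeSequenceTable(dd)
}

# Combine per-run sequence tables only after each run has been denoised,
# then remove chimeras from the combined table.
combine_runs <- function(seqtab_run1, seqtab_run2) {
  merged <- dada2::mergeSequenceTables(seqtab_run1, seqtab_run2)
  dada2::removeBimeraDenovo(merged, method = "consensus", multithread = TRUE)
}

# Hypothetical usage (paths are placeholders):
# seqtab_run2 <- denoise_single_end_run(list.files("run2/filtered", full.names = TRUE))
# seqtab_all  <- combine_runs(seqtab_UMAMI, seqtab_run2)
```

For the combined table to be meaningful, both runs need to cover exactly the same region, so primers should be removed from both datasets before denoising.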
Could be. You should use a dedicated adapter removal program to remove primers though, such as cutadapt, and not |
Hi Ben and team,
I am currently working on two datasets that I received from elsewhere, both from the V4 region. For the first dataset, I have the forward and reverse reads prior to merging (2x150 bp); the second dataset, however, is already merged (2x150 bp). I have some questions about how to go forward with combining the two datasets.
When I look at the sequence lengths of the datasets prior to combining them, they look as follows:
First dataset:
table(nchar(getSequences(seqtab_UMAMI)))
150 151 152 153 154 155 158 159 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179
2 2 2 5 4 2 4 3 3 2 6 4 8 2 5 6 5 2 4 7 2 1 4 1 3 3 3
180 181 183 184 185 186 188 189 190 191 192 193 194 195 196 197 199 200 201 202 203 204 205 206 207 208 209
3 3 4 5 10 3 5 2 2 5 2 4 1 2 5 4 3 7 3 1 5 49 353 309 494 13 1
210 211 212 213 215 216 217 218 219 220 221 222 224 225 228 231 234 240 242 243 246 249 252 253 258 259 260
34 1 42 4 5 3 38 75 3 1 2 10 20 1 2 1 2 3 72 1 3 2 1 2 1 3 1
261 262 265 266 275 276 280 281 287 289 290 291 292 293
1 7 6 1 1 13 1 1 16 4 20 1134 15085 477
Second dataset:
table(nchar(getSequences(seqtab)))
290
12259
As far as I know, the biological length of V4 is around 254 bp, right? So could the reason that the majority of the sequences are at 290 and 292 bp be that the primers were not removed, and could this be fixed with trim left?
Thank you.
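As a quick sanity check on the primer hypothesis, the arithmetic can be worked through in base R. This is only a sketch under one assumption: that the widely used EMP 515F/806R V4 primer pair was used, which the thread does not state. With that pair, a typical primer-free V4 amplicon of about 253 bp plus both primers comes out at the length where the first dataset peaks.

```r
# Assumption: EMP 515F/806R V4 primers; the thread does not say which primers were used.
primer_fwd <- "GTGCCAGCMGCCGCGGTAA"   # 515F
primer_rev <- "GGACTACHVGGGTWTCTAAT"  # 806R

# Typical length of the V4 amplicon once primers are removed (usually 252-254 bp).
v4_len <- 253

# Expected merged length if both primers are still attached.
expected_len_with_primers <- v4_len + nchar(primer_fwd) + nchar(primer_rev)
print(expected_len_with_primers)
```

If the primers in these datasets differ from the assumed pair, the same calculation applies with their lengths substituted.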