Hi @benjjneb,
We are working on a project that involves the merge step of the dada2 pipeline. We wanted to evaluate which read merging strategy yields the best taxonomic assignments, so that we can make an informed choice between them. We evaluated three approaches: default merging, forward reads only, and justConcatenate.
Here we used a dataset of 376 samples of ITS1-ITS2 amplicon sequencing from the apple tree phyllosphere, sequenced on a MiSeq in paired-end mode (300x300). We realise our reads won’t always overlap because ITS amplicons vary in length. When we look at the number of ASVs and unique taxa at various ranks, we found it interesting that the Forward method had a lower ASV count but assigned a higher number of species, genera, families, orders and classes, followed closely by the Concatenated method.
We think there are more ASVs with Concat because this method could create different ASVs from the same true biological sequence if the reads are not trimmed in exactly the same way, which seems possible. Nonetheless, it is surprising that we get a higher taxonomic diversity with the Fwd method.
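To make the trimming argument concrete, here is a toy sketch in plain Python (not dada2 code). It assumes that concatenation joins the forward read, a spacer of 10 Ns, and the reverse-complemented reverse read, which is how I understand `justConcatenate` to behave. The amplicon sequence and the read lengths are made up for illustration.

```python
# Toy illustration only: mimics concatenation on a single made-up amplicon.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA string containing only A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def concatenate(fwd: str, rev: str, spacer: str = "N" * 10) -> str:
    """Join forward read, spacer, and reverse-complemented reverse read."""
    return fwd + spacer + revcomp(rev)


# Hypothetical amplicon; both read pairs below come from this same molecule.
amplicon = "ACGTTGCAAGGCTTACCGATAGGCTTAACGGATCCGTTAGCAATCGGATTACGCTAGGCA"

# Pair A: forward read kept to 25 bases; pair B: quality trimming cut it at 22.
fwd_a, fwd_b = amplicon[:25], amplicon[:22]
rev = revcomp(amplicon)[:20]  # same reverse read in both pairs

concat_a = concatenate(fwd_a, rev)
concat_b = concatenate(fwd_b, rev)

# Same biological sequence, but two distinct concatenated strings.
print(concat_a == concat_b)
print(len(concat_a) - len(concat_b))
```

The two strings differ only in where the forward read stops, yet any exact-sequence comparison would treat them as different variants.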
To visualize the overall agreement of assignments, we made Venn diagrams using the unique taxa identified at different taxonomic ranks:
Most interesting is the fact that the Fwd and Concat methods each find several unique genera or families that the other does not. Naturally, we wonder which is most likely to be the right answer.
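For reference, the regions of such a Venn diagram can be computed with plain set operations. This is a minimal sketch, not the code we used; the method names and genus labels are placeholders, not taxa from our data.

```python
def venn_regions(taxa_by_method: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return the taxa found by only one method, plus those shared by all."""
    regions = {}
    for method, taxa in taxa_by_method.items():
        # Union of the taxa found by every other method.
        others = set().union(
            *(t for m, t in taxa_by_method.items() if m != method)
        )
        regions[f"only_{method}"] = taxa - others
    regions["shared_by_all"] = set.intersection(*taxa_by_method.values())
    return regions


# Placeholder genus labels, one set per merging strategy.
genera = {
    "merged": {"GenusA", "GenusB"},
    "fwd": {"GenusA", "GenusB", "GenusC", "GenusD"},
    "concat": {"GenusA", "GenusB", "GenusC", "GenusE"},
}

for region, taxa in venn_regions(genera).items():
    print(region, sorted(taxa))
```

Listing the taxa in the `only_fwd` and `only_concat` regions, rather than just counting them, makes it easier to check whether they are plausible members of the community.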
The following shows the percentage of ASVs in each sample that were assigned a label at various taxonomic ranks:
As you can see, the concatenation method has a higher overall assignment rate than the other methods. It is higher, but is it better?
Of note, we also tested this on two other independent datasets (another ITS and a trnL experiment) with similar findings.
Here are the questions that come to mind when looking at all of this.
First, why do you think the Fwd method identifies more Genera (or Families) than Concat, given that it essentially uses a subset of the information contained in the Concat method? Could it be that Fwd is prone to lower accuracy and somehow recalls taxa that aren’t really there?
Second, we are puzzled by the fact that the Concat and Fwd methods each find a fairly high number of Genera or Families that the other doesn’t find. Can you think of a reason for that? Would you trust Concat or Forward more?
Finally, given that Concat will preserve and append some reads that should have been overlapped and merged, the resulting reads will contain duplicate information for some portion of the amplicon. Could this have an impact on taxonomic assignment? Maybe it causes false positive assignments and explains the higher number of ASVs for this method.
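A small sketch of what we mean by duplicate information, again in plain Python with a made-up amplicon. The overlap check is a simplification (exact suffix/prefix match only, no alignment and no mismatches), and `overlap_length` is a hypothetical helper, not a dada2 function.

```python
def overlap_length(fwd: str, rev_rc: str, min_overlap: int = 12) -> int:
    """Length of the longest suffix of fwd that is also a prefix of rev_rc.

    Exact matching only; returns 0 if no overlap of at least min_overlap.
    """
    for k in range(min(len(fwd), len(rev_rc)), min_overlap - 1, -1):
        if fwd[-k:] == rev_rc[:k]:
            return k
    return 0


# Hypothetical 40-base amplicon, short enough that the two reads overlap.
amplicon = "ACGTTGCAAGGCTTACCGATAGGCTTAACGGATCCGTTAG"
fwd = amplicon[:30]     # forward read covers bases 0-29
rev_rc = amplicon[15:]  # reverse read, already reverse-complemented

shared = overlap_length(fwd, rev_rc)
concatenated = fwd + "N" * 10 + rev_rc

# The overlapping stretch appears twice in the concatenated read.
duplicated = amplicon[15:30]
print(shared, concatenated.count(duplicated))
```

In this toy case the concatenated read is longer than the amplicon itself, because the shared stretch is present on both sides of the spacer.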
We are looking forward to reading your thoughts on this. Thanks in advance for your time. Cheers!
Note: The filterAndTrim parameters were maxEE = c(4, 4), truncQ = 2, minLen = 100. We used Cutadapt to remove primers.
Concatenation keeps more bases, so it will result in a higher rate of taxonomic assignments. The quality of those additional assignments is unclear, and my guess is that much of what you are observing at the ASV/taxa level is driven by very low-abundance variants.
More unique taxonomic assignments != better description of the measured community.
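One way to check whether low-abundance variants dominate the comparison is to weight by reads instead of counting ASVs. A minimal sketch with made-up counts and placeholder genus labels; `assignment_rates` and `genera_above` are hypothetical helpers, and the `min_reads` threshold is an arbitrary choice for illustration.

```python
from typing import Optional

# Each ASV is (read_count, genus or None if unassigned). Made-up numbers.
Asv = tuple[int, Optional[str]]


def assignment_rates(asvs: list[Asv]) -> dict[str, float]:
    """Fraction of ASVs, and fraction of reads, with a genus assignment."""
    assigned = [(count, genus) for count, genus in asvs if genus is not None]
    total_reads = sum(count for count, _ in asvs)
    return {
        "frac_asvs_assigned": len(assigned) / len(asvs),
        "frac_reads_assigned": sum(c for c, _ in assigned) / total_reads,
    }


def genera_above(asvs: list[Asv], min_reads: int) -> set[str]:
    """Genera whose summed read count reaches min_reads."""
    totals: dict[str, int] = {}
    for count, genus in asvs:
        if genus is not None:
            totals[genus] = totals.get(genus, 0) + count
    return {g for g, n in totals.items() if n >= min_reads}


sample = [(900, "GenusA"), (80, "GenusB"), (10, None), (5, "GenusC"), (5, None)]

rates = assignment_rates(sample)
print(rates)
print(sorted(genera_above(sample, min_reads=50)))
```

If two methods disagree mostly on genera that fall below a modest read threshold, the extra unique taxa say little about how well the community is described.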