Bias in assessments of marine microbial biodiversity in fosmid libraries as evaluated by pyrosequencing.


  • On the basis of 16S rRNA gene sequencing, the SAR11 clade of marine bacteria has an almost universal distribution, being detected as abundant sequences in all marine provinces. Yet, SAR11 sequences are rarely detected in fosmid libraries, suggesting that the widespread abundance may be an artefact of PCR cloning and that SAR11 has a relatively low abundance. Here the relative abundance of SAR11 is explored in both a fosmid library and a metagenomic sequence data set from the same biological community taken from fjord surface water from Bergen, Norway. Pyrosequenced data and 16S clone data confirmed an 11-15% relative abundance of SAR11 within the community. In contrast, not a single SAR11 fosmid was identified in a pooled shotgun sequence data set of 100 fosmid clones. This underrepresentation was evidenced by comparative abundances of SAR11 sequences assessed by taxonomic annotation and fragment recruitment. Analysis revealed a similar underrepresentation of low-GC Flavobacteriaceae. We speculate that a contributing factor towards the fosmid bias may be DNA fragmentation during preparation because of the low GC content of SAR11 sequences and other underrepresented taxa. This study suggests that, although fosmid libraries can be extremely useful, caution must be taken when directly inferring community composition from metagenomic fosmid libraries.

publication date

  • July 2009