Sequencing errors and biases in metagenomic datasets affect coverage-based
assemblies and are often ignored during analysis. Here, we analyze read
connectivity in metagenomes and identify problematic, likely non-biological
connectivity within metagenome assembly graphs. Specifically, we identify
highly connected sequences that join a large proportion of reads
within each real metagenome. These sequences show position-specific bias in
shotgun reads, suggestive of sequencing artifacts, and are only minimally
incorporated into contigs by assembly. Removing these sequences before
assembly yields similar assembly content for most metagenomes and enables
graph partitioning, which reduces assembly memory and time requirements.
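The toy sketch below illustrates the general idea of removing highly connected sequences and then partitioning reads. It is not the authors' procedure. It makes two simplifying assumptions. First, a k-mer's connectivity is approximated by how many reads contain it. Second, "partitioning" means grouping reads into connected components after those k-mers are dropped. The function names, `k`, and the threshold `max_read_fraction` are hypothetical. The sketch also ignores reverse complements and sequencing errors.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of k-mers in seq (no reverse-complement handling)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def partition_reads(reads, k, max_read_fraction):
    """Hypothetical sketch: treat k-mers found in more than
    max_read_fraction of all reads as highly connected, ignore them,
    and group the remaining reads by shared k-mers."""
    reads_by_kmer = defaultdict(set)
    for idx, read in enumerate(reads):
        for km in kmers(read, k):
            reads_by_kmer[km].add(idx)

    # Proxy for "highly connected": k-mer shared by too many reads.
    cutoff = max_read_fraction * len(reads)
    hubs = {km for km, ids in reads_by_kmer.items() if len(ids) > cutoff}

    parent = list(range(len(reads)))
    for km, ids in reads_by_kmer.items():
        if km in hubs:
            continue  # removed before partitioning
        ids = sorted(ids)
        for other in ids[1:]:
            ra, rb = find(parent, ids[0]), find(parent, other)
            if ra != rb:
                parent[rb] = ra

    # Collect connected components as sorted lists of read indices.
    groups = defaultdict(list)
    for idx in range(len(reads)):
        groups[find(parent, idx)].append(idx)
    return hubs, sorted(groups.values())


reads = ["GATCAAAA", "GATCCCCC", "GATCTTTT", "AAAAGGGG"]
print(partition_reads(reads, 4, 0.5))   # ({'GATC'}, [[0, 3], [1], [2]])
print(partition_reads(reads, 4, 1.0))   # (set(), [[0, 1, 2, 3]])
```

In this example, the shared prefix `GATC` occurs in three of the four reads. Left in place, it merges every read into one partition. Once it is removed, only reads 0 and 3 stay joined, through `AAAA`. Smaller partitions are what allow each one to be assembled separately with less memory.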