
HMM-guided mode

Most SPAdes assembly modes (multicell, single-cell, rnaviral, meta and biosynthetic) also support the HMM-guided mode implemented in biosyntheticSPAdes. A detailed description can be found in the biosyntheticSPAdes paper. In short, amino acid profile HMMs are aligned to the edges of the assembly graph. The subgraphs containing the set of matches ("domains") are then extracted, and all possible paths through the domains that are supported both by paired-end data (via scaffolds) and by graph topology are obtained (putative biosynthetic gene clusters).
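The idea of "paths through the domains that are supported by both scaffolds and graph topology" can be illustrated with a toy sketch. This is not the biosyntheticSPAdes implementation: the graph, the domain hits, the scaffold, and the helper names (`simple_paths`, `is_contiguous_sublist`, `supported_domain_paths`) are all hypothetical, and "supported by a scaffold" is simplified here to "appears as a contiguous run of edges in some scaffold".

```python
def simple_paths(graph, start, goal, path=None):
    """Yield every simple path from start to goal (graph topology only)."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in graph.get(start, []):
        if nxt not in path:  # do not revisit an edge
            yield from simple_paths(graph, nxt, goal, path)


def is_contiguous_sublist(path, scaffold):
    """True if path occurs as a contiguous run inside scaffold."""
    n = len(path)
    return any(scaffold[i:i + n] == path for i in range(len(scaffold) - n + 1))


def supported_domain_paths(graph, domain_hits, scaffolds):
    """Paths between domain-bearing edges that some scaffold also supports."""
    result = []
    for start in domain_hits:
        for goal in domain_hits:
            if start == goal:
                continue
            for path in simple_paths(graph, start, goal):
                if any(is_contiguous_sublist(path, s) for s in scaffolds):
                    result.append(path)
    return result


# Hypothetical toy data: keys are assembly-graph edges, values are successors.
graph = {"e1": ["e2", "e3"], "e2": ["e4"], "e3": ["e4"], "e4": ["e5"], "e5": []}
# Hypothetical HMM matches ("domains") on edges.
domain_hits = {"e1": "domainA", "e4": "domainB", "e5": "domainC"}
# Hypothetical scaffold, given as an ordered list of edges.
scaffolds = [["e1", "e2", "e4", "e5"]]

paths = supported_domain_paths(graph, domain_hits, scaffolds)
for p in paths:
    print(" -> ".join(p), "| domains:", [domain_hits[e] for e in p if e in domain_hits])
```

In this toy graph both `e2` and `e3` connect `e1` to `e4`, but only the route through `e2` is kept because it is the one the scaffold confirms.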

HMM-guided mode is enabled by providing a set of HMMs (a *.hmm.gz file) via the --custom-hmms option. In HMM-guided mode the set of contigs and scaffolds (see the SPAdes output section for more information) is kept intact; the result of HMM-guided assembly is written as additional biosyntheticSPAdes output.
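As a minimal sketch, the command line for an HMM-guided run could be assembled from Python as follows. The --custom-hmms option comes from the text above, and -1, -2 and -o are the usual SPAdes options for paired-end reads and the output directory. The file names, the output directory, and the helper `build_hmm_guided_command` are hypothetical placeholders. The snippet only builds and prints the command; it does not run SPAdes.

```python
import shlex


def build_hmm_guided_command(reads_1, reads_2, hmms, out_dir, mode_flag=None):
    """Return the argv list for an HMM-guided SPAdes run (nothing is executed)."""
    cmd = ["spades.py"]
    if mode_flag is not None:
        # Optional assembly mode flag, e.g. "--meta" or "--rnaviral".
        cmd.append(mode_flag)
    cmd += ["-1", reads_1, "-2", reads_2]
    # Supplying a set of profile HMMs is what switches on HMM-guided mode.
    cmd += ["--custom-hmms", hmms]
    cmd += ["-o", out_dir]
    return cmd


# Hypothetical input and output names, for illustration only.
cmd = build_hmm_guided_command(
    "reads_1.fastq.gz",
    "reads_2.fastq.gz",
    "profiles.hmm.gz",
    "hmm_guided_out",
    mode_flag="--meta",
)
print(shlex.join(cmd))
```

To actually launch the assembly, the returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where spades.py is on the PATH.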

We provide an example of how HMMs can be used in viral assembly, along with general advice on constructing HMM profile sets for various purposes, in our paper on noroviral assembly.

Note that the regular biosyntheticSPAdes mode (via the --bio option) differs slightly from HMM-guided mode: besides using a special set of profile HMMs representing a family of NRPS/PKS domains, it also includes a set of assembly graph simplification and processing settings aimed at fuller recovery of biosynthetic gene clusters.

coronaSPAdes mode

Given the increased interest in coronavirus research, we developed a coronavirus assembly mode for the SPAdes assembler (also known as coronaSPAdes). It assembles full-length Coronaviridae genomes from transcriptomic and metatranscriptomic data. Algorithmically, coronaSPAdes is rnaviralSPAdes that uses the set of HMMs from the Pfam SARS-CoV-2 2.0 set as well as additional HMMs as outlined by (Phan et al, 2019). coronaSPAdes can be run via a dedicated coronaspades.py script. See the coronaSPAdes paper for more information about rnaviralSPAdes, coronaSPAdes and HMM-guided mode. Output for any HMM-related mode (--bio, --corona, or --custom-hmms flags) is the same as biosyntheticSPAdes' output.
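A coronaSPAdes invocation could be put together along these lines. The coronaspades.py script and the --corona flag are named in the text above. The sketch assumes that both entry points accept the usual SPAdes -1, -2 and -o options; the file names and the helper `build_coronaspades_command` are hypothetical. The command is only built and printed, not executed.

```python
import shlex


def build_coronaspades_command(reads_1, reads_2, out_dir, use_script=True):
    """Return the argv list for a coronaSPAdes run (nothing is executed)."""
    if use_script:
        # Dedicated entry point mentioned in the documentation.
        cmd = ["coronaspades.py"]
    else:
        # Same mode selected via the --corona flag of the main script.
        cmd = ["spades.py", "--corona"]
    cmd += ["-1", reads_1, "-2", reads_2, "-o", out_dir]
    return cmd


# Hypothetical input and output names, for illustration only.
corona_cmd = build_coronaspades_command(
    "sample_1.fastq.gz", "sample_2.fastq.gz", "corona_out"
)
print(shlex.join(corona_cmd))
```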