
Tips on SPAdes parameters

Assembling Illumina genomic data

Isolated and multi-cell datasets

When assembling conventional multi-cell and bacterial isolated datasets with decent coverage (say, 50x or higher), we strongly recommend using the --isolate option.
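As a minimal sketch of such a run, the snippet below builds an isolate-mode command for one paired-end library. The file names and the output directory are hypothetical placeholders; the command is only printed so that it can be checked before it is launched.

```shell
# Hypothetical input files and output directory; replace with your own.
R1=reads_1.fastq.gz
R2=reads_2.fastq.gz
OUT=spades_isolate_out

# Build the command as a string so it can be inspected first.
ISOLATE_CMD="spades.py --isolate -1 $R1 -2 $R2 -o $OUT"
echo "$ISOLATE_CMD"

# To start the assembly, run the printed command in your shell.
```

No -k option is passed here, so SPAdes chooses the k-mer sizes itself.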

SPAdes selects suitable k-mer sizes automatically, so if the assembly finished without errors or warnings, there is nothing to worry about. For example, for 100bp reads the default k values are 21, 33, 55; for 150bp reads SPAdes uses k-mer sizes 21, 33, 55, 77; and for 250bp reads six iterations are used by default: 21, 33, 55, 77, 99, 127. We strongly recommend not changing the -k parameter unless you clearly understand the effect.
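For quick reference, the three defaults quoted above can be written as a small lookup. This is an illustration only: the function name default_k_for_read_length is hypothetical, it is not how SPAdes chooses k internally, and it reports "unknown" for any read length not listed in the text rather than guessing.

```shell
# Lookup of the default k-mer lists quoted above.
# Hypothetical helper; NOT SPAdes' internal selection logic.
default_k_for_read_length() {
  case "$1" in
    100) echo "21,33,55" ;;
    150) echo "21,33,55,77" ;;
    250) echo "21,33,55,77,99,127" ;;
    *)   echo "unknown" ;;   # read length not covered by the text
  esac
}

default_k_for_read_length 150
```

Such a lookup is useful mainly for checking which iterations to expect in a run; the values do not need to be passed to SPAdes.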

Single-cell data

The default k-mer lengths are recommended. For single-cell data sets SPAdes always selects k-mer sizes 21, 33 and 55.

It might be tricky to fully utilize the advantages of longer Illumina reads (e.g. 250bp). Consider contacting us for more information and to discuss assembly strategy.

Assembling IonTorrent reads

FASTQ or BAM files are supported as input in IonTorrent mode.

Selecting k-mer sizes can be non-trivial for IonTorrent data. If the dataset is more or less conventional (good coverage, moderate or low GC, etc.), you can try larger k-mer lengths, e.g. 21, 33, 55, 77, 99, 127.
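A sketch of an IonTorrent run with the larger k-mer lengths mentioned above might look as follows. The input file and output directory are hypothetical, and the reads are assumed to be a single unpaired FASTQ library; the command is printed rather than executed.

```shell
# Hypothetical unpaired IonTorrent library and output directory.
READS=iontorrent_reads.fastq
OUT=spades_iontorrent_out

# Explicit k-mer list for a conventional dataset, as suggested in the text.
K_LIST=21,33,55,77,99,127

ION_CMD="spades.py --iontorrent -s $READS -k $K_LIST -o $OUT"
echo "$ION_CMD"
```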

However, due to the increased error rate, the k-mer lengths may need to be adjusted (e.g. by selecting shorter ones). You also do not have to start from scratch when changing them. For example, if you ran SPAdes with k-mer lengths 21,33,55,77 and then decided to assemble the same dataset using more iterations and larger values of k, you can run SPAdes again, specifying the same output folder and the following options: --restart-from k77 -k 21,33,55,77,99,127 -o <previous_output_dir>. Do not forget to copy the contigs and scaffolds from the previous run first, since the new run writes to the same output folder.
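The restart procedure can be sketched as below. The directory names are hypothetical, and contigs.fasta and scaffolds.fasta are assumed to be the result files of the earlier run. The copy step runs as written; the restart command itself is only printed.

```shell
# Hypothetical output directory of the earlier run (k = 21,33,55,77)
# and a hypothetical backup directory outside of it.
OUT=spades_out
BACKUP=spades_out_k77_backup

# Save the earlier results before restarting in the same output folder.
mkdir -p "$BACKUP"
for f in contigs.fasta scaffolds.fasta; do
  if [ -f "$OUT/$f" ]; then
    cp "$OUT/$f" "$BACKUP/$f"
  fi
done

# Continue from the k=77 iteration with the extended k-mer list.
RESTART_CMD="spades.py --restart-from k77 -k 21,33,55,77,99,127 -o $OUT"
echo "$RESTART_CMD"
```

Keeping the backup outside the output folder is a design choice of this sketch, so that the saved files cannot be touched by the restarted run.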

Error correction may not be needed at all for data produced with the Hi-Q enzyme. However, we suggest assembling your data both with and without error correction and selecting the better result.
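One way to set up this comparison is to prepare two runs that differ only in whether read error correction is performed. The sketch below assumes a single unpaired FASTQ library and uses the --only-assembler option to skip error correction; file and directory names are hypothetical, and the commands are printed rather than executed.

```shell
# Hypothetical unpaired IonTorrent library.
READS=iontorrent_reads.fastq

# Run 1: default pipeline (error correction, then assembly).
CMD_EC="spades.py --iontorrent -s $READS -o out_with_ec"

# Run 2: assembly only, skipping read error correction.
CMD_NOEC="spades.py --iontorrent --only-assembler -s $READS -o out_without_ec"

printf '%s\n' "$CMD_EC" "$CMD_NOEC"
```

Separate output directories keep both sets of contigs available, so they can be compared with whatever assembly metrics you normally use.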

For non-trivial datasets (e.g. with high GC, low or uneven coverage) we suggest enabling single-cell mode (setting the --sc option) and using k-mer lengths of 21, 33, 55.