Stand-alone tools released within SPAdes package

k-mer counter

The SPAdes k-mer counting tool spades-kmercount accepts input in two ways: list files in SPAdes-supported formats without any flags (after all options), or provide a dataset description file in YAML format.

Output: <output_dir>/final_kmers, an unordered set of k-mers in binary format. k-mers from both the forward and the reverse-complement reads are taken into account.

Output format: all k-mers are written sequentially without any separators. Every k-mer occupies the same number of bits: a k-mer of length K needs 2*K bits, and each record is padded to a multiple of 64 bits. For example, a k-mer of length 21 takes 8 bytes, while k-mers of length 33 and 55 each take 16 bytes. Each nucleotide is coded with 2 bits: 00 - A, 01 - C, 10 - G, 11 - T.
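The record size rule above can be checked with a few lines of Python. This is only a sketch of the arithmetic described in the text; the function name kmer_record_bytes is hypothetical and not part of SPAdes.

```python
def kmer_record_bytes(k: int) -> int:
    """Bytes occupied by one k-mer: 2 bits per nucleotide, padded to 64 bits."""
    bits = 2 * k
    words = (bits + 63) // 64  # round up to whole 64-bit words
    return words * 8


for k in (21, 33, 55):
    print(k, kmer_record_bytes(k))
```

Dividing the size of final_kmers by this value should give the number of stored k-mers, assuming the file contains nothing but k-mer records.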

Example:

    For k-mer: AGCTCT
    Memory: 6 nucleotides * 2 bits = 12 bits, padded to 64 bits (8 bytes)
    Let’s describe the bytes (the first nucleotide occupies the lowest two bits of each byte):
    data[0] = AGCT -> 11 01 10 00 -> 0xd8
    data[1] = CT00 -> 00 00 11 01 -> 0x0d
    data[2] = 0000 -> 00 00 00 00 -> 0x00
    data[3] = 0000 -> 00 00 00 00 -> 0x00
    data[4] = 0000 -> 00 00 00 00 -> 0x00
    data[5] = 0000 -> 00 00 00 00 -> 0x00
    data[6] = 0000 -> 00 00 00 00 -> 0x00
    data[7] = 0000 -> 00 00 00 00 -> 0x00
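The layout shown in the example can be reproduced with a short Python sketch. The helper names (encode_kmer, decode_kmer, read_kmers) are hypothetical and not part of SPAdes. The byte order for k-mers longer than 32 nucleotides is an assumption extrapolated from the example above, which only covers the first 64-bit word.

```python
CODES = "ACGT"  # 2-bit codes from the text: 00 - A, 01 - C, 10 - G, 11 - T


def record_size(k: int) -> int:
    """Bytes per k-mer record: 2*k bits padded to a multiple of 64 bits."""
    return 8 * ((2 * k + 63) // 64)


def encode_kmer(kmer: str) -> bytes:
    """Pack a k-mer; nucleotide i goes to byte i // 4, bits 2*(i % 4) and up."""
    data = bytearray(record_size(len(kmer)))
    for i, nt in enumerate(kmer):
        data[i // 4] |= CODES.index(nt) << (2 * (i % 4))
    return bytes(data)


def decode_kmer(record: bytes, k: int) -> str:
    """Inverse of encode_kmer for a single record."""
    return "".join(CODES[(record[i // 4] >> (2 * (i % 4))) & 3] for i in range(k))


def read_kmers(path: str, k: int):
    """Yield k-mers from a file of back-to-back records (assumed: no header)."""
    size = record_size(k)
    with open(path, "rb") as fh:
        while True:
            record = fh.read(size)
            if len(record) < size:
                break
            yield decode_kmer(record, k)


print(encode_kmer("AGCTCT").hex())
```

For AGCTCT the first two bytes come out as d8 and 0d, matching data[0] and data[1] in the example.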

Synopsis: spades-kmercount [OPTION...] <input files>

The options are:

-d, --dataset file <file name> dataset description (in YAML format), input files ignored

-k, --kmer <int> k-mer length (default: 21)

-t, --threads <int> number of threads to use (default: number of CPUs)

-w, --workdir <dir name> working directory to use (default: current directory)

-b, --bufsize <int> sorting buffer size in bytes, per thread (default: 536870912)

-h, --help print help message

k-mer coverage read filter

spades-read-filter is a tool that filters out reads whose median k-mer coverage does not exceed a given threshold (see the -c option).

spades-read-filter takes its input as a dataset description file in YAML format.

Synopsis: spades-read-filter [OPTION...] -d <yaml>

The options are:

-d, --dataset file <file name> dataset description (in YAML format)

-k, --kmer <int> k-mer length (default: 21)

-t, --threads <int> number of threads to use (default: number of CPUs)

-o, --outdir <dir> output directory to use (default: current directory)

-c, --cov <value> median k-mer count threshold (read pairs whose median k-mer count is LESS THAN OR EQUAL to this value for BOTH reads are ignored)

-h, --help print help message
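The filtering rule for -c can be illustrated with a small Python sketch. It assumes the per-k-mer counts of each read are already known and uses the standard-library median. How spades-read-filter computes the median internally is not stated here, so the function keep_pair is a hypothetical illustration of the rule, not the tool's code.

```python
from statistics import median


def keep_pair(left_counts, right_counts, threshold):
    """Return True if the read pair survives filtering.

    A pair is dropped only when the median k-mer count of BOTH reads
    is less than or equal to the threshold.
    """
    both_low = (median(left_counts) <= threshold
                and median(right_counts) <= threshold)
    return not both_low


# Both medians (1 and 2) are at or below the threshold: the pair is dropped.
print(keep_pair([1, 1, 2], [1, 2, 2], threshold=2))
# Only the left read is low-coverage: the pair is kept.
print(keep_pair([1, 1, 2], [5, 6, 7], threshold=2))
```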

k-mer cardinality estimating

spades-kmer-estimating is a tool for estimating the approximate number of unique k-mers in the provided reads. k-mers from reverse-complementary reads aren't taken into account for k-mer cardinality estimating.

spades-kmer-estimating takes its input as a dataset description file in YAML format.

Synopsis: spades-kmer-estimating [OPTION...] -d <yaml>

The options are:

-d, --dataset file <file name> dataset description (in YAML format)

-k, --kmer <int> k-mer length (default: 21)

-t, --threads <int> number of threads to use (default: number of CPUs)

-h, --help print help message
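To make clear what quantity is being estimated, here is an exact brute-force count of distinct k-mers in Python. It is not the algorithm used by spades-kmer-estimating, which reports an approximation; it is only a reference for tiny inputs. The function name count_distinct_kmers is hypothetical. As described above, only forward-strand k-mers are counted.

```python
def count_distinct_kmers(reads, k):
    """Exact number of distinct forward-strand k-mers in a list of reads."""
    seen = set()
    for read in reads:
        # Reads shorter than k contribute nothing (the range is empty).
        for i in range(len(read) - k + 1):
            seen.add(read[i:i + k])
    return len(seen)


# ACGT, CGTA, GTAC, TACG (ACGT occurs twice) plus TTTT
print(count_distinct_kmers(["ACGTACGT", "TTTT"], k=4))
```

The memory needed by this approach grows with the number of distinct k-mers, which is why an approximate estimate is preferable for real datasets.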

Graph construction

The graph construction tool spades-gbuilder takes two mandatory arguments: a dataset description file in YAML format and an output file name.

Synopsis: spades-gbuilder <dataset description (in YAML)> <output filename> [-k <value>] [-t <value>] [-tmpdir <dir>] [-b <value>] [-unitigs|-fastg|-gfa|-spades]

Additional options are:

-k <int> k-mer length used for construction (must be odd)

-t <int> number of threads

-tmp-dir <dir_name> scratch directory to use

-b <int> sorting buffer size (per thread, in bytes)

-unitigs output unitigs

-fastg output graph in FASTG format

-gfa output graph in GFA1 format

-spades output graph in SPAdes internal format

Graph simplification

Graph simplification tool spades-gsimplifier has four mandatory options: the first one is an input graph in GFA format, or a prefix of the SPAdes internal graph pack format created by setting checkpoint options. The second one is the prefix of the output simplified graph. The last two are the k-mer size and the read length.

Synopsis: spades-gsimplifier <graph in GFA (ending with .gfa) or prefix of SPAdes graph pack> <output prefix> [--gfa] [--spades-gp] [--use-cov-ratios] -k <value> --read-length <value> [OPTION...]

Additional options are:

--gfa produce GFA output (default: true)

--spades-gp produce output graph pack in the SPAdes internal format (default: false)

--use-cov-ratios enable simplification procedures based on unitig coverage ratios (default: false)

-k <value> k-mer length to use

--read-length <value> read length to use

-c, --coverage <coverage> estimated average (k+1-mer) bin coverage (default: 0.) or 'auto' (works only with '-d/--dead-ends' provided)

-t, --threads <value> number of threads to use (default: max_threads / 2)

-p, --profile <file> file with edge coverage profiles across multiple samples

-s, --stop-codons <file> file with stop codon positions

-d, --dead-ends <file> file listing edges which are dead-ends in the original graph (for use while processing a subgraph)

Graph splitting

The graph splitting tool spades-gfa-split partitions an input assembly graph (provided in GFA format) into subgraphs corresponding to its undirected components (i.e. the components of the undirected graph that is obtained by ignoring the orientations of edges). GFA paths, if present, are preserved and split as well.

Synopsis: spades-gfa-split <graph (in GFA)> <output base>

The components will be emitted as subgraph_NNN.gfa files inside <output base> directory.
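The notion of undirected components can be sketched in Python with a union-find over GFA segments. This is an illustration only, not the implementation of spades-gfa-split: it assumes GFA1 input where S lines name segments and L lines connect them, ignores link orientations, and does not handle paths. The function name undirected_components is hypothetical.

```python
def undirected_components(gfa_text):
    """Group GFA1 segment names into components, ignoring link orientation."""
    parent = {}

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            parent.setdefault(fields[1], fields[1])
        elif fields[0] == "L":
            # L <from> <from orientation> <to> <to orientation> <overlap>
            a, b = fields[1], fields[3]
            parent.setdefault(a, a)
            parent.setdefault(b, b)
            parent[find(a)] = find(b)

    groups = {}
    for segment in parent:
        groups.setdefault(find(segment), []).append(segment)
    return sorted(sorted(group) for group in groups.values())


example = "H\tVN:Z:1.0\nS\t1\tACGT\nS\t2\tGTAC\nS\t3\tTTTT\nL\t1\t+\t2\t-\t3M\n"
print(undirected_components(example))
```

In this example segments 1 and 2 are joined by a link and form one component, while segment 3 forms a component of its own.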

hybridSPAdes aligner

Not to be confused with SPAligner.

The tool spades-gmapper extracts long-read alignments generated with the hybridSPAdes pipeline options. It takes three mandatory arguments: a dataset description file in YAML format, a graph file in GFA format, and an output file name.

While spades-gmapper is a solution for those who work on hybridSPAdes assembly and want to get exactly its intermediate results, SPAligner is an end-product application for sequence-to-graph alignment with tunable parameters and output types.

Synopsis: spades-gmapper <dataset description (in YAML)> <graph (in GFA)> <output filename> [-k <value>] [-t <value>] [-tmpdir <dir>]

Additional options are:

-k <int> k-mer length that was used for graph construction

-t <int> number of threads

-tmpdir <dir_name> scratch directory to use