
Stand-alone tools released within SPAdes package

k-mer counter

To provide input data to the SPAdes k-mer counting tool spades-kmercount, you can either list files in SPAdes-supported formats as positional arguments (after all options) or provide a dataset description file in YAML format.

Output: <output_dir>/final_kmers, an unordered set of k-mers in binary format. k-mers from both forward and reverse-complement reads are taken into account.

Output format: all k-mers are written sequentially without any separators, and every k-mer occupies the same number of bytes. A k-mer of length K takes 2*K bits, padded up to a multiple of 64 bits. For example, a k-mer of length 21 takes 8 bytes (42 bits padded to 64), while k-mers of length 33 and 55 take 16 bytes each (66 and 110 bits padded to 128). Each nucleotide is encoded with 2 bits: 00 - A, 01 - C, 10 - G, 11 - T.
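
The padding rule above can be written as a small helper. This is a minimal sketch assuming the file consists only of fixed-size records (no header, no separators), as described; the function names are hypothetical and not part of SPAdes.

```python
import os


def kmer_record_size(k: int) -> int:
    """Bytes per k-mer: 2*k bits rounded up to a whole number of 64-bit words."""
    if k <= 0:
        raise ValueError("k must be positive")
    words = (2 * k + 63) // 64  # number of 64-bit words needed for 2*k bits
    return words * 8


def count_kmers_in_file(path: str, k: int) -> int:
    """Number of k-mers in a final_kmers file, assuming fixed-size records only."""
    size = os.path.getsize(path)
    record = kmer_record_size(k)
    if size % record != 0:
        raise ValueError(f"file size {size} is not a multiple of {record} bytes")
    return size // record
```

For the lengths mentioned in the text, `kmer_record_size` gives 8 bytes for K=21 and 16 bytes for both K=33 and K=55.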

Example:

    For k-mer: AGCTCT
    Memory: 6 nucleotides * 2 bits = 12 bits, padded to 64 bits (8 bytes)
    Byte layout (the first nucleotide occupies the two lowest-order bits of data[0]):
    data[0] = AGCT -> 11 01 10 00 -> 0xd8
    data[1] = CT00 -> 00 00 11 01 -> 0x0d
    data[2] = 0000 -> 00 00 00 00 -> 0x00
    data[3] = 0000 -> 00 00 00 00 -> 0x00
    data[4] = 0000 -> 00 00 00 00 -> 0x00
    data[5] = 0000 -> 00 00 00 00 -> 0x00
    data[6] = 0000 -> 00 00 00 00 -> 0x00
    data[7] = 0000 -> 00 00 00 00 -> 0x00
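
The byte layout in the example can be reproduced by treating each record as one little-endian integer in which nucleotide i occupies bits 2i and 2i+1. This is a sketch: the example only shows a single 64-bit word, so extending the same rule to K > 32 (word i//32, in file order) is an assumption, and `encode_kmer`/`decode_kmers` are hypothetical helpers.

```python
from typing import List

CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
LETTERS = "ACGT"


def _record_size(k: int) -> int:
    # 2*k bits padded to a multiple of 64 bits, in bytes
    return ((2 * k + 63) // 64) * 8


def encode_kmer(kmer: str) -> bytes:
    """Pack a k-mer so that nucleotide i lands in bits 2i..2i+1 (assumed layout)."""
    value = 0
    for i, nt in enumerate(kmer):
        value |= CODE[nt] << (2 * i)
    # Little-endian: data[0] holds the lowest-order bits, i.e. the first nucleotides
    return value.to_bytes(_record_size(len(kmer)), "little")


def decode_kmers(data: bytes, k: int) -> List[str]:
    """Split a final_kmers buffer into fixed-size records and decode each one."""
    size = _record_size(k)
    if len(data) % size != 0:
        raise ValueError("buffer length is not a multiple of the record size")
    kmers = []
    for off in range(0, len(data), size):
        value = int.from_bytes(data[off:off + size], "little")
        kmers.append("".join(LETTERS[(value >> (2 * i)) & 0b11] for i in range(k)))
    return kmers
```

With this layout, `encode_kmer("AGCTCT")` yields the bytes `d8 0d 00 00 00 00 00 00` shown above. A file produced with -k 21 could then be read with `decode_kmers(open(path, "rb").read(), 21)`.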

Synopsis: spades-kmercount [OPTION...] <input files>

The options are:

-d, --dataset file <file name> dataset description (in YAML format), input files ignored

-k, --kmer <int> k-mer length (default: 21)

-t, --threads <int> number of threads to use (default: number of CPUs)

-w, --workdir <dir name> working directory to use (default: current directory)

-b, --bufsize <int> sorting buffer size in bytes, per thread (default: 536870912, i.e. 512 MiB)

-h, --help print help message
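
Because -b is a per-thread value, the total sorting-buffer memory is roughly bufsize times the number of threads (for example, 16 threads at the 512 MiB default use about 8 GiB). The sketch below builds a spades-kmercount command line from the documented options and derives -b from a total memory budget; treating the sorting buffers as the only memory consumer is an assumption, and `kmercount_command` is a hypothetical helper.

```python
import os
from typing import List, Optional, Sequence


def kmercount_command(inputs: Sequence[str], k: int = 21,
                      threads: Optional[int] = None,
                      workdir: Optional[str] = None,
                      mem_budget_bytes: Optional[int] = None) -> List[str]:
    """Build an argv list for spades-kmercount using only documented options."""
    cmd = ["spades-kmercount", "-k", str(k)]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if workdir is not None:
        cmd += ["-w", workdir]
    if mem_budget_bytes is not None:
        # -b is per thread; the tool defaults to one thread per CPU
        n_threads = threads if threads is not None else (os.cpu_count() or 1)
        per_thread = mem_budget_bytes // n_threads
        if per_thread <= 0:
            raise ValueError("memory budget too small for the thread count")
        cmd += ["-b", str(per_thread)]
    cmd += list(inputs)  # input files go after all options
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`.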

k-mer coverage read filter

spades-read-filter filters out reads whose median k-mer coverage is at or below a given threshold.

spades-read-filter takes its input data as a dataset description file in YAML format.

Synopsis: spades-read-filter [OPTION...] -d <yaml>

The options are:

-d, --dataset file <file name> dataset description (in YAML format)

-k, --kmer <int> k-mer length (default: 21)

-t, --threads <int> number of threads to use (default: number of CPUs)

-o, --outdir <dir> output directory to use (default: current directory)

-c, --cov <value> median k-mer count threshold (read pairs in which the median k-mer count of BOTH reads is less than or equal to this value are discarded)

-h, --help print help message
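
The -c rule can be illustrated with a small sketch. Here the k-mer counts are supplied as a plain dictionary (a hypothetical stand-in; the tool builds its own counts and may, for example, use canonical k-mers), even-length medians follow Python's `statistics.median`, and reads shorter than k are treated as having median 0, which is an assumption.

```python
from statistics import median
from typing import Dict


def median_kmer_count(read: str, k: int, counts: Dict[str, int]) -> float:
    """Median count over the read's k-mers; k-mers missing from counts count as 0."""
    values = [counts.get(read[i:i + k], 0) for i in range(len(read) - k + 1)]
    return median(values) if values else 0  # no k-mers: treat as 0 (assumption)


def keep_pair(read1: str, read2: str, k: int,
              counts: Dict[str, int], cov: float) -> bool:
    """Discard the pair only if BOTH medians are <= cov, as described for -c."""
    m1 = median_kmer_count(read1, k, counts)
    m2 = median_kmer_count(read2, k, counts)
    return not (m1 <= cov and m2 <= cov)
```

Note that a pair survives as long as one of its reads has a median above the threshold.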

k-mer cardinality estimation

spades-kmer-estimating estimates the approximate number of distinct k-mers in the provided reads. k-mers from reverse-complement reads are not taken into account.

spades-kmer-estimating takes its input data as a dataset description file in YAML format.

Synopsis: spades-kmer-estimating [OPTION...] -d <yaml>

The options are:

-d, --dataset file <file name> dataset description (in YAML format)

-k, --kmer <int> k-mer length (default: 21)

-t, --threads <int> number of threads to use (default: number of CPUs)

-h, --help print help message
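
The text does not say which estimator spades-kmer-estimating uses. As an illustration of approximate cardinality estimation in general, the sketch below uses HyperLogLog, a standard streaming estimator, over forward-strand k-mers only; the choice of HyperLogLog, the BLAKE2b hash and the register count are all assumptions rather than the tool's internals.

```python
import hashlib
import math
from typing import Iterable


def _hash64(s: str) -> int:
    # Deterministic 64-bit hash of a k-mer string
    return int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8).digest(), "big")


class HyperLogLog:
    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p                 # number of registers (p >= 7 for this alpha)
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        x = _hash64(item)
        idx = x >> (64 - self.p)        # top p bits select a register
        rest_bits = 64 - self.p
        w = x & ((1 << rest_bits) - 1)  # remaining bits
        rank = rest_bits - w.bit_length() + 1  # 1-based position of leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:    # small-range correction (linear counting)
            return m * math.log(m / zeros)
        return raw


def estimate_kmer_cardinality(reads: Iterable[str], k: int, p: int = 12) -> float:
    hll = HyperLogLog(p)
    for read in reads:
        for i in range(len(read) - k + 1):
            hll.add(read[i:i + k])      # forward strand only, as in the text
    return hll.estimate()
```

With p=12 (4096 registers) the typical relative error is about 1.6%, while memory stays fixed regardless of how many k-mers are seen.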

Graph construction

The graph construction tool spades-gbuilder has two mandatory arguments: a dataset description file in YAML format and an output file name.

Synopsis: spades-gbuilder <dataset description (in YAML)> <output filename> [-k <value>] [-t <value>] [-tmpdir <dir>] [-b <value>] [-unitigs|-fastg|-gfa|-spades]

Additional options are:

-k <int> k-mer length used for construction (must be odd)

-t <int> number of threads

-tmp-dir <dir_name> scratch directory to use

-b <int> sorting buffer size (per thread, in bytes)

-unitigs output unitigs (the maximal non-branching paths of the graph)

-fastg output graph in FASTG format

-gfa output graph in GFA1 format

-spades output graph in SPAdes internal format
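
To illustrate what the unitig output represents, the sketch below builds unitigs (maximal non-branching paths) from a set of k-mers. It is a simplified sketch that considers only the forward strand; SPAdes also merges reverse complements, which is one reason k must be odd: with odd k, no k-mer can equal its own reverse complement. The helper names are hypothetical.

```python
from typing import Iterable, List, Set

ALPHABET = "ACGT"


def kmers_of(reads: Iterable[str], k: int) -> Set[str]:
    """All forward-strand k-mers of the reads."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def build_unitigs(kmers: Set[str]) -> List[str]:
    def succ(km: str) -> List[str]:
        return [km[1:] + c for c in ALPHABET if km[1:] + c in kmers]

    def pred(km: str) -> List[str]:
        return [c + km[:-1] for c in ALPHABET if c + km[:-1] in kmers]

    def is_start(km: str) -> bool:
        # A unitig starts where the path cannot be extended backwards uniquely
        p = pred(km)
        return len(p) != 1 or len(succ(p[0])) != 1

    visited: Set[str] = set()

    def walk(start: str) -> str:
        path = [start]
        visited.add(start)
        cur = start
        while True:
            s = succ(cur)
            if len(s) != 1:
                break
            nxt = s[0]
            if len(pred(nxt)) != 1 or nxt in visited:
                break
            path.append(nxt)
            visited.add(nxt)
            cur = nxt
        # Consecutive k-mers overlap by k-1, so append only the last base of each
        return path[0] + "".join(km[-1] for km in path[1:])

    unitigs = []
    for km in sorted(kmers):
        if km not in visited and is_start(km):
            unitigs.append(walk(km))
    # Anything left lies on isolated cycles with no natural start
    for km in sorted(kmers):
        if km not in visited:
            unitigs.append(walk(km))
    return unitigs
```

For example, the 3-mers of ACGTA and ACGTC branch after CGT, giving the unitigs ACGT, GTA and GTC.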

Graph simplification

The graph simplification tool spades-gsimplifier has four mandatory arguments. The first is the input graph, either in GFA format or as the prefix of a graph pack in the SPAdes internal format (created by enabling checkpoint options). The second is the prefix for the output simplified graph. The last two are the k-mer size (-k) and the read length (--read-length).

Synopsis: spades-gsimplifier <graph (in GFA, ending with .gfa, or prefix of a SPAdes graph pack)> <output prefix> [--gfa] [--spades-gp] [--use-cov-ratios] -k <value> --read-length <value> [OPTION...]

Additional options are:

--gfa produce GFA output (default: true)

--spades-gp produce output graph pack in the SPAdes internal format (default: false)

--use-cov-ratios enable simplification procedures based on unitig coverage ratios (default: false)

-k <value> k-mer length to use

--read-length <value> read length to use

-c, --coverage <coverage> estimated average (k+1)-mer bin coverage (default: 0), or 'auto' (works only when -d/--dead-ends is provided)

-t, --threads <value> number of threads to use (default: max_threads / 2)

-p, --profile <file> file with edge coverage profiles across multiple samples

-s, --stop-codons <file> file with stop codon positions

-d, --dead-ends <file> when processing a subgraph, a file listing the edges that are dead ends in the original graph
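
When setting -c by hand, read coverage must be converted to (k+1)-mer coverage. A read of length L contains L - k distinct (k+1)-mer positions, so for read coverage C the expected (k+1)-mer coverage is C * (L - k) / L. This is the standard relation; whether spades-gsimplifier's bin coverage uses exactly this normalization is not stated in the text, and `kplus1mer_coverage` is a hypothetical helper.

```python
def kplus1mer_coverage(read_coverage: float, read_length: int, k: int) -> float:
    """Expected (k+1)-mer coverage from read coverage: C * (L - k) / L."""
    if not 0 < k < read_length:
        raise ValueError("k must be positive and shorter than the read length")
    # Each read of length L contributes L - (k + 1) + 1 = L - k (k+1)-mers
    return read_coverage * (read_length - k) / read_length
```

For example, 30x read coverage with 150 bp reads and k=55 corresponds to about 19x (k+1)-mer coverage.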

Long read to graph alignment

hybridSPAdes aligner

spades-gmapper extracts the long-read alignments generated by the hybridSPAdes pipeline. It has three mandatory arguments: a dataset description file in YAML format, a graph file in GFA format and an output file name.

Synopsis: spades-gmapper <dataset description (in YAML)> <graph (in GFA)> <output filename> [-k <value>] [-t <value>] [-tmpdir <dir>]

Additional options are:

-k <int> k-mer length that was used for graph construction

-t <int> number of threads

-tmpdir <dir_name> scratch directory to use
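
Both spades-gmapper and SPAligner take the graph in GFA format. For inspecting such a graph, a minimal reader for GFA1 segment (S) and link (L) records might look like the sketch below; it is not part of SPAdes and ignores all other record types and optional tags.

```python
from typing import Dict, Iterable, List, Tuple

Link = Tuple[str, str, str, str, str]


def read_gfa(lines: Iterable[str]) -> Tuple[Dict[str, str], List[Link]]:
    """Collect segment sequences and links from GFA1 text lines."""
    segments: Dict[str, str] = {}
    links: List[Link] = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            segments[fields[1]] = fields[2]  # name -> sequence ('*' if omitted)
        elif fields[0] == "L" and len(fields) >= 6:
            # from, from_orient, to, to_orient, overlap CIGAR
            links.append((fields[1], fields[2], fields[3], fields[4], fields[5]))
        # H, P and other record types are ignored in this sketch
    return segments, links
```

Usage: `with open("graph.gfa") as f: segments, links = read_gfa(f)`.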

While spades-gmapper is meant for those who work with hybridSPAdes assemblies and need exactly its intermediate results, SPAligner is an end-product application for sequence-to-graph alignment with tunable parameters and output types.

SPAligner

SPAligner is a tool for fast and accurate alignment of nucleotide sequences to assembly graphs. It takes a file with sequences (in FASTA/FASTQ format) and an assembly graph in GFA format, and outputs long-read-to-graph alignments in various formats (such as TSV, FASTA and GPA).

Synopsis: spaligner src/projects/spaligner_config.yaml -d <value> -s <value> -g <value> -k <value> [-t <value>] [-o <value>]

Parameters are:

-d <type> long reads type: nanopore, pacbio

-s <filename> file with sequences (in fasta/fastq)

-g <filename> file with graph (in GFA)

-k <int> k-mer length that was used for graph construction

-t <int> number of threads (default: 8)

-o, --outdir <dir> output directory to use (default: spaligner_result/)
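
The parameters above map directly onto a command line. The sketch below assembles one from the documented flags and rejects read types other than the two listed; `spaligner_command` is a hypothetical helper, and the config path is whatever your installation provides.

```python
from typing import List, Optional

READ_TYPES = {"nanopore", "pacbio"}  # values listed for -d


def spaligner_command(config: str, reads_type: str, sequences: str, graph: str,
                      k: int, threads: Optional[int] = None,
                      outdir: Optional[str] = None) -> List[str]:
    """Build an argv list for spaligner using only the documented parameters."""
    if reads_type not in READ_TYPES:
        raise ValueError(f"unsupported long reads type: {reads_type!r}")
    cmd = ["spaligner", config, "-d", reads_type, "-s", sequences,
           "-g", graph, "-k", str(k)]
    if threads is not None:   # tool default: 8
        cmd += ["-t", str(threads)]
    if outdir is not None:    # tool default: spaligner_result/
        cmd += ["-o", outdir]
    return cmd
```

Validating -d before launching avoids starting a long alignment run with a misspelled read type.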

For more information on parameters and options please refer to the main SPAligner manual (assembler/src/projects/spaligner/README.md).

To align protein sequences, please refer to our pre-release version.

Note that to use SPAligner, you need either the pre-built binaries or to compile SPAdes from source with the additional -DSPADES_ENABLE_PROJECTS=spaligner option.