Binning refinement using assembly graphs
BinSPreader is a tool that attempts to refine metagenome-assembled genomes (MAGs) obtained from existing binning tools. It exploits the assembly graph topology and other connectivity information, such as paired-end and Hi-C reads, to correct binning errors, propagate binning from longer contigs to shorter ones, and infer contigs that belong to multiple MAGs. Please refer to the BinSPreader paper for more details. In addition to increasing the completeness of the bins, refinement also enriches bins with contigs containing important conserved genes, using the short assembly graph edges that are typically underrepresented in state-of-the-art contig binning methods.
The tool requires an initial binning to refine, as well as an assembly graph as the source of information for refinement. Optionally, BinSPreader can be provided with multiple Hi-C and/or paired-end libraries. The BinSPreader protocol contains more detailed instructions on installing and running BinSPreader.
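To make the idea of propagating bins along graph connections more concrete, here is a minimal sketch of generic label propagation on a toy graph. It is not BinSPreader's actual algorithm: the function name, the `keep` parameter, the toy contig names, and the update rule are all assumptions chosen for illustration. Binned nodes are pulled back toward their initial label, and unbinned nodes take the weighted average of their neighbours.

```python
def propagate_labels(edges, initial, bins, iterations=50, keep=0.6):
    """Toy label propagation on an undirected weighted graph.

    edges   -- iterable of (node_u, node_v, weight) tuples
    initial -- dict mapping already-binned nodes to their bin label
    bins    -- list of all bin labels
    keep    -- hypothetical parameter: how strongly a binned node
               sticks to its initial label (0 = not at all, 1 = fully)
    """
    neighbors = {}
    for u, v, w in edges:
        neighbors.setdefault(u, []).append((v, w))
        neighbors.setdefault(v, []).append((u, w))
    nodes = set(neighbors) | set(initial)

    def one_hot(label):
        # Unbinned nodes (label is None) start with all-zero weights.
        return {b: 1.0 if b == label else 0.0 for b in bins}

    weights = {n: one_hot(initial.get(n)) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            adjacent = neighbors.get(n, [])
            total = sum(w for _, w in adjacent)
            if total > 0:
                # Weighted average of the neighbours' current bin weights.
                mixed = {b: sum(w * weights[m][b] for m, w in adjacent) / total
                         for b in bins}
            else:
                mixed = dict(weights[n])
            if n in initial:
                # Pull binned nodes back toward their initial label.
                start = one_hot(initial[n])
                mixed = {b: keep * start[b] + (1 - keep) * mixed[b]
                         for b in bins}
            updated[n] = mixed
        weights = updated
    return weights


# Two long binned contigs joined through two short unbinned ones.
toy_edges = [("long_A", "short_1", 1.0),
             ("short_1", "short_2", 1.0),
             ("short_2", "long_B", 1.0)]
toy_weights = propagate_labels(toy_edges,
                               {"long_A": "A", "long_B": "B"},
                               ["A", "B"])
for node in sorted(toy_weights):
    print(node, toy_weights[node])
```

In this toy chain each short contig ends up leaning toward the bin of the long contig it is closest to, which is the intuition behind propagating binning from longer contigs to shorter ones.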
Compilation
To compile BinSPreader, run
After the compilation is complete, the binspreader executable will be located in the bin folder.
Command line options
Required positional arguments:
- Assembly graph file in GFA 1.0 format, with scaffolds included as path lines. Alternatively, scaffold paths can be provided separately using the --paths option, in the .paths format accepted by Bandage (see the Bandage wiki for details).
- Binning output from an existing tool (in .tsv format)
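Before running the tool, it can help to check that the two inputs look as described above. The sketch below counts GFA 1.0 record types (path lines start with `P`) and reads a binning table. The helper names are hypothetical, and the two-column layout of the binning file (contig name, tab, bin label) is an assumption, since the exact columns are not spelled out here.

```python
from collections import Counter


def gfa_record_counts(lines):
    """Count GFA 1.0 records by type (the first tab-separated field)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        counts[line.split("\t", 1)[0]] += 1
    return counts


def gfa_path_names(lines):
    """Collect the names of all path (P) records."""
    names = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "P" and len(fields) > 1:
            names.add(fields[1])
    return names


def read_binning(lines):
    """Parse an ASSUMED layout: contig name, tab, bin label."""
    binning = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip malformed or empty rows
        binning[fields[0]] = fields[1]
    return binning


# Toy data standing in for real files.
toy_gfa = [
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tGGCA",
    "L\t1\t+\t2\t+\t0M",
    "P\tNODE_1\t1+,2+\t*",
]
toy_binning = ["NODE_1\tbin_1\n", "NODE_2\tbin_2\n"]

counts = gfa_record_counts(toy_gfa)
paths = gfa_path_names(toy_gfa)
binning = read_binning(toy_binning)

if counts["P"] == 0:
    print("No path lines found: scaffold paths must be given via --paths")
# Binned contigs that have no matching path record in the graph.
missing = sorted(set(binning) - paths)
print("record counts:", dict(counts))
print("binned contigs without a path record:", missing)
```

For real data, replace the toy lists with open file handles, for example `open("assembly_graph.gfa")`.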
Synopsis
Main options
--paths
Provide contig paths from a file, separately from the GFA
--dataset
Dataset in YAML format describing Hi-C and paired-end reads
-t
Number of threads to use (default: 1/2 of available threads)
-m
Allow multiple bin assignment (default: false)
-Smax|-Smle
Simple maximum or maximum likelihood binning assignment strategy (default: max likelihood)
-Rcorr|-Rprop
Select propagation or correction mode (default: correction)
--cami
Use CAMI bioboxes binning format
--zero-bin
Emit zero bin for unbinned sequences
--tall-multi
Use tall table for multiple binning result
--bin-dist
Estimate pairwise bin distance (could be slow on large graphs!)
-la
Labels correction regularization parameter for labeled data (default: 0.6)
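The options above can be assembled into a command line programmatically. The following is a minimal sketch, not an official wrapper: the helper name is hypothetical, and it assumes that the positional arguments come in the order graph, binning, output directory, that options may follow them, and that option values are passed as separate arguments. None of these details are stated in this text, so check them against the tool's own help output.

```python
def build_binspreader_args(graph, binning, output_dir, *, paths=None,
                           dataset=None, threads=None, multiple=False,
                           strategy=None, mode=None, la=None,
                           switches=(), executable="binspreader"):
    """Build an argument list using only the flags listed above.

    ASSUMPTIONS: positional order (graph, binning, output directory)
    and space-separated option values.
    """
    args = [executable, str(graph), str(binning), str(output_dir)]
    if paths is not None:
        args += ["--paths", str(paths)]
    if dataset is not None:
        args += ["--dataset", str(dataset)]
    if threads is not None:
        args += ["-t", str(threads)]
    if multiple:
        args.append("-m")
    if strategy is not None:
        # "max" -> simple maximum, "mle" -> maximum likelihood
        args.append({"max": "-Smax", "mle": "-Smle"}[strategy])
    if mode is not None:
        # "corr" -> correction mode, "prop" -> propagation mode
        args.append({"corr": "-Rcorr", "prop": "-Rprop"}[mode])
    if la is not None:
        args += ["-la", str(la)]
    allowed = {"--cami", "--zero-bin", "--tall-multi", "--bin-dist"}
    for switch in switches:
        if switch not in allowed:
            raise ValueError(f"unknown switch: {switch}")
        args.append(switch)
    return args


# File and directory names here are placeholders.
command = build_binspreader_args("assembly_graph.gfa", "binning.tsv",
                                 "refined", threads=8, mode="prop",
                                 switches=["--zero-bin"])
print(" ".join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` once the executable is built and on the path.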
Output
BinSPreader stores all output files in the output directory <output_dir> set by the user.
- <output_dir>/binning.tsv contains refined binning in .tsv format
- <output_dir>/bin_stats.tsv contains various per-bin statistics
- <output_dir>/bin_weights.tsv contains soft bin weights per contig
- <output_dir>/edge_weights.tsv contains soft bin weights per edge
In addition:
- <output_dir>/bin_dist.tsv contains refined bin distance matrix (if --bin-dist was used)
- <output_dir>/bin_label_1.fastq, <output_dir>/bin_label_2.fastq read set for bin labeled by bin_label (if --reads was used)
- <output_dir>/pe_links.tsv list of paired-end links between assembly graph edges with weights (if --debug was used)
- <output_dir>/graph_links.tsv list of graph links between assembly graph edges with weights (if --debug was used)
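As an example of consuming the refined binning, the sketch below groups contigs by bin. The function name is hypothetical, and the file layout is an assumption: each row is taken to hold a contig name followed by one or more tab-separated bin labels, which also covers a tall table with one row per contig and bin pair. Adjust the parsing if the actual columns differ.

```python
from collections import defaultdict


def contigs_per_bin(lines):
    """Group contig names by bin label.

    ASSUMED layout: contig name, then one or more bin labels,
    all tab-separated.
    """
    bins = defaultdict(set)
    for line in lines:
        fields = [f for f in line.rstrip("\n").split("\t") if f]
        if len(fields) < 2:
            continue  # no bin label on this row
        contig, labels = fields[0], fields[1:]
        for label in labels:
            bins[label].add(contig)
    return dict(bins)


# Toy rows standing in for <output_dir>/binning.tsv.
toy_rows = ["contig_1\tbin_A\n", "contig_2\tbin_A\tbin_B\n", "contig_3\n"]
grouped = contigs_per_bin(toy_rows)
for label in sorted(grouped):
    print(label, sorted(grouped[label]))
```

For a real run, pass an open file handle for `binning.tsv` in the output directory instead of the toy rows.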
References
If you are using BinSPreader in your research, please cite: