BarcodedIgReC manual

1. What is BarcodedIgReC?
2. Installation
    2.1. Verifying your installation
3. BarcodedIgReC usage
    3.1. Basic options
    3.2. Advanced options
    3.3. Examples
    3.4. Output files
4. Citation
5. Feedback and bug reports

1. What is BarcodedIgReC?

BarcodedIgReC is a modification of the IgReC full-length antibody repertoire construction tool for barcoded datasets. The BarcodedIgReC pipeline is shown below:

[Figure: BarcodedIgReC pipeline]

Input:

BarcodedIgReC takes as input demultiplexed paired-end or single reads with unique molecular identifiers (UMIs). Please note that IgRepertoireConstructor constructs a full-length repertoire and expects the input reads to cover the variable region of an antibody or TCR.

Output:

BarcodedIgReC corrects sequencing and amplification errors and groups together reads corresponding to identical antibodies. The constructed repertoire is therefore a set of antibody clusters, each characterized by its sequence, read multiplicity and molecule multiplicity. Read multiplicity is the number of reads in an antibody cluster, whereas molecule multiplicity is an estimate of the number of RNA molecules related to the cluster. BarcodedIgReC provides the user with the following information about the constructed repertoire:

antibody clusters: antibody sequences with read multiplicities,
antibody clusters: antibody sequences with molecule multiplicities,
read-cluster map: information about antibody clusters contents.
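
The difference between the two multiplicities can be illustrated with a minimal sketch. It assumes, as a simplification that is not taken from the BarcodedIgReC implementation, that every read is a (UMI, cluster id) pair and that each distinct UMI within a cluster stands for one RNA molecule. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def multiplicities(assignments):
    """Return {cluster_id: (read_multiplicity, molecule_multiplicity)}.

    assignments: iterable of (umi, cluster_id) pairs, one pair per read.
    Assumption (not from the manual): distinct UMIs in a cluster
    approximate distinct RNA molecules.
    """
    reads = defaultdict(int)   # cluster id -> number of reads
    umis = defaultdict(set)    # cluster id -> set of UMIs seen
    for umi, cluster_id in assignments:
        reads[cluster_id] += 1
        umis[cluster_id].add(umi)
    return {cid: (reads[cid], len(umis[cid])) for cid in reads}


# Toy data: cluster 0 has four reads from two UMIs, cluster 1 has one read.
toy = [("AACG", 0), ("AACG", 0), ("AACG", 0), ("TTGC", 0), ("GGAT", 1)]
result = multiplicities(toy)
print(result)
```

In this toy case cluster 0 is large in terms of reads mainly because one molecule was amplified three times, which is the kind of bias that molecule multiplicity is meant to remove.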

Stages:

The BarcodedIgReC pipeline consists of the following steps:

VJ Finder: cleaning input reads using alignment against Ig germline genes.
Barcode clustering: correcting errors in reads sharing a barcode, correcting barcode errors, and handling identical or close barcodes assigned to unrelated molecules. Chimeric reads are also discarded at this step. As a result, all reads are grouped by their original molecules.
IgReC: grouping very close molecules, thus correcting minor remaining errors. This step also produces the final output.
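
The idea behind grouping reads by original molecule can be sketched as follows. This is a toy illustration, not the BarcodedIgReC algorithm: it assumes error-free barcodes and equal-length reads, ignores chimeras and barcode collisions, and replaces the actual error correction with a per-position majority vote. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def consensus(seqs):
    """Per-position majority vote over equal-length sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def collapse_by_barcode(reads):
    """Group (umi, sequence) pairs by UMI and return {umi: consensus}."""
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    return {umi: consensus(seqs) for umi, seqs in groups.items()}


# Three reads share the barcode AAC; one of them carries an error (T -> A).
toy_reads = [
    ("AAC", "ACGTACGT"),
    ("AAC", "ACGTACGT"),
    ("AAC", "ACGAACGT"),
    ("GGT", "TTGTACGT"),
]
collapsed = collapse_by_barcode(toy_reads)
print(collapsed)
```

The erroneous base is outvoted by the two correct reads, so each barcode ends up represented by a single sequence.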

2. Installation

For installation instructions, please refer to the IgRepertoireConstructor installation guide.

2.1. Verifying your installation

For testing purposes, BarcodedIgReC comes with a toy dataset.

To try BarcodedIgReC on the test dataset, run:


    ./barcoded_igrec.py --test

If BarcodedIgReC is installed correctly, the log will end with the following lines:

    
    Thank you for using BarcodedIgReC!
    Log was written to barigrec_test/igrec.log
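
This check can also be scripted. The sketch below looks for the closing message quoted above in the last few lines of the log text. The function name and the `tail_lines` parameter are hypothetical, and the approach assumes the message wording does not change between versions.

```python
SUCCESS_LINE = "Thank you for using BarcodedIgReC!"


def log_reports_success(log_text, tail_lines=5):
    """Return True if the success message appears near the end of the log."""
    tail = log_text.splitlines()[-tail_lines:]
    return any(SUCCESS_LINE in line for line in tail)


# Toy log contents; a real log would be read from barigrec_test/igrec.log.
good_log = (
    "some earlier output\n"
    "Thank you for using BarcodedIgReC!\n"
    "Log was written to barigrec_test/igrec.log\n"
)
print(log_reports_success(good_log))
```

To apply it to the test run, the log can be read with `pathlib.Path("barigrec_test/igrec.log").read_text()` and passed to the function.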

3. BarcodedIgReC usage

BarcodedIgReC takes as input demultiplexed barcoded Illumina reads covering the variable region of an antibody and constructs a repertoire in CLUSTERS.FASTA and RCM formats.

To run BarcodedIgReC, type:

    
    ./barcoded_igrec.py [options] -s <single_reads.fastq> -o <output_dir>

    
    ./barcoded_igrec.py [options] -1 <left_reads.fastq> -2 <right_reads.fastq> -o <output_dir>

3.1. Basic options

-s <single_reads.fastq>
FASTQ file with single Illumina reads (required unless paired-end reads are given with -1 and -2).

-1 <left_reads.fastq> -2 <right_reads.fastq>
FASTQ files with paired-end Illumina reads (required unless single reads are given with -s).

-o / --output <output_dir>
Output directory (required).

-t / --threads <int>
The number of parallel threads. The default value is 16.

--test
Runs BarcodedIgReC on the toy test dataset. The test run is equivalent to the following command:

    
    ./barcoded_igrec.py -s test_dataset/barcodedIgReC_test.fasta -l all -o barigrec_test

-l / --loci <str>
Immunological loci against which input reads are aligned; reads with a low alignment score are discarded (required).
Available values are IGH / IGL / IGK / IG (for all BCRs) / TRA / TRB / TRG / TRD / TR (for all TCRs) or all.

--help
Prints the help message.
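
When BarcodedIgReC is called from a script, the basic options can be assembled and validated before launching the run. The sketch below uses only the options documented above. The helper name `build_command` is hypothetical, and treating the loci values as case-sensitive is an assumption.

```python
VALID_LOCI = {"IGH", "IGL", "IGK", "IG", "TRA", "TRB", "TRG", "TRD", "TR", "all"}


def build_command(output_dir, loci, single=None, left=None, right=None,
                  threads=None):
    """Build an argument list for barcoded_igrec.py (hypothetical helper)."""
    if loci not in VALID_LOCI:
        raise ValueError(f"unknown loci value: {loci!r}")
    paired = left is not None and right is not None
    # Exactly one input mode is allowed: -s, or both -1 and -2.
    if (single is not None) == paired or (left is None) != (right is None):
        raise ValueError("give either single reads or both paired-end files")
    cmd = ["./barcoded_igrec.py"]
    if single is not None:
        cmd += ["-s", single]
    else:
        cmd += ["-1", left, "-2", right]
    cmd += ["-o", output_dir, "-l", loci]
    if threads is not None:
        cmd += ["-t", str(threads)]
    return cmd


single_cmd = build_command("output_dir", "all", single="reads.fastq")
paired_cmd = build_command("output_dir", "IGH", left="left_reads.fastq",
                           right="right_reads.fastq", threads=8)
print(" ".join(single_cmd))
print(" ".join(paired_cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains `barcoded_igrec.py`.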

3.2. Advanced options

--organism <str>
Organism whose germline genes are used. Available values are human, mouse, pig, rabbit, rat and rhesus_monkey. The default value is human.

--igrec-tau <int>
Maximal allowed number of mismatches between two barcode cluster consensuses corresponding to identical antibodies. The default (and recommended) value is 2, which allows a barcode cluster consensus to contain a single error. Higher values can reduce the advantage of barcoding. Lower values may give better results for large clusters because close sequences are not merged, but small clusters may then be undercorrected.
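
The reasoning behind the default can be sketched with a small example. It assumes that mismatches are counted as the Hamming distance between aligned, equal-length consensuses, which is a simplification and may differ from what BarcodedIgReC does internally. The function names are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def same_antibody(consensus_a, consensus_b, tau=2):
    """Treat two consensuses as one antibody if they differ in <= tau positions."""
    return hamming(consensus_a, consensus_b) <= tau


# Two consensuses of the same molecule ACGTACGTAC, each with one error
# at a different position, differ from each other in two positions.
first = "ACGAACGTAC"   # error at position 3
second = "ACGTACGTTC"  # error at position 8
distance = hamming(first, second)
print(distance, same_antibody(first, second, tau=2), same_antibody(first, second, tau=1))
```

With tau set to 2 the two consensuses are merged, while tau set to 1 would keep them apart as two spurious antibodies.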

--clustering-thr
Maximal allowed distance between reads sharing a barcode for them to be put into one cluster. The default value is 20. Our analysis shows that this value keeps unrelated antibodies out of the same cluster while still correcting all amplification errors. Increase this value for overamplified datasets to ensure better error correction. Decrease it for datasets with high clonality to better distinguish close antibodies.
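
The effect of the threshold can be seen in a toy greedy clustering of reads that share one barcode. This is only an illustration under stated assumptions: distance is taken to be the number of mismatches between equal-length reads, each read is compared with the first read of every cluster, and the clustering procedure actually used by BarcodedIgReC is not described in this manual. The function names are hypothetical, and the toy reads are far shorter than real reads, so a small threshold is used alongside the default.

```python
def mismatches(a, b):
    """Count mismatching positions (assumes equal-length reads)."""
    return sum(x != y for x, y in zip(a, b))


def split_barcode_group(reads, thr=20):
    """Greedily split reads sharing a barcode into clusters.

    A read joins the first cluster whose representative (its first read)
    is within thr mismatches; otherwise it starts a new cluster.
    """
    clusters = []
    for read in reads:
        for cluster in clusters:
            if mismatches(cluster[0], read) <= thr:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters


same_barcode = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC"]
strict = split_barcode_group(same_barcode, thr=2)
loose = split_barcode_group(same_barcode, thr=20)
print(len(strict), len(loose))
```

With the small threshold the unrelated third read forms its own cluster. With a threshold larger than the toy read length, all three reads end up together, which mirrors the trade-off described above.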

3.3. Examples

To construct an antibody repertoire from the single-read file reads.fastq, type:

    
    ./barcoded_igrec.py -s reads.fastq -o output_dir -l all

3.4. Output files

BarcodedIgReC creates a working directory (whose name is specified with the -o option) and writes the following files there:

Final repertoire files:

final_repertoire/final_repertoire.fa: CLUSTERS.FASTA file with all antibody clusters of the constructed repertoire, with cluster sizes given as numbers of reads (details in Antibody repertoire representation).
final_repertoire/final_repertoire_umi.fa.gz: CLUSTERS.FASTA file with all antibody clusters of the constructed repertoire, with cluster sizes given as numbers of RNA molecules.
final_repertoire/final_repertoire.rcm: RCM file for the constructed repertoire (details in Antibody repertoire representation).

VJ Finder output:

vj_finder/cleaned_reads.fa: FASTA file with the cleaned reads produced at the VJ Finder stage. Cleaned reads are in forward orientation (from V to J), contain V and J gene segments, and are cropped at the left bound of the V gene segment.
vj_finder/filtered_reads.fa: FASTA file with the filtered-out reads. These reads align poorly to Ig germline gene segments and are likely to be contaminants.
vj_finder/alignment_info.csv: CSV file with information about the alignment of cleaned reads to V and J gene segments. The format is described in Alignment Info file format.

igrec.log: full log of the BarcodedIgReC run.
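
The repertoire files can be inspected with a few lines of Python. The sketch below rests on two assumptions about formats that this manual does not spell out (they are covered in Antibody repertoire representation): that CLUSTERS.FASTA headers look like `>cluster___<id>___size___<n>`, and that each RCM line holds a read name and a cluster id separated by a tab. The function names are hypothetical, and the patterns should be adjusted if the actual files differ.

```python
import re
from collections import Counter

# Assumed header layout: >cluster___<id>___size___<n>
HEADER_RE = re.compile(r"^>cluster___([^_\s]+)___size___(\d+)$")


def cluster_sizes_from_fasta(lines):
    """Return {cluster_id: size} parsed from CLUSTERS.FASTA header lines."""
    sizes = {}
    for line in lines:
        match = HEADER_RE.match(line.strip())
        if match:
            sizes[match.group(1)] = int(match.group(2))
    return sizes


def cluster_sizes_from_rcm(lines):
    """Return {cluster_id: number of reads} from assumed 'read<TAB>cluster' lines."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 2 and fields[1]:
            counts[fields[1]] += 1
    return dict(counts)


# Toy contents standing in for final_repertoire.fa and final_repertoire.rcm.
fasta_lines = [">cluster___0___size___4", "ACGTACGT", ">cluster___1___size___1", "TTGTACGT"]
rcm_lines = ["read1\t0", "read2\t0", "read3\t0", "read4\t0", "read5\t1"]

fasta_sizes = cluster_sizes_from_fasta(fasta_lines)
rcm_sizes = cluster_sizes_from_rcm(rcm_lines)
print(fasta_sizes == rcm_sizes)
```

Both functions accept any iterable of lines, so an open file handle works as well. The gzipped molecule-level file can be opened with `gzip.open(path, "rt")` from the standard library.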

4. Citation

If you use BarcodedIgReC in your research, please cite:

Alexander Shlemov, Sergey Bankevich, Andrey Bzikadze, Dmitriy M. Chudakov, Yana Safonova, and Pavel A. Pevzner. Reconstructing antibody repertoires from error-prone immunosequencing datasets (submitted)

5. Feedback and bug reports

Your comments, bug reports, and suggestions are very welcome. They will help us to further improve BarcodedIgReC.

If you have any trouble running BarcodedIgReC, please send us the log file from the output directory.

Address for communications: igtools_support@googlegroups.com.