MiXCR parameters (non-barcoded data)

name | parameters | comments
Align | -f, -g, --noMerge, -p = kaligner2, --species = hsa, -OreadsLayout = Collinear, -OvParameters.geneFeatureToAlign = VTranscript, -OallowPartialAlignments = true |
Assemble | -f, -OassemblingFeatures = FR1Begin:FR4Begin | Because sequences are cropped at the end of CDR3, the FR4 region is absent from the final sequences. We chose this value because running MiXCR with the seemingly more appropriate value FR1Begin:FR4End behaves unstably and often produces an empty repertoire.
Export clones | -f, --no-spaces, -sequence, -count, -readIds |
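
The MiXCR settings above can be read as three consecutive commands. The following is a minimal sketch that only builds the command lines and does not run them. It assumes that the three stages map to the `align`, `assemble` and `exportClones` subcommands, that input and output files are passed as trailing positional arguments, and that options written as `name = value` in the table are passed as `name value` or `-Oname=value`. The function name and all file names are hypothetical, and the exact syntax may differ between MiXCR versions.

```python
import shlex

# Options copied from the table; only the "name = value" spelling is adapted.
ALIGN_OPTS = [
    "-f", "-g", "--noMerge",
    "-p", "kaligner2",
    "--species", "hsa",
    "-OreadsLayout=Collinear",
    "-OvParameters.geneFeatureToAlign=VTranscript",
    "-OallowPartialAlignments=true",
]
ASSEMBLE_OPTS = ["-f", "-OassemblingFeatures=FR1Begin:FR4Begin"]
EXPORT_OPTS = ["-f", "--no-spaces", "-sequence", "-count", "-readIds"]


def mixcr_commands(reads, prefix):
    """Return argv lists for align, assemble and exportClones, in run order.

    `reads` and `prefix` are hypothetical placeholders; the intermediate
    file extensions are an assumption, not taken from the table.
    """
    vdjca = prefix + ".vdjca"
    clns = prefix + ".clns"
    table = prefix + ".clones.txt"
    return [
        ["mixcr", "align", *ALIGN_OPTS, reads, vdjca],
        ["mixcr", "assemble", *ASSEMBLE_OPTS, vdjca, clns],
        ["mixcr", "exportClones", *EXPORT_OPTS, clns, table],
    ]


if __name__ == "__main__":
    # Print the commands for inspection instead of executing them.
    for argv in mixcr_commands("input.fastq", "sample"):
        print(shlex.join(argv))
```

Printing the commands rather than executing them keeps the sketch independent of any particular MiXCR installation.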

pRESTO parameters (non-barcoded data)

name | parameters | comments
CollapseSeq | Default parameters | Although this stage can use information about primers, we do not use it since we want to conduct primer-independent benchmarking. This stage can also fix unspecified nucleotides (“N”s), but we do not use this feature either, since such nucleotides are addressed at the preliminary alignment step.
SplitSeq | Default parameters | The stage uses a threshold parameter (--num=X) that is analogous to the one in IgReC (discussed in Section 2.2 of the main text). In our experiments, this parameter is not fixed, and estimating its optimal value is a part of the benchmarking.

pRESTO parameters (barcoded data)

name | parameters
ClusterSets | Default parameters
BuildConsensus | --prcons 0.6 --maxerror 0.1 --maxgap 0.5
CollapseSeq | --uf PRCONS --cf CONSCOUNT --act sum
Table A1. Benchmarking parameters of MiXCR (top) and pRESTO (middle) on non-barcoded datasets, and of pRESTO (bottom) on barcoded datasets. For all tools, we unified read merging, alignment, and filtering by using the IgReC preprocessing. After this preprocessing, all input libraries contain Ig-relevant reads that are cropped at the start of the corresponding V gene.
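
The pRESTO settings for barcoded data can likewise be written out as one command per stage. The sketch below only assembles the command lines. It assumes the stages are invoked through the `ClusterSets.py`, `BuildConsensus.py` and `CollapseSeq.py` scripts with the input passed via `-s`. The function name and file names are hypothetical, and some pRESTO releases may additionally require a subcommand for ClusterSets.

```python
# Stage name and options, copied from the barcoded-data table.
BARCODED_STEPS = [
    ("ClusterSets.py", []),  # default parameters
    ("BuildConsensus.py",
     ["--prcons", "0.6", "--maxerror", "0.1", "--maxgap", "0.5"]),
    ("CollapseSeq.py",
     ["--uf", "PRCONS", "--cf", "CONSCOUNT", "--act", "sum"]),
]


def presto_barcoded_commands(inputs):
    """Return one argv list per stage.

    `inputs` holds one (hypothetical) input file name per stage, in order;
    how each stage names its output is left to pRESTO's defaults.
    """
    if len(inputs) != len(BARCODED_STEPS):
        raise ValueError("expected one input file per stage")
    return [
        [script, "-s", path, *opts]
        for (script, opts), path in zip(BARCODED_STEPS, inputs)
    ]


if __name__ == "__main__":
    files = ["reads.fastq", "clustered.fastq", "consensus.fastq"]
    for argv in presto_barcoded_commands(files):
        print(" ".join(argv))
```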