Supported data types

Sequencing data

IsoQuant support all kinds of long RNA data:

PacBio CCS
ONT dRNA / ONT cDNA
Assembled / corrected transcript sequences

Reads must be provided in FASTQ/FASTA format (can be gzipped) or unmapped BAM format.
If you have already aligned your reads to the reference genome, simply provide sorted and indexed BAM files.

IsoQuant expect reads to contain polyA tails. For more reliable transcript model construction do not trim polyA tails.

IsoQuant can also take aligned Illumina reads to correct long-read spliced alignments. However, short reads are not used to discover transcript models or compute abundances.

Supported reference data

Reference genome should be provided in multi-FASTA format (can be gzipped). Reference genome is mandatory even when BAM files are provided.

Reference gene annotation is not mandatory, but is likely to increase precision and recall. It can be provided in GFF/GTF format (can be gzipped). In this case it will be converted to gffutils database. Information on converted databases will be stored in your ~/.config/IsoQuant/db_config.json to increase speed of future runs. You can also provide gffutils database manually. Make sure that chromosome/scaffold names are identical in FASTA file and gene annotation. Note, that gffutils databases may not work correctly on NFS shares. It is possible to set a designated folder for the database with --genedb_output (different from the output directory).

Pre-constructed aligner index can also be provided to reduce mapping time.