Supported data types
Sequencing data
IsoQuant support all kinds of long RNA data:
- PacBio CCS
- ONT dRNA / ONT cDNA
- Assembled / corrected transcript sequences
Reads must be provided in FASTQ or FASTA format (can be gzipped). If you have already aligned your reads to the reference genome, simply provide sorted and indexed BAM files.
IsoQuant expect reads to contain polyA tails. For more reliable transcript model construction do not trim polyA tails.
IsoQuant can also take aligned Illumina reads to correct long-read spliced alignments. However, short reads are not used to discover transcript models or compute abundances.
Supported reference data
Reference genome should be provided in multi-FASTA format (can be gzipped). Reference genome is mandatory even when BAM files are provided.
Reference gene annotation is not mandatory, but is likely to increase precision and recall.
It can be provided in GFF/GTF format (can be gzipped).
In this case it will be converted to gffutils database. Information on converted databases will be stored in your ~/.config/IsoQuant/db_config.json
to increase speed of future runs. You can also provide gffutils database manually. Make sure that chromosome/scaffold names are identical in FASTA file and gene annotation.
Note, that gffutils databases may not work correctly on NFS shares. It is possible to set a designated folder for
the database with --genedb_output
(different from the output directory).
Pre-constructed aligner index can also be provided to increase mapping time.