Summarized Nucleotide Frequency data.table
tx_makeDT_nucFreq.Rd
This function constructs a list of data.tables that contain nucleotide frequency metrics per nucleotide by transcript:
A = Adenine
C = Cytosine
G = Guanine
T = Thymine
N = Undetermined nucleotide
- = Deletion
. = Insert: the gap between read1 and read2 that was not read
The function requires the input of a GRangesList object output by the tx_reads function, which should contain sequence alignments in the transcriptomic space, and a gene annotation in GRanges format, as loaded by the tx_load_bed function.
Usage
tx_makeDT_nucFreq(
x,
geneAnnot,
genome = NULL,
simplify_IUPAC = "splitForceInt",
fullDT = FALSE,
nCores = 1
)
Arguments
- x
CompressedGRangesList. Genomic Ranges list containing genomic alignments data by gene. Constructed via the tx_reads function.
- geneAnnot
GRanges. Gene annotation as loaded by tx_load_bed()
- genome
list. The full reference genome sequences, as loaded by tx_load_genome() or prepackaged by BSgenome, see ?BSgenome::available.genomes
- simplify_IUPAC
string. Available options are:
"not": Outputs the complete nucleotide frequency table, including ambiguous reads encoded with the IUPAC ambiguity code. See: IUPAC_CODE_MAP
"splitForceInt" (Default): Forces an integer split. The frequency of each ambiguous code is halved and assigned to its respective nucleotides; if the frequency is odd, the leftover count is assigned to "N".
"splitHalf": Ambiguous nucleotide frequencies are split in half between their corresponding nucleotides; odd frequencies therefore produce non-integer values.
- fullDT
logical. Set to TRUE to output a data.table with all genes, in the same order as the 'geneAnnot' object.
- nCores
integer. Number of cores to run the function with. Multicore capability is not available on Windows OS.
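The two splitting modes of simplify_IUPAC described above can be illustrated with a small base R sketch. This is not the package's implementation: the helper split_ambiguous() and the lookup table iupac_two_base are hypothetical names introduced here, the sketch assumes only two-base ambiguity codes occur, and it ignores the "-" and "." columns.

```r
# IUPAC two-base ambiguity codes and the nucleotides they stand for.
iupac_two_base <- list(M = c("A", "C"), R = c("A", "G"), W = c("A", "T"),
                       S = c("C", "G"), Y = c("C", "T"), K = c("G", "T"))

# Hypothetical helper (not part of the package): redistribute the ambiguous
# counts observed at a single position. 'counts' is a named numeric vector.
split_ambiguous <- function(counts, mode = c("splitForceInt", "splitHalf")) {
  mode <- match.arg(mode)
  out <- c(A = 0, C = 0, G = 0, T = 0, N = 0)
  # Copy over the unambiguous counts unchanged.
  plain <- intersect(names(counts), names(out))
  out[plain] <- counts[plain]
  for (code in intersect(names(counts), names(iupac_two_base))) {
    n <- counts[[code]]
    bases <- iupac_two_base[[code]]
    if (mode == "splitForceInt") {
      out[bases] <- out[bases] + n %/% 2  # whole half to each nucleotide
      out["N"] <- out["N"] + n %% 2       # odd leftover count goes to N
    } else {
      out[bases] <- out[bases] + n / 2    # may yield non-integer values
    }
  }
  out
}

# Position with 10 A, 4 G and 5 reads coded as R (A or G).
pos_counts <- c(A = 10, G = 4, R = 5)
split_ambiguous(pos_counts, "splitForceInt")  # A = 12, G = 6, N = 1
split_ambiguous(pos_counts, "splitHalf")      # A = 12.5, G = 6.5, N = 0
```

Both modes preserve the total count at the position; they differ only in whether the odd remainder is kept as an integer under "N" or shared as a fraction.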
Details
This function allows for usage of multiple cores to reduce processing times in UNIX-like OS.
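A portable script may want to pick a value for nCores that is safe on every platform. The sketch below assumes the multicore mode relies on fork-based parallelism of the kind offered by parallel::mclapply, which is not available on Windows; that is an assumption about the mechanism, and pick_nCores() is a hypothetical helper, not a package function.

```r
library(parallel)

# Hypothetical helper: choose a core count that is safe on this platform.
pick_nCores <- function(requested = 2L) {
  # Fork-based parallelism is unavailable on Windows, so fall back to 1.
  if (.Platform$OS.type == "windows") return(1L)
  available <- detectCores()
  if (is.na(available)) return(1L)  # detectCores() can return NA
  as.integer(max(1L, min(requested, available)))
}

nCores <- pick_nCores(2L)

# Toy workload standing in for per-gene processing.
res <- mclapply(1:4, function(i) i^2, mc.cores = nCores)
```

The resulting nCores value could then be passed to the nCores argument shown in Usage.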
See also
Other makeDT functions: tx_makeDT_covNucFreq(), tx_makeDT_coverage()