coverage
The coverage module contains several functions to get the coverage from input data, either a craw.wig.Genome object or a pysam.AlignmentFile object.
There are two kinds of functions:
the functions that get the coverage from the input;
the functions that process the coverage.
Functions to get coverage
These low-level functions are not meant to be called directly. They are called inside the functions that process the coverage.
get_raw_bam_coverage
Get the coverage from pysam for a reference (chromosome) over an interval of positions, on both strands, applying a quality threshold. pysam returns a score for each base (A, C, G, T) at each position; these scores are converted into a global coverage for that position.
This function is called for each entry of the annotation file.
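As a rough illustration of the conversion step, the sketch below assumes the per-base scores come from pysam's AlignmentFile.count_coverage, which returns four arrays of counts (A, C, G, T), and that the strands are separated with a read_callback filtering on read.is_reverse. The helper names merge_base_counts and strand_coverage_sketch are hypothetical; this is not craw's actual implementation, and it takes a contig name instead of an annotation entry.

```python
from typing import List, Sequence, Tuple


def merge_base_counts(acgt: Sequence[Sequence[int]]) -> List[int]:
    """Collapse four per-base count vectors (A, C, G, T) into a single
    global coverage value per position."""
    a, c, g, t = acgt
    return [sum(counts) for counts in zip(a, c, g, t)]


def strand_coverage_sketch(sam_file, contig: str, start: int, stop: int,
                           qual_thr: int = 15) -> Tuple[List[int], List[int]]:
    """Hypothetical helper, not craw's get_raw_bam_coverage.

    sam_file is expected to behave like a pysam.AlignmentFile.
    Coordinates are 0-based and half-open: [start, stop[.
    """
    # Count only reads mapped on the forward strand.
    forward = sam_file.count_coverage(contig, start, stop,
                                      quality_threshold=qual_thr,
                                      read_callback=lambda read: not read.is_reverse)
    # Count only reads mapped on the reverse strand.
    reverse = sam_file.count_coverage(contig, start, stop,
                                      quality_threshold=qual_thr,
                                      read_callback=lambda read: read.is_reverse)
    return merge_base_counts(forward), merge_base_counts(reverse)
```

For example, merge_base_counts(([1, 0], [2, 1], [0, 0], [3, 4])) sums the four bases at each position and gives [6, 5].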
get_raw_wig_coverage
Get the coverage from a craw.wig.Genome instance for a reference (chromosome) over an interval of positions, on both strands.
The quality parameter is only there to keep the same signature as get_raw_bam_coverage; it is ignored.
This function is called for each entry of the annotation file.
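The sketch below illustrates the idea of a signature-compatible wig lookup. It assumes a hypothetical genome represented as a plain dict mapping a chromosome name to its forward and reverse per-position values; the real craw.wig.Genome has its own API, and the real function takes an annotation entry rather than a chromosome name.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical stand-in for craw.wig.Genome: chromosome name ->
# (forward, reverse) per-position coverage vectors.
WigLikeGenome = Dict[str, Tuple[Sequence[float], Sequence[float]]]


def raw_wig_coverage_sketch(genome: WigLikeGenome, chrom: str, start: int, stop: int,
                            qual_thr: Optional[int] = None) -> Tuple[List[float], List[float]]:
    """Slice both strands on [start, stop[ (0-based, stop excluded).

    qual_thr is accepted only so the signature matches the BAM variant;
    a wig file carries no base qualities, so it is ignored.
    """
    forward, reverse = genome[chrom]
    return list(forward[start:stop]), list(reverse[start:stop])
```

Because qual_thr is accepted but unused, a caller can pass the same arguments to the wig and BAM variants without knowing which one it holds.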
get_raw_coverage_function
Choose the right get_raw_*_coverage function according to the type of the input data (craw.wig.Genome or pysam.AlignmentFile).
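Type-based dispatch of this kind can be sketched with isinstance checks. FakeGenome and FakeAlignmentFile below are hypothetical stand-ins for craw.wig.Genome and pysam.AlignmentFile, and the two stubs stand in for the raw functions; only the RuntimeError on unsupported input mirrors the documented behaviour.

```python
from typing import Callable


class FakeGenome:
    """Hypothetical stand-in for craw.wig.Genome."""


class FakeAlignmentFile:
    """Hypothetical stand-in for pysam.AlignmentFile."""


def raw_wig_stub(genome, annot_entry, start, stop, qual_thr=None):
    """Hypothetical placeholder for get_raw_wig_coverage."""
    return [], []


def raw_bam_stub(sam_file, annot_entry, start, stop, qual_thr=15):
    """Hypothetical placeholder for get_raw_bam_coverage."""
    return [], []


def choose_raw_function(input_data) -> Callable:
    """Return the raw coverage function matching the input type."""
    if isinstance(input_data, FakeAlignmentFile):
        return raw_bam_stub
    if isinstance(input_data, FakeGenome):
        return raw_wig_stub
    # Same failure mode as documented for get_raw_coverage_function.
    raise RuntimeError(
        f"input must be a genome or an alignment file, got {type(input_data).__name__}")
```

Since both stubs share one signature, whatever choose_raw_function returns can be called the same way by the post-processing code.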
- craw.coverage.get_raw_wig_coverage(genome, annot_entry, start, stop, qual_thr=None)
- Parameters
genome (craw.wig.Genome object) – the genome which stores all the coverages.
annot_entry (annotation.Entry object) – an entry of the annotation file.
start (int) – the position at which to start computing the coverage (coordinates are 0-based, the start position is included).
stop (int) – the position at which to stop computing the coverage (coordinates are 0-based, the stop position is excluded).
qual_thr (None) – this parameter is not used; it is only here to have the same API as get_raw_bam_coverage.
- Returns
the coverage (all bases)
- Return type
tuple of 2 lists containing int or float
- craw.coverage.get_raw_bam_coverage(sam_file, annot_entry, start, stop, qual_thr=15)
Compute the coverage of a region, position by position, on each strand.
- Parameters
sam_file (pysam.AlignmentFile object) – the samfile opened with pysam.
annot_entry (annotation.Entry object) – an entry of the annotation file.
start (positive int) – the position at which to start computing the coverage (coordinates are 0-based, the start position is included).
stop (positive int) – the position at which to stop computing the coverage (coordinates are 0-based, the stop position is excluded).
qual_thr (int) – the quality threshold.
- Returns
the coverage (all bases)
- Return type
tuple of 2 lists containing int
- craw.coverage.get_raw_coverage_function(input)
- Parameters
input (wig.Genome or pysam.calignmentfile.AlignmentFile object) – the input, either a samfile (see the pysam library) or a genome built from a wig file (see the wig module).
- Returns
get_raw_wig_coverage or get_raw_bam_coverage, according to the type of input
- Return type
function
- Raises
RuntimeError – when input is not an instance of pysam.calignmentfile.AlignmentFile or wig.Genome
Functions to process coverages
These functions select the right get_raw_*_coverage function according to the input data and pass it to a post-processing function.
All the returned functions have the same API.
3 input parameters:
annot_entry: an entry of the annotation file.
start: the position at which to start computing the coverage (coordinates are 0-based, the start position is included).
stop: the position at which to stop computing the coverage (coordinates are 0-based, the stop position is excluded).
and
return: a tuple of two lists or tuples containing, in this order, the coverage on the forward strand and the coverage on the reverse strand.
This architecture makes it easy to combine the different get_raw_*_coverage functions with the different post-processing functions. For instance:
import pysam
from craw.coverage import padded_coverage_maker, resized_coverage_maker

bam = pysam.AlignmentFile(bam_file, "rb")
get_coverage = padded_coverage_maker(bam, max_left, max_right, qual_thr=15)
forward, reverse = get_coverage(annot_entry, 10, 200)
or
wig = wig_parser.parse()
get_coverage = resized_coverage_maker(wig, 200)
forward, reverse = get_coverage(annot_entry, 10, 200)
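The maker pattern can be sketched as a closure factory that binds a raw coverage function and a per-strand post-processing step into one function with the shared (annot_entry, start, stop) API. coverage_maker_sketch and toy_raw are hypothetical names; a real maker would also bind input_data and qual_thr rather than receive an already-bound raw function.

```python
from typing import Callable, List, Sequence, Tuple

Coverage = Tuple[List[float], List[float]]


def coverage_maker_sketch(raw: Callable[[object, int, int], Coverage],
                          post: Callable[[Sequence[float]], List[float]]):
    """Hypothetical factory: bind a raw coverage function and a per-strand
    post-processing step into one function with the shared API."""
    def get_coverage(annot_entry, start: int, stop: int) -> Coverage:
        forward, reverse = raw(annot_entry, start, stop)
        return post(forward), post(reverse)
    return get_coverage


def toy_raw(annot_entry, start: int, stop: int) -> Coverage:
    """Hypothetical raw function: forward = position, reverse = 2 * position."""
    positions = range(start, stop)
    return [float(p) for p in positions], [2.0 * p for p in positions]


# Two different post-processing steps reuse the same raw function.
get_identity = coverage_maker_sketch(toy_raw, list)
get_total = coverage_maker_sketch(toy_raw, lambda values: [sum(values)])
```

get_identity(None, 0, 3) returns ([0.0, 1.0, 2.0], [0.0, 2.0, 4.0]), while get_total(None, 0, 3) returns ([3.0], [6.0]): the caller's code stays the same whichever pairing is chosen.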
- craw.coverage.padded_coverage_maker(input_data, max_left, max_right, qual_thr=None)
- Parameters
input_data (wig.Genome or pysam.AlignmentFile object) – the input, either a samfile (see the pysam library) or a genome built from a wig file (see the wig module).
max_left (int) – the highest number of bases before the reference position to take into account.
max_right (int) – the highest number of bases after the reference position to take into account.
qual_thr (int) – the quality threshold; if the input data come from a wig file, this parameter is not used.
- Returns
a function, get_padded_coverage(), which computes the coverage for a gene on each strand between positions [start, stop[. The coverage values are centered on the annot_entry.ref position and the matrix is padded with None values:
[.......[ coverage ref.pos ] .....]
[....[ coverage ref.pos ] .....]
[............[ cov ref.pos ]]
This function takes 3 parameters:
annot_entry: an entry of the annotation file.
start: the position at which to start computing the coverage (coordinates are 0-based, the start position is included).
stop: the position at which to stop computing the coverage (coordinates are 0-based, the stop position is excluded).
and
return: a tuple with 2 tuples of float or int representing the coverage on the forward and reverse strands.
- Return type
function
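The padding step can be sketched for a single strand as follows. The helper pad_around_ref is hypothetical, and so is its convention: it assumes the window holds max_left positions before the reference position and max_right positions from the reference position onwards (reference included), so every padded vector has length max_left + max_right and the reference always lands at index max_left. craw's exact off-by-one convention may differ.

```python
from typing import List, Optional, Sequence


def pad_around_ref(values: Sequence[float], start: int, stop: int, ref: int,
                   max_left: int, max_right: int) -> List[Optional[float]]:
    """Place values covering [start, stop[ in a fixed-size window centered on ref.

    Assumption: max_left positions before ref, max_right positions from ref
    onwards, so ref always ends up at index max_left of the output.
    """
    if len(values) != stop - start:
        raise ValueError("values must cover exactly [start, stop[")
    left_pad = max_left - (ref - start)   # None values needed before the data
    right_pad = max_right - (stop - ref)  # None values needed after the data
    if left_pad < 0 or right_pad < 0:
        raise ValueError("interval exceeds max_left/max_right around ref")
    return [None] * left_pad + list(values) + [None] * right_pad
```

With max_left=3 and max_right=4, pad_around_ref([1, 2, 3], 10, 13, 11, 3, 4) gives [None, None, 1, 2, 3, None, None] and pad_around_ref([5, 6], 20, 22, 20, 3, 4) gives [None, None, None, 5, 6, None, None]; in both rows the reference value sits at index 3, which is what lets rows for different genes be stacked into one aligned matrix.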
- craw.coverage.resized_coverage_maker(input_data, new_size, qual_thr=None)
- Parameters
input_data (craw.wig.Genome or pysam.AlignmentFile object) – the input, either a samfile (see the pysam library) or a genome built from a wig file (see the wig module).
new_size (positive int) – the number of values in the coverage vector.
qual_thr (int) – the quality threshold; if the input data come from a wig file, this parameter is not used.
- Returns
a function, get_resized_coverage(), which computes the coverage for a gene on each strand between positions [start, stop[. The coverage values are generated by linear interpolation of the raw values between [start, stop[ using scipy, see https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.interpolate.interp1d.html
This function takes 3 parameters:
annot_entry: an entry of the annotation file.
start: the position at which to start computing the coverage (coordinates are 0-based, the start position is included).
stop: the position at which to stop computing the coverage (coordinates are 0-based, the stop position is excluded).
and
return: a tuple with 2 tuples of float or int representing the coverage on the forward and reverse strands.
- Return type
function
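The documentation states that the resizing relies on scipy.interpolate.interp1d. To stay dependency-free, the sketch below swaps in a hand-written linear interpolation; with scipy available, interp1d(range(n), values) evaluated on numpy.linspace(0, n - 1, new_size) would play the same role. The name linear_resize and the evenly spaced grid (first and last raw values kept as endpoints) are assumptions, not craw's confirmed behaviour.

```python
from typing import List, Sequence


def linear_resize(values: Sequence[float], new_size: int) -> List[float]:
    """Resample values to new_size points by linear interpolation.

    Assumption: the new points are spread evenly over the raw positions,
    with the first and last raw values kept as the endpoints.
    """
    if new_size < 1:
        raise ValueError("new_size must be a positive int")
    n = len(values)
    if n == 0:
        raise ValueError("cannot resize an empty coverage vector")
    if n == 1:
        return [float(values[0])] * new_size
    step = (n - 1) / (new_size - 1) if new_size > 1 else 0.0
    resized = []
    for i in range(new_size):
        x = i * step                    # position on the raw-value axis
        left = min(int(x), n - 2)       # index of the left neighbour
        frac = x - left                 # distance from the left neighbour
        resized.append(values[left] + (values[left + 1] - values[left]) * frac)
    return resized
```

For instance, linear_resize([0, 10, 20], 5) stretches 3 raw values to [0.0, 5.0, 10.0, 15.0, 20.0]; applying it to each strand gives genes of different lengths a common vector size.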
- craw.coverage.sum_coverage_maker(input_data, qual_thr=None)
This function returns a new function, get_sum_coverage().
- Parameters
input_data (craw.wig.Genome or pysam.AlignmentFile object) – the input, either a samfile (see the pysam library) or a genome built from a wig file (see the wig module).
qual_thr (int) – the quality threshold; if the input data come from a wig file, this parameter is not used.
- Returns
get_sum_coverage(), a function which computes the sum of the coverage for a gene on each strand between positions [start, stop[. This function takes 3 parameters:
annot_entry: an entry of the annotation file.
start: the position at which to start computing the coverage (coordinates are 0-based, the start position is included).
stop: the position at which to stop computing the coverage (coordinates are 0-based, the stop position is excluded).
and
return: a tuple with 2 tuples of float or int representing the coverage on the forward and reverse strands.
- Return type
function
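A summing maker can be sketched as a closure around a raw coverage function. sum_maker_sketch is a hypothetical name, and reducing each strand to one plain number is an assumption: the return description above speaks of a tuple of two tuples, so the real get_sum_coverage() may wrap each total differently.

```python
from typing import Callable, Sequence, Tuple

RawCoverage = Tuple[Sequence[float], Sequence[float]]


def sum_maker_sketch(raw: Callable[[object, int, int], RawCoverage]):
    """Hypothetical analogue of sum_coverage_maker: wraps a raw coverage
    function and returns get_sum_coverage(annot_entry, start, stop)."""
    def get_sum_coverage(annot_entry, start: int, stop: int) -> Tuple[float, float]:
        forward, reverse = raw(annot_entry, start, stop)
        # One total per strand over [start, stop[.
        return sum(forward), sum(reverse)
    return get_sum_coverage
```

With a raw function returning ([1, 2, 3], [4, 5]), the resulting get_sum_coverage(None, 0, 3) returns (6, 9).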