coverage

The coverage module contains several functions for computing coverage from the input data, either a craw.wig.Genome object or a pysam.AlignmentFile.

There are two kinds of functions:

  • the functions to get coverage from input.

  • the functions to process the coverage.

Functions to get coverage

These low-level functions are not meant to be called directly. They are called inside the functions that process the coverages.

get_raw_bam_coverage

Get the coverage from pysam for a reference (chromosome), an interval of positions and a quality threshold, on both strands, and convert the coverage returned by pysam (a score at each position for each base: A, C, G, T) into a global coverage for each position.

This function is called for each entry of the annotation file.
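The conversion from per-base scores to a global coverage can be sketched as follows. This is a minimal illustration, not the module's code: it assumes the per-base counts arrive as four equal-length sequences in A, C, G, T order (the shape returned by pysam's count_coverage), and the name global_coverage and the sample counts are hypothetical.

```python
def global_coverage(per_base_counts):
    """Sum per-base counts (A, C, G, T) into one coverage value per position."""
    # zip(*...) walks the four per-base sequences in parallel, position by position.
    return [sum(counts) for counts in zip(*per_base_counts)]


# Hypothetical per-base counts for one strand over a 3-position interval.
per_base = (
    [1, 0, 2],  # A
    [0, 3, 0],  # C
    [2, 0, 0],  # G
    [0, 1, 1],  # T
)
coverage = global_coverage(per_base)
print(coverage)
```

The same summation would be applied once per strand to obtain the forward and reverse coverages.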

get_raw_wig_coverage

Get the coverage from a craw.wig.Genome instance for a reference (chromosome) and an interval of positions, on both strands. The quality parameter is only here to keep the same signature as get_raw_bam_coverage; it is ignored.

This function is called for each entry of the annotation file.

get_raw_coverage_function

Selects the right get_raw_(*)_coverage function according to the type of the input data (craw.wig.Genome or pysam.AlignmentFile).

craw.coverage.get_raw_wig_coverage(genome, annot_entry, start, stop, qual_thr=None)
Parameters
  • genome (craw.wig.Genome object) – The genome which stores all the coverages.

  • annot_entry (annotation.Entry object) – an entry of the annotation file

  • start (int) – The position at which to start computing the coverage (coordinates are 0-based, start position is included).

  • stop (int) – The position at which to stop computing the coverage (coordinates are 0-based, stop position is excluded).

  • qual_thr (None) – this parameter is not used. It is only here to keep the same API as get_raw_bam_coverage.

Returns

the coverage (all bases)

Return type

tuple of 2 lists containing int or float

craw.coverage.get_raw_bam_coverage(sam_file, annot_entry, start, stop, qual_thr=15)

Compute the coverage for a region, position by position, on each strand.

Parameters
  • sam_file (pysam.AlignmentFile object) – the samfile opened with pysam

  • annot_entry (annotation.Entry object) – an entry of the annotation file

  • start (positive int) – The position at which to start computing the coverage (coordinates are 0-based, start position is included).

  • stop (positive int) – The position at which to stop computing the coverage (coordinates are 0-based, stop position is excluded).

  • qual_thr (int) – The quality threshold

Returns

the coverage (all bases)

Return type

tuple of 2 lists containing int

craw.coverage.get_raw_coverage_function(input)
Parameters

input (wig.Genome or pysam.calignmentfile.AlignmentFile object) – the input, either a samfile (see the pysam library) or a genome built from a wig file (see the wig module)

Returns

get_raw_wig_coverage or get_raw_bam_coverage, according to the type of input

Return type

function

Raises

RuntimeError – when input is not an instance of pysam.calignmentfile.AlignmentFile or wig.Genome
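The type-based selection described above can be sketched as follows. This is a self-contained illustration under stated assumptions, not the module's code: FakeGenome and FakeAlignmentFile are hypothetical stand-ins for craw.wig.Genome and pysam.AlignmentFile, and the two raw getters are placeholders that only report which branch was chosen.

```python
class FakeGenome:
    """Hypothetical stand-in for craw.wig.Genome."""


class FakeAlignmentFile:
    """Hypothetical stand-in for pysam.AlignmentFile."""


def raw_wig_coverage(genome, annot_entry, start, stop, qual_thr=None):
    # Placeholder getter: a real one would read the coverage from the genome.
    return "wig"


def raw_bam_coverage(sam_file, annot_entry, start, stop, qual_thr=15):
    # Placeholder getter: a real one would query the alignment file.
    return "bam"


def choose_raw_coverage_function(input_data):
    """Return the raw getter matching the type of input_data."""
    if isinstance(input_data, FakeGenome):
        return raw_wig_coverage
    if isinstance(input_data, FakeAlignmentFile):
        return raw_bam_coverage
    raise RuntimeError(
        "unsupported input type: {}".format(type(input_data).__name__)
    )


getter = choose_raw_coverage_function(FakeGenome())
print(getter.__name__)
```

Because both getters share one signature, the caller can use the returned function without knowing which input type it was given.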

Functions to process coverages

These functions select the right get_raw_(*)_coverage function according to the input data and pass it to a post-processing function.

All the returned functions have the same API.

3 parameters as input:

  • annot_entry: an entry of the annotation file.

  • start: The position at which to start computing the coverage (coordinates are 0-based, start position is included).

  • stop: The position at which to stop computing the coverage (coordinates are 0-based, stop position is excluded).

and

  • return: a tuple of two lists or tuples containing, in this order, the coverages on the forward strand and then the coverages on the reverse strand.
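The shared API can be sketched as a function that returns a closure. This is a minimal illustration, not the module's code: fake_raw_coverage and identity_coverage_maker are hypothetical names, and the input data is assumed to be a plain pair of lists (forward, reverse) rather than a real genome or alignment file.

```python
def fake_raw_coverage(input_data, annot_entry, start, stop, qual_thr=None):
    """Hypothetical raw getter: input_data is a (forward, reverse) pair of lists."""
    forward, reverse = input_data
    # Half-open interval: start is included, stop is excluded.
    return forward[start:stop], reverse[start:stop]


def identity_coverage_maker(input_data, qual_thr=None):
    """Return a get_coverage(annot_entry, start, stop) function.

    The input data and the quality threshold are captured once, so the
    returned function only needs the three per-entry parameters.
    """
    def get_coverage(annot_entry, start, stop):
        forward, reverse = fake_raw_coverage(
            input_data, annot_entry, start, stop, qual_thr=qual_thr
        )
        # A real maker would apply its post-processing here.
        return forward, reverse

    return get_coverage


data = ([0, 1, 2, 3, 4], [5, 6, 7, 8, 9])
get_coverage = identity_coverage_maker(data)
forward, reverse = get_coverage(None, 1, 3)
print(forward, reverse)
```

Capturing the input in the closure is what lets every maker expose the same three-parameter function regardless of the input type or the post-processing applied.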

This architecture makes it easy to combine the different get_raw_coverage functions with the different post-processing steps. For instance:

import pysam
from craw.coverage import padded_coverage_maker

bam = pysam.AlignmentFile(bam_file, "rb")
get_coverage = padded_coverage_maker(bam, max_left, max_right, qual_thr=15)
forward, reverse = get_coverage(annot_entry, 10, 200)

or

from craw.coverage import resized_coverage_maker

wig = wig_parser.parse()
get_coverage = resized_coverage_maker(wig, 200)
forward, reverse = get_coverage(annot_entry, 10, 200)

craw.coverage.padded_coverage_maker(input_data, max_left, max_right, qual_thr=None)
Parameters
  • input_data (wig.Genome or pysam.AlignmentFile object) – the input, either a samfile (see the pysam library) or a genome built from a wig file (see the wig module)

  • max_left (int) – The highest number of bases before the reference position to take into account.

  • max_right (int) – The highest number of bases after the reference position to take into account.

  • qual_thr (int) – The quality threshold. If the input data comes from a wig file, this parameter is not used.

Returns

a function get_padded_coverage(), which computes the coverage for a gene on each strand between positions [start, stop[

The coverage values are centered on the annot_entry.ref position; the matrix is padded with None values:

[.......[ coverage ref.pos ] .....]
[....[coverage     ref.pos ] .....]
[............[ cov ref.pos       ]]

This function takes 3 parameters:

  • annot_entry: an entry of the annotation file.

  • start: The position at which to start computing the coverage (coordinates are 0-based, start position is included).

  • stop: The position at which to stop computing the coverage (coordinates are 0-based, stop position is excluded).

and

  • return: a tuple of 2 tuples of float or int representing the coverage on the forward and reverse strands.
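The centering and padding shown in the diagram can be sketched as follows. This is an illustration under assumptions the text does not state: every padded row is assumed to have max_left + 1 + max_right values, the reference position is assumed to lie inside [start, stop[, and strand orientation is ignored. The name pad_coverage is hypothetical.

```python
def pad_coverage(coverage, start, stop, ref, max_left, max_right):
    """Pad coverage with None so that ref always lands at index max_left."""
    if len(coverage) != stop - start:
        raise ValueError("coverage must hold one value per position in [start, stop[")
    left = ref - start        # number of values before the reference position
    right = stop - 1 - ref    # number of values after the reference position
    if not (0 <= left <= max_left and 0 <= right <= max_right):
        raise ValueError("interval does not fit within max_left/max_right")
    return (
        [None] * (max_left - left)
        + list(coverage)
        + [None] * (max_right - right)
    )


# Interval [10, 13[ with the reference at position 11.
padded = pad_coverage([1, 2, 3], start=10, stop=13, ref=11, max_left=3, max_right=2)
print(padded)
```

With this layout, rows built for different entries all have the same length and their reference positions share one column, which is what allows them to be stacked into a matrix.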

Return type

function

craw.coverage.resized_coverage_maker(input_data, new_size, qual_thr=None)
Parameters
  • input_data (craw.wig.Genome or pysam.AlignmentFile object) – the input, either a samfile (see the pysam library) or a genome built from a wig file (see the wig module)

  • new_size (positive int) – the number of values in the coverage vector.

  • qual_thr (int) – The quality threshold. If the input data comes from a wig file, this parameter is not used.

Returns

a function get_resized_coverage(), which computes the coverage for a gene on each strand between positions [start, stop[. The coverage values are generated by linear interpolation from the raw values between [start, stop[ using scipy; see https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.interpolate.interp1d.html

This function takes 3 parameters:

  • annot_entry: an entry of the annotation file.

  • start: The position at which to start computing the coverage (coordinates are 0-based, start position is included).

  • stop: The position at which to stop computing the coverage (coordinates are 0-based, stop position is excluded).

and

  • return: a tuple of 2 tuples of float or int representing the coverage on the forward and reverse strands.
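Resizing by linear interpolation can be sketched as follows. The module relies on scipy's interp1d; this sketch swaps in a dependency-free linear interpolation written by hand so that it runs anywhere, and it assumes the new points are spread evenly from the first raw value to the last. The name resize_coverage is hypothetical.

```python
def resize_coverage(values, new_size):
    """Resample values to new_size points by linear interpolation."""
    if new_size < 1:
        raise ValueError("new_size must be a positive integer")
    if not values:
        raise ValueError("values must not be empty")
    n = len(values)
    if n == 1 or new_size == 1:
        # Nothing to interpolate between: repeat the first value (a choice
        # made for this sketch).
        return [float(values[0])] * new_size
    resized = []
    for i in range(new_size):
        # Map the i-th output point onto the [0, n - 1] input axis.
        x = i * (n - 1) / (new_size - 1)
        lo = min(int(x), n - 1)
        hi = min(lo + 1, n - 1)
        frac = x - lo
        resized.append(values[lo] * (1 - frac) + values[hi] * frac)
    return resized


print(resize_coverage([0, 10], 3))
print(resize_coverage([1, 2, 3, 4, 5], 3))
```

Resizing every entry to the same new_size makes genes of different lengths comparable position by position.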

Return type

function

craw.coverage.sum_coverage_maker(input_data, qual_thr=None)

This function returns a new function, get_sum_coverage().

Parameters
  • input_data (craw.wig.Genome or pysam.AlignmentFile object) – the input, either a samfile (see the pysam library) or a genome built from a wig file (see the wig module)

  • qual_thr (int) – The quality threshold. If the input data comes from a wig file, this parameter is not used.

Returns

get_sum_coverage(), a function which computes the sum of the coverage for a gene on each strand between positions [start, stop[. This function takes 3 parameters:

  • annot_entry: an entry of the annotation file.

  • start: The position at which to start computing the coverage (coordinates are 0-based, start position is included).

  • stop: The position at which to stop computing the coverage (coordinates are 0-based, stop position is excluded).

and

  • return: a tuple of 2 tuples of float or int representing the coverage on the forward and reverse strands.
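The summing step can be sketched as follows. This is a minimal illustration, not the module's code: sum_coverage is a hypothetical name, and wrapping each sum in a one-element tuple is an assumption made only to match the return shape documented above.

```python
def sum_coverage(forward, reverse):
    """Collapse each strand's coverage values into a single sum."""
    # One-element tuples keep the "tuple of 2 tuples" shape (an assumption).
    return (sum(forward),), (sum(reverse),)


forward_sum, reverse_sum = sum_coverage([1, 2, 3], [0.5, 0.5])
print(forward_sum, reverse_sum)
```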

Return type

function