###########################################################################
#                                                                         #
# This file is part of Counter RNAseq Window (craw) package.              #
#                                                                         #
# Authors: Bertrand Neron                                                 #
# Copyright (c) 2017-2019 Institut Pasteur (Paris).                       #
# see COPYRIGHT file for details.                                         #
#                                                                         #
# craw is free software: you can redistribute it and/or modify            #
# it under the terms of the GNU General Public License as published by    #
# the Free Software Foundation, either version 3 of the License, or       #
# (at your option) any later version.                                     #
#                                                                         #
# craw is distributed in the hope that it will be useful,                 #
# but WITHOUT ANY WARRANTY; without even the implied warranty of          #
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.                    #
# See the GNU General Public License for more details.                    #
#                                                                         #
# You should have received a copy of the GNU General Public License       #
# along with craw (see COPYING file).                                     #
# If not, see <http://www.gnu.org/licenses/>.                             #
#                                                                         #
###########################################################################
""" Handle errors related to wig parsing. """
"""
Represent the data following a declaration line.
A Chunk contains sparse coverage data for a region of one chromosome, on both strands,
plus the data found on the declaration line.
"""
"""
:param kwargs: the key/value pairs found on a declaration line
:type kwargs: dictionary
"""
def is_fixed_step(self):
    """
    This is an abstract method; it must be implemented in inherited classes.

    :return: True if it's a fixed chunk of data, False otherwise
    :rtype: boolean
    """
    return NotImplemented
def parse_data_line(self, line, chrom, strand_type):
    """
    Parse a line of data and append the results to the corresponding strand.
    This is an abstract method; it must be implemented in inherited classes.

    :param line: line of data to parse (trailing white space must already be stripped)
    :type line: string
    :param chrom: the chromosome to add coverage data to
    :type chrom: :class:`Chromosome` object.
    :param strand_type: which kind of wig is being parsed: forward, reverse, or mixed strand
    :type strand_type: string '+' , '-', 'mixed'
    """
    return NotImplemented
def _convert_cov(strand_type, cov): else:
""" A FixedChunk object handles the data of a 'fixedStep' declaration line and its coverage data. """
# we switch from 1-based positions in wig to 0-based positions in chromosome
# to have the same behavior as in bam
""" :return: True :rtype: boolean """
"""
Parse a line of data following a fixedStep declaration.
Add the result to the corresponding strand
(forward if the coverage value is positive, reverse otherwise).

:param line: line of data to parse (trailing white space must already be stripped)
:type line: string
:param chrom: the chromosome to add coverage data to
:type chrom: :class:`Chromosome` object.
:param strand_type: which kind of wig is being parsed: forward, reverse, or mixed strand
:type strand_type: string '+' , '-', 'mixed'
"""
# the line is already stripped
# in FixedChunk the origin is translated to a 0-based position in __init__
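As an illustration of the fixedStep handling described above, here is a minimal sketch, not the craw implementation. The helper name `expand_fixed_step` and its return shape are hypothetical; it assumes the 1-based `start` is translated to 0-based once, and that negative values are stored as absolute values on the reverse strand.

```python
def expand_fixed_step(start, step, span, values):
    """Expand fixedStep values into per-strand {0-based position: coverage}.

    Hypothetical helper for illustration only.
    """
    forward, reverse = {}, {}
    # Translate the 1-based wig origin to a 0-based position once, up front.
    pos = start - 1
    for value in values:
        # Assumption: the sign selects the strand, the magnitude is the coverage.
        strand = forward if value >= 0 else reverse
        for offset in range(span):
            strand[pos + offset] = abs(float(value))
        pos += step
    return forward, reverse


forward, reverse = expand_fixed_step(start=1, step=10, span=5, values=[3, -4])
```

With `start=1 step=10 span=5`, the first value covers 0-based positions 0 to 4 and the second covers positions 10 to 14.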
"""
A VariableChunk object handles the data of a 'variableStep' declaration line and its coverage data.

If the data contain negative values, this indicates that the coverage is on the reverse strand.
The chunk starts at the smallest position and ends at the highest position,
whichever strand these positions are on.
This means that when the chunk is converted into Coverage, the missing positions are filled with 0.0.

for instance:

    variableStep chrom=chr3 span=2
    10 11
    20 22
    20 -30
    25 -50

will give coverages starting at position 10 and ending at 26 for both strands,
with the following coverage values

| for = [11.0, 11.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 22.0, 22.0, 0.0, 0.0, 0.0, 0.0, 0.0]
| rev = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 30.0, 30.0, 0.0, 0.0, 0.0, 50.0, 50.0]
"""
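The worked example in the docstring above can be reproduced with a short standalone sketch. `expand_variable_step` is a hypothetical helper, not part of craw; it keeps positions as written in the wig data and assumes that negative values go to the reverse strand as absolute values.

```python
def expand_variable_step(span, data_lines):
    """Return (first_position, forward_coverage, reverse_coverage).

    Hypothetical helper for illustration only; positions are kept as
    written in the data lines.
    """
    forward, reverse = {}, {}
    for line in data_lines:
        pos_field, cov_field = line.split()
        pos, cov = int(pos_field), float(cov_field)
        # Assumption: a negative value means coverage on the reverse strand.
        strand = forward if cov >= 0 else reverse
        for offset in range(span):
            strand[pos + offset] = abs(cov)
    positions = set(forward) | set(reverse)
    if not positions:
        return None, [], []
    first, last = min(positions), max(positions)
    # Positions without data on a strand are filled with 0.0.
    for_cov = [forward.get(p, 0.0) for p in range(first, last + 1)]
    rev_cov = [reverse.get(p, 0.0) for p in range(first, last + 1)]
    return first, for_cov, rev_cov


first, for_cov, rev_cov = expand_variable_step(
    2, ["10 11", "20 22", "20 -30", "25 -50"]
)
```

Both lists span positions 10 to 26, so each holds 17 values, matching the `for` and `rev` rows above.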
""" :return: False :rtype: boolean """
"""
Parse a line of data following a variableStep declaration.
Add the result to the corresponding strand
(forward if the coverage value is positive, reverse otherwise).

:param line: line of data to parse (trailing white space must already be stripped)
:type line: string
:param chrom: the chromosome to add coverage data to
:type chrom: :class:`Chromosome` object.
:param strand_type: which kind of wig is being parsed: forward, reverse, or mixed strand
:type strand_type: string '+' , '-', 'mixed'
:raise ValueError: if strand_type is not one of 'mixed', '-', '+'
"""
# we switch from 1-based positions in wig to 0-based positions in chromosome
# to have the same behavior as in bam
""" Handle chromosomes. A chromosome has a name and contains :class:`Chunk` objects (forward and reverse). """
"""
:param name: the name of the chromosome
:type name: str
:param size: the default size of the chromosome.
             Each time we try to set a value beyond the end of the chromosome,
             the chromosome size is doubled.
             This is to protect the machine against memory swapping
             if the user provides a wig file with very big chromosomes.
:type size: int
"""
# 30 is the memory used to allocate a new array of shape (2,1)
# it was empirically determined
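The doubling strategy described above can be sketched with plain lists. This is an illustrative stand-in, not the craw `Chromosome` class: the name `GrowableCoverage`, the `capacity` property, and the use of lists are assumptions, and the memory check is left out.

```python
class GrowableCoverage:
    """Two-strand coverage store whose capacity doubles on demand.

    Hypothetical class for illustration only.
    """

    def __init__(self, size=8, fill=0.0):
        if size < 1:
            raise ValueError("size must be at least 1")
        self._fill = fill
        # Row 0 holds the forward strand, row 1 the reverse strand.
        self._cov = [[fill] * size, [fill] * size]
        self._length = 0  # highest position set so far, plus one

    @property
    def capacity(self):
        return len(self._cov[0])

    def __len__(self):
        return self._length

    def __setitem__(self, pos, value):
        forward, reverse = value
        # Double the allocated size until the position fits.
        while pos >= self.capacity:
            extra = self.capacity
            for strand in self._cov:
                strand.extend([self._fill] * extra)
        self._cov[0][pos] = forward
        self._cov[1][pos] = reverse
        self._length = max(self._length, pos + 1)

    def __getitem__(self, pos):
        if pos >= self._length:
            raise IndexError("position out of coverage")
        return self._cov[0][pos], self._cov[1][pos]


cov = GrowableCoverage(size=4)
cov[9] = (1.0, 2.0)  # capacity grows from 4 to 8, then to 16
```

Doubling keeps the number of reallocations logarithmic in the final size, at the cost of allocating up to twice the space actually used.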
""" :return: the actual length of the chromosome :rtype: int """
"""
:param pos: the position (0-based) at which to set the value
:type pos: int or :class:`slice` object
:param value: value to assign
:type value: float or iterable of float
:raise ValueError: when pos is a slice and value does not have the same length as the slice
:raise TypeError: when pos is a slice and value is not iterable
:raise IndexError: if pos is not in coverage or one bound of the slice is outside the coverage
"""
else:
else: else: else:
"""
:param pos: a position or a slice (0-based).
            If pos is a slice, the left index is excluded.
:return: the coverage at this position or corresponding to this slice.
:rtype: a list of 2 lists of float [[float, ...], [float, ...]]
:raise IndexError: if pos is not in coverage or one bound of the slice is outside the coverage
"""
"""
Extend this chromosome by `size` and fill the new positions with `fill`.

:param size: the size (in bp) by which to extend the chromosome.
:type size: int
:param fill: the default value to fill the chromosome with.
:type fill: float or nan
:raise MemoryError: if the chromosome extension could exceed the free memory.
"""
# 10 is the memory used to horizontally extend an array with one col and 2 rows filled with 0.
# it was empirically determined on a linux gentoo platform with python 3.4.5 and numpy 1.11.2
" {} to {})".format(self.name, h_size))
"""
:param col_nb: the number of columns of the new array or the extension
:type col_nb: int
:param mem_per_col: the memory needed to create or extend an array with one col and 2 rows filled with 0.0
:type mem_per_col: int
:return: the estimation of free memory available after creating or extending the chromosome
:rtype: int
"""
""" A genome is made of chromosomes and some metadata, called infos """
""" :param name: the name of the chromosome to retrieve :type name: string :return: the chromosome corresponding to the name. :rtype: :class:`Chromosome` object. """
else: chrom.__class__.__name__))
""" remove a chromosome from this genome
:param name: the name of the chromosome to remove :type name: string :return: None
""" else:
def chromosomes(self):
"""
Add a chromosome to a genome.
If a chromosome with the same name already exists,
the previous one is silently replaced by this one.

:param chrom: a chromosome to add to this genome
:type chrom: :class:`Chromosome` object.
:raise: TypeError if chrom is not a :class:`Chromosome` object.
"""
""" Class to parse files in wig format. At the end of parsing it returns a :class:`Genome` object. """
"""
:param mixed_wig: The path of the wig file to parse.
                  The wig file encodes both strands:

                  - The positive coverage values for the forward strand
                  - The negative coverage values for the reverse strand

                  This parameter is incompatible with the for_wig and rev_wig parameters.
:type mixed_wig: string
:param for_wig: The path of the wig file to parse.
                The wig file encodes the forward strand only.
                This parameter is incompatible with the mixed_wig parameter.
:type for_wig: string
:param rev_wig: The path of the wig file to parse.
                The wig file encodes the reverse strand only.
                This parameter is incompatible with the mixed_wig parameter.
:type rev_wig: string
"""
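The incompatibility between `mixed_wig` and the per-strand files could be checked as in the sketch below. The function `check_wig_sources` is hypothetical, and both the exception type and the "at least one file" rule are assumptions; the parser's actual behavior is not shown in this section.

```python
def check_wig_sources(mixed_wig=None, for_wig=None, rev_wig=None):
    """Map each strand type to its wig path after validating the combination.

    Hypothetical helper for illustration only.
    """
    per_strand = for_wig is not None or rev_wig is not None
    if mixed_wig is not None and per_strand:
        raise ValueError("mixed_wig is incompatible with for_wig and rev_wig")
    if mixed_wig is None and not per_strand:
        # Assumption: at least one wig file is required.
        raise ValueError("at least one wig file must be provided")
    if mixed_wig is not None:
        return {"mixed": mixed_wig}
    sources = {"+": for_wig, "-": rev_wig}
    # Keep only the strands for which a path was given.
    return {strand: path for strand, path in sources.items() if path is not None}
```

The keys reuse the `strand_type` values documented elsewhere in this module ('+', '-', 'mixed').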
"""
Open a wig file and parse it.
Read the wig file line by line, check the type of each line,
and call the method corresponding to that type:

- comment
- track
- declaration
- data

see

- https://wiki.nci.nih.gov/display/tcga/wiggle+format+specification
- http://genome.ucsc.edu/goldenPath/help/wiggle.html

for wig specifications.
This parser does not fully follow these specifications.
When a score is negative, it means that the coverage is on the reverse strand.
So some positions can appear twice in one block of declaration (what I call a chunk).

:return: a Genome coverage corresponding to the wig files
         (mixed strand on one wig or two separate wigs)
:rtype: :class:`Genome` object
"""
else: continue else:
"""
:param line: line to parse. :return: True if it's a data line, False otherwise """
"""
:param line: line to parse. It must not be a comment line, a track line, or a declaration line.
:type line: string
:type strand_type: string '+' , '-', 'mixed'
:raise ValueError: if strand_type is not one of 'mixed', '-', '+'
"""
""" A single line, beginning with one of the identifiers variableStep or fixedStep, followed by attribute/value pairs for instance: ::
fixedStep chrom=chrI start=1 step=10 span=5
:param line: line to parse. :type line: string :return: True if line is a declaration line. False otherwise. :rtype: boolean """
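A declaration line such as the one above can be split into its identifier and attribute/value pairs. The sketch below is illustrative only: `parse_declaration` is a hypothetical name, values are kept as strings, and raising `ValueError` is an assumption.

```python
def parse_declaration(line):
    """Split a declaration line into (kind, attributes).

    Hypothetical helper for illustration only; attribute values are
    returned as strings.
    """
    fields = line.split()
    if not fields or fields[0] not in ("fixedStep", "variableStep"):
        raise ValueError("not a declaration line: {!r}".format(line))
    # Each remaining field has the form key=value.
    attrs = dict(field.split("=", 1) for field in fields[1:])
    return fields[0], attrs


kind, attrs = parse_declaration("fixedStep chrom=chrI start=1 step=10 span=5")
```

The resulting dictionary is the kind of key/value mapping that a chunk constructor taking `**kwargs` could receive.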
""" Get the corresponding chromosome, creating one if necessary, and set the current_chunk and current_chromosome.
:param line: line to parse. The method :meth:`is_declaration_line` must return True with this line. """
else:
else:
def is_track_line(line): """ A track line begins with the identifier track and is followed by attribute/value pairs for instance: ::
track type=wiggle_0 name="fixedStep" description="fixedStep format" visibility=full autoScale=off
:param line: line to parse. :type line: string :return: True if line is a track line. False otherwise. :rtype: boolean """
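Track attributes may contain quoted values with spaces, so a plain `split()` is not enough. The sketch below uses the standard `shlex` module; `parse_track_line` is a hypothetical helper and may differ from how craw fills the genome infos.

```python
import shlex


def parse_track_line(line):
    """Return the attribute/value pairs of a track line as a dict.

    Hypothetical helper for illustration only.
    """
    # shlex.split honors double quotes, so 'description="fixedStep format"'
    # stays a single field.
    fields = shlex.split(line)
    if not fields or fields[0] != "track":
        raise ValueError("not a track line: {!r}".format(line))
    return dict(field.split("=", 1) for field in fields[1:])


infos = parse_track_line(
    'track type=wiggle_0 name="fixedStep" description="fixedStep format" '
    "visibility=full autoScale=off"
)
```

Here `infos["description"]` keeps its embedded space, and the surrounding quotes are removed.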
""" fill the genome infos with the information found on the track.
:param line: line to parse. The method :meth:`is_track_line` must return True with this line. """ else: else:
def is_comment_line(line): """ :param line: line to parse. :type line: string :return: True if line is a comment line. False otherwise. :rtype: boolean """
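Taken together, the four line types handled by the parser (comment, track, declaration, data) can be told apart with simple prefix tests. This is a sketch under assumptions: `classify_line` is hypothetical, comments are assumed to start with `#`, and blank lines are reported separately.

```python
def classify_line(line):
    """Return 'empty', 'comment', 'track', 'declaration' or 'data'.

    Hypothetical helper for illustration only.
    """
    stripped = line.strip()
    if not stripped:
        return "empty"
    if stripped.startswith("#"):
        # Assumption: comment lines start with '#'.
        return "comment"
    if stripped.startswith("track"):
        return "track"
    if stripped.startswith(("fixedStep", "variableStep")):
        return "declaration"
    # Anything else is treated as coverage data.
    return "data"


kinds = [
    classify_line(text)
    for text in ("# a comment", "track type=wiggle_0",
                 "variableStep chrom=chr3 span=2", "10 11")
]
```

The order of the tests matters only in that the data case is the fallback once the other types are ruled out.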