annotation
The annotation module contains everything needed to parse an annotation file and handle its entries.
AnnotationParser
The entry point for parsing an annotation file is craw.annotation.AnnotationParser.
An annotation parser has two methods:
craw.annotation.AnnotationParser.get_annotations()
creates a new type of Entry, iterates over the annotation file and, for each line, returns a new instance of the craw.annotation.Entry subclass it has just created on the fly.
The other, more technical, method gives the maximum number of nucleotides before and after the reference position. This is needed to compute the size of the resulting matrix.
The strength of this approach is that a new type of entry is generated for each parse, which makes it very flexible and able to fit most annotation files. Within one file, however, every line is parsed with the same Entry class, which ensures the consistency of the data.
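The two parser methods described above can be sketched as follows. This is a minimal, hypothetical illustration, not craw's implementation: `iter_entries` and `max_before_after` are invented names, `collections.namedtuple` stands in for the Entry subclass built on the fly (so header names must be valid Python identifiers), and the sketch assumes that on the reverse strand "before the reference" means higher coordinates.

```python
import io
from collections import namedtuple


def iter_entries(stream, sep="\t"):
    """Yield one record per data line; the header defines the record type.

    Hypothetical sketch: the header is the first line not starting with '#',
    and a new record class is created from it for this parse only.
    """
    entry_type = None
    for line in stream:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(sep)
        if entry_type is None:
            # First non-comment line: build the record class from the header.
            entry_type = namedtuple("Entry", fields)
            continue
        yield entry_type(*fields)


def max_before_after(entries, ref_col, start_col, stop_col, strand_col):
    """Return the largest number of bases before and after the reference.

    Assumption: on the '-' strand, 'before' lies at higher coordinates.
    """
    before = after = 0
    for e in entries:
        ref = int(getattr(e, ref_col))
        low, high = sorted((int(getattr(e, start_col)), int(getattr(e, stop_col))))
        upstream, downstream = ref - low, high - ref
        if getattr(e, strand_col) == "-":
            upstream, downstream = downstream, upstream
        before = max(before, upstream)
        after = max(after, downstream)
    return before, after


data = io.StringIO(
    "# a comment line\n"
    "name\tchromosome\tstrand\tposition\tbeg\tend\n"
    "g1\tchrI\t+\t100\t90\t120\n"
    "g2\tchrI\t-\t200\t230\t180\n"
)
entries = list(iter_entries(data))
print(max_before_after(entries, "position", "beg", "end", "strand"))  # (30, 20)
```

Because `namedtuple` is called once per parse, each file gets its own record class while all lines of that file share it, which is the consistency property described above.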
new_entry_type
new_entry_type is a factory that generates a new subclass of craw.annotation.Entry from the fields gathered from the annotation file header (the first line not starting with #) and the column semantics given by the user.
The first role of this factory is to check that every column name given by the user corresponds to a field of the header, and to perform some consistency checks.
If everything is OK, it generates a new subclass of craw.annotation.Entry on the fly.
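A factory of this kind can be sketched with the built-in three-argument `type(name, bases, namespace)`. The sketch below is an assumption-laden illustration, not craw's code: `make_entry_type`, the minimal `Entry` base class and its `_get` helper are hypothetical, and the rule that start_col and stop_col must be given together is only an example of a coherence check.

```python
from collections import namedtuple

# Hypothetical stand-in for craw.annotation.Idx: a column name and its index.
Idx = namedtuple("Idx", ["col_name", "idx"])


class Entry:
    """Minimal stand-in for craw.annotation.Entry (sketch only)."""

    _fields = ()  # header fields, set on each generated subclass
    _idx = {}     # generic name -> Idx, set on each generated subclass

    def __init__(self, values):
        if len(values) != len(self._fields):
            raise RuntimeError(f"expected {len(self._fields)} values, got {len(values)}")
        self._values = list(values)

    def _get(self, key):
        """Return the raw value stored under a generic name such as 'ref'."""
        return self._values[self._idx[key].idx]


def make_entry_type(name, fields, ref_col, strand_col="strand", chr_col="chromosome",
                    start_col=None, stop_col=None):
    """Check the user's column names against the header, then build a subclass."""
    if (start_col is None) != (stop_col is None):
        # Example coherence check (assumed): both bounds or neither.
        raise RuntimeError("start_col and stop_col must be given together")
    wanted = {"ref": ref_col, "strand": strand_col, "chr": chr_col}
    if start_col is not None:
        wanted.update(start=start_col, stop=stop_col)
    missing = [col for col in wanted.values() if col not in fields]
    if missing:
        raise RuntimeError(f"columns not found in header: {missing}")
    idx = {key: Idx(col, fields.index(col)) for key, col in wanted.items()}
    # type(name, bases, namespace) creates the subclass at runtime.
    return type(name, (Entry,), {"_fields": tuple(fields), "_idx": idx})


header = ["name", "chromosome", "strand", "position", "beg", "end"]
GeneEntry = make_entry_type("GeneEntry", header, "position", start_col="beg", stop_col="end")
entry = GeneEntry(["g1", "chrI", "+", "100", "90", "120"])
print(issubclass(GeneEntry, Entry), entry._get("ref"))  # True 100
```

Validating before calling `type()` means an inconsistent header fails once, up front, instead of on every line.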
Entry Class
An Entry corresponds to one line of the annotation file.
An Entry converts values when necessary (the strand into an internal +/- representation, positions into integers, …). It also exposes a generic API to access some fields whatever the names of the columns.
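The value conversion and the generic API can be illustrated with a small stand-alone class. Everything here is hypothetical: `SketchEntry`, its hard-coded column mapping and the set of accepted strand encodings are assumptions chosen for the example, not the values craw accepts.

```python
class SketchEntry:
    """Hypothetical entry: converts raw strings and exposes generic accessors."""

    # Generic name -> column name in this (assumed) annotation file.
    columns = {"chromosome": "chromosome", "strand": "strand", "ref": "position"}

    # Accepted strand encodings (assumed for the example).
    _strands = {"+": "+", "1": "+", "for": "+", "-": "-", "-1": "-", "rev": "-"}

    def __init__(self, header, values):
        raw = dict(zip(header, values))
        self._data = {key: self._convert(key, raw[col]) for key, col in self.columns.items()}

    @classmethod
    def _convert(cls, key, value):
        if key == "strand":
            if value not in cls._strands:
                raise RuntimeError(f"cannot convert strand value {value!r}")
            return cls._strands[value]
        if key in ("ref", "start", "stop"):
            return int(value)  # int() raises ValueError for non-numeric text
        return value

    @property
    def chromosome(self):
        return self._data["chromosome"]

    @property
    def strand(self):
        return self._data["strand"]

    @property
    def ref(self):
        return self._data["ref"]


entry = SketchEntry(["name", "chromosome", "strand", "position"], ["g1", "chrI", "1", "100"])
print(entry.chromosome, entry.strand, entry.ref)  # chrI + 100
```

Callers only ever read `entry.ref` or `entry.strand`; only the `columns` mapping changes from one file to another.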
annotation API reference
class craw.annotation.AnnotationParser(path, ref_col, strand_col='strand', chr_col='chromosome', start_col=None, stop_col=None, sep='\t')
Parse the annotation file:
- create a new type of Entry according to the header;
- create one Entry object for each line of the file.
__init__(path, ref_col, strand_col='strand', chr_col='chromosome', start_col=None, stop_col=None, sep='\t')
Parameters:
path (string) – the path to the annotation file to parse.
ref_col (string) – the name of the column for the reference position.
chr_col (string) – the name of the column for the chromosome.
strand_col (string) – the name of the column for the strand.
start_col (string) – the name of the column for the start position.
stop_col (string) – the name of the column for the stop position.
sep (string) – the separator to use to split fields.
__weakref__
List of weak references to the object (if defined).
class craw.annotation.Entry(values)
Handles one entry (one line) of the annotation file.
__init__(values)
Parameters:
values (list of string) – the values parsed from one line of the annotation file.
__weakref__
List of weak references to the object (if defined).
_convert(field, value)
Convert a field parsed from the annotation file into the Entry internal value.
Parameters:
field (string) – the field name associated with the value.
value (string) – the value to convert.
Returns:
the converted value
Return type:
any
Raises:
RuntimeError or ValueError if a value cannot be converted.
_switch_start_stop()
Switch the start and stop values if self.start > self.stop. This situation can occur when the annotation concerns the reverse strand.
chromosome
The name of the chromosome.
header
The header of the annotation file.
ref
The reference position.
start
The position at which to start the coverage computation.
stop
The position at which to end the coverage computation (included).
strand
The strand (+/-).
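The start/stop switch documented for `Entry._switch_start_stop` amounts to ordering the two coordinates. `switch_start_stop` below is a hypothetical free-function version of that behaviour, not the method itself.

```python
def switch_start_stop(start, stop):
    """Return the two coordinates ordered so that start <= stop.

    Hypothetical helper: reverse-strand annotations may list the larger
    coordinate first, and this swaps them in that case only.
    """
    if start > stop:
        start, stop = stop, start
    return start, stop


print(switch_start_stop(230, 180))  # (180, 230)
print(switch_start_stop(90, 120))   # (90, 120)
```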
class craw.annotation.Idx(col_name, idx)
__getnewargs__()
Return self as a plain tuple. Used by copy and pickle.
static __new__(_cls, col_name, idx)
Create new instance of Idx(col_name, idx)
__repr__()
Return a nicely formatted representation string
_asdict()
Return a new OrderedDict which maps field names to their values.
classmethod _make(iterable, new=<built-in method __new__ of type object>, len=<built-in function len>)
Make a new Idx object from a sequence or iterable
_replace(**kwds)
Return a new Idx object replacing specified fields with new values
col_name
Alias for field number 0
idx
Alias for field number 1
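The documented members of `Idx` are exactly those generated by `collections.namedtuple`, so an equivalent type can be rebuilt with it, assuming `Idx` is a plain namedtuple as its member list suggests. Since Python 3.8, `_asdict()` returns a regular `dict` rather than an `OrderedDict`. Using `Idx` to pair a column name with its index in a header is an illustrative assumption.

```python
from collections import namedtuple

# Assumed equivalent of craw.annotation.Idx, rebuilt from its documented members.
Idx = namedtuple("Idx", ["col_name", "idx"])

header = ["name", "chromosome", "strand", "position"]
ref = Idx("position", header.index("position"))

print(ref)                       # Idx(col_name='position', idx=3)
print(ref.col_name == ref[0])    # True: col_name is field number 0
print(ref._asdict())             # a dict on Python 3.8+, an OrderedDict before
print(ref._replace(idx=4))       # Idx(col_name='position', idx=4)
print(Idx._make(["strand", 2]))  # Idx(col_name='strand', idx=2)
```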
craw.annotation.new_entry_type(name, fields, ref_col, strand_col='strand', chr_col='chromosome', start_col=None, stop_col=None)
From the header of the annotation file, create a new Entry class inheriting from Entry.
Parameters:
name (str) – The name of the new class of entry.
fields (list of string) – The fields constituting the new type of entry.
ref_col (string) – The name of the column representing the position of reference (default is ‘position’).
strand_col (string) – The name of the column representing the strand (default is ‘strand’).
chr_col (string) – The name of the column representing the name of chromosome (default is ‘chromosome’).
start_col (string) – The name of the column representing the position of the first base to compute the coverage (inclusive).
stop_col (string) – The name of the column representing the position of the last base to compute the coverage (inclusive).
Returns:
a new class, child of Entry, able to store the information corresponding to the header.