annotation

The annotation module contains everything needed to parse an annotation file and handle its entries.

AnnotationParser

The entry point for parsing an annotation file is craw.annotation.AnnotationParser. An annotation parser has two methods:

craw.annotation.AnnotationParser.get_annotations() creates a new type of Entry, then iterates over the annotation file and, for each line, returns a new instance of the craw.annotation.Entry subclass it has just created on the fly.

The other, more technical, method, craw.annotation.AnnotationParser.max(), gives the maximum number of nucleotides to take into account before and after the reference position. It is needed to compute the size of the resulting matrix.

The strength of this approach is that a new type of entry is generated for each parsing. This makes it very flexible and able to fit most annotation files. Within one file, however, every line is parsed with the same Entry class, which ensures the coherence of the data.
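The idea of building one class per parsing run, then reusing it for every line, can be illustrated with Python's built-in `type()`. This is a minimal sketch and not craw's implementation. The base class `BaseEntry` and the helper `build_entry_class` are hypothetical names used only for illustration.

```python
# Hypothetical sketch: one entry subclass is built per parsed file,
# and every line of that file is instantiated from that same class.


class BaseEntry:
    """Minimal stand-in for craw.annotation.Entry (hypothetical)."""

    _fields = ()

    def __init__(self, values):
        if len(values) != len(self._fields):
            raise ValueError(
                f"expected {len(self._fields)} values, got {len(values)}")
        # Map each header field to the value found on the line.
        self._values = dict(zip(self._fields, values))

    def __getitem__(self, field):
        return self._values[field]


def build_entry_class(name, header):
    """Create a new BaseEntry subclass whose fields are the header columns."""
    return type(name, (BaseEntry,), {"_fields": tuple(header)})


header = ["name", "chromosome", "strand", "position"]
EntryType = build_entry_class("Entry_1", header)
lines = [["gene1", "chrI", "+", "120"], ["gene2", "chrII", "-", "430"]]
entries = [EntryType(line) for line in lines]
```

Parsing a second file with a different header would produce a distinct class, while all lines of one file share `EntryType`.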

new_entry_type

new_entry_type is a factory that generates a new subclass of craw.annotation.Entry from the fields gathered from the annotation file header (the first line not starting with #) and from the column semantics given by the user. Its first role is to check that every parameter given by the user corresponds to the header and to perform some coherence checks. If everything seems OK, it generates a new subclass of craw.annotation.Entry on the fly.
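A minimal sketch of such a factory, assuming the checks are "each user-named column must appear in the header" plus, as an illustrative assumption, "start_col and stop_col must be given together". `make_entry_type` and the `_index` attribute are hypothetical and do not describe how craw stores column positions.

```python
# Hypothetical sketch of a new_entry_type-like factory: validate the
# user-supplied column names against the header, then build a subclass.


class Entry:
    """Stand-in base class (not craw.annotation.Entry)."""


def make_entry_type(name, fields, ref_col, strand_col="strand",
                    chr_col="chromosome", start_col=None, stop_col=None):
    # Assumed coherence check: start_col and stop_col go together.
    if (start_col is None) != (stop_col is None):
        raise RuntimeError("start_col and stop_col must be given together")
    # Every column named by the user must exist in the header.
    wanted = {"ref": ref_col, "strand": strand_col, "chr": chr_col,
              "start": start_col, "stop": stop_col}
    for role, col in wanted.items():
        if col is not None and col not in fields:
            raise RuntimeError(f"column '{col}' ({role}) not found in header")
    # Record the index of each semantic column so instances can find it.
    index = {role: fields.index(col)
             for role, col in wanted.items() if col is not None}
    return type(name, (Entry,), {"_fields": tuple(fields), "_index": index})
```

Checking the header once, when the class is built, means every later line can be converted without repeating those checks.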

Entry Class

An Entry corresponds to one line of the annotation file.

An Entry converts values where necessary (the strand into an internal +/- representation, positions into integers, …). It also exposes a generic API to access some fields regardless of the column names.
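A sketch of this kind of per-field conversion, assuming the raw strand may be written as `+`, `-`, `1`, `+1` or `-1`, and that the position columns are named `position`, `start` and `stop`. Both assumptions are hypothetical. The errors mirror those documented for Entry._convert (RuntimeError or ValueError), but `convert` itself is an illustrative stand-in.

```python
# Hypothetical sketch of per-field conversion: strand normalised to '+'/'-',
# positions cast to int, every other field left as a string.

STRAND_MAP = {"+": "+", "1": "+", "+1": "+", "-": "-", "-1": "-"}  # assumed spellings
POSITION_FIELDS = {"position", "start", "stop"}  # assumed column names


def convert(field, value):
    """Convert one raw string from the annotation file into an internal value."""
    if field == "strand":
        try:
            return STRAND_MAP[value.strip()]
        except KeyError:
            raise RuntimeError(f"unrecognised strand value: {value!r}") from None
    if field in POSITION_FIELDS:
        return int(value)  # raises ValueError if the value is not an integer
    return value
```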

annotation API reference

class craw.annotation.AnnotationParser(path, ref_col, strand_col='strand', chr_col='chromosome', start_col=None, stop_col=None, sep='\t')

Parse the annotation file:

create a new type of Entry according to the header;
create one Entry object for each line of the file.

__init__(path, ref_col, strand_col='strand', chr_col='chromosome', start_col=None, stop_col=None, sep='\t')

Parameters

path (string) – the path to the annotation file to parse.
ref_col (string) – the name of the column for the reference position
chr_col (string) – the name of the column for the chromosome
strand_col (string) – the name of the column for the strand
start_col (string) – the name of the column for start position
stop_col (string) – the name of the column for the stop position
sep (string) – the separator used to split fields

__weakref__: list of weak references to the object (if defined)

get_annotations()

Parse an annotation file and yield an Entry for each line of the file.

Returns: a generator over the entries of the annotation file.
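The generator behaviour can be sketched with plain Python, taking the header to be the first line that does not start with `#`, as described for new_entry_type. As an assumption, later `#` lines and blank lines are skipped too. `iter_annotations` is hypothetical and yields plain dicts instead of Entry objects to keep the example self-contained.

```python
import io


def iter_annotations(handle, sep="\t"):
    """Yield one dict per data line; the header is the first non-'#' line."""
    header = None
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # assumed: skip comments and blank lines everywhere
        fields = line.split(sep)
        if header is None:
            header = fields  # first non-comment line is the header
            continue
        yield dict(zip(header, fields))


data = "# comment\nname\tchromosome\tstrand\tposition\ngene1\tchrI\t+\t120\n"
entries = list(iter_annotations(io.StringIO(data)))
```

Because it is a generator, lines are read lazily, so a large annotation file never needs to be held in memory at once.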

max()

Returns: the maximum number of bases to take into account before and after the reference position.
Return type: tuple of 2 int
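One way such a pair could be computed is sketched below. It assumes each annotation has start <= ref <= stop (after any start/stop switch) and ignores strand orientation. It also assumes the matrix has one column per base from `before` bases upstream to `after` bases downstream of the reference. All three are assumptions, and the names `Ann` and `max_window` are hypothetical.

```python
from collections import namedtuple

# Hypothetical minimal record: start <= ref <= stop, positions inclusive.
Ann = namedtuple("Ann", ["start", "ref", "stop"])


def max_window(entries):
    """Return (max bases before ref, max bases after ref) over all entries."""
    before = max(e.ref - e.start for e in entries)
    after = max(e.stop - e.ref for e in entries)
    return before, after


entries = [Ann(100, 120, 150), Ann(380, 430, 440)]
before, after = max_window(entries)
n_columns = before + 1 + after  # assumed layout: upstream, reference, downstream
```

Here the widest upstream span (50) and the widest downstream span (30) come from different entries, which is why both maxima are needed to size a matrix that fits every row.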

class craw.annotation.Entry(values)

Handle one entry (one line) of the annotation file.

__eq__(other): Return self==value.

__init__(values)

Parameters: values (list of string) – the values parsed from one line of the annotation file

__str__(): Return str(self).

__weakref__: list of weak references to the object (if defined)

_convert(field, value)

Convert a field parsed from the annotation file into an Entry internal value.

Parameters

field (string) – the field name associated with the value.
value (string) – the value to convert

Returns

the converted value

Return type

any

Raises

RuntimeError or ValueError if a value cannot be converted

_switch_start_stop(): Switch the start and stop values if self.start > self.stop. This situation can occur if the annotation concerns the reverse strand.
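The swap itself is a one-line tuple assignment. The minimal sketch below uses a hypothetical `Interval` class in place of an Entry.

```python
class Interval:
    """Hypothetical stand-in with start/stop attributes (not craw's Entry)."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def switch_start_stop(self):
        # A reverse-strand annotation may list start > stop;
        # swap so that start <= stop holds afterwards.
        if self.start > self.stop:
            self.start, self.stop = self.stop, self.start
```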

chromosome: The name of the chromosome

header: The header of the annotation file

ref: The reference position

start: The position at which to start the coverage computation

stop: The position at which to end the coverage computation (included)

strand: The strand (+/-)

class craw.annotation.Idx(col_name, idx)

__getnewargs__(): Return self as a plain tuple. Used by copy and pickle.

static __new__(_cls, col_name, idx): Create new instance of Idx(col_name, idx)

__repr__(): Return a nicely formatted representation string

_asdict(): Return a new OrderedDict which maps field names to their values.

classmethod _make(iterable, new=<built-in method __new__ of type object>, len=<built-in function len>): Make a new Idx object from a sequence or iterable

_replace(**kwds): Return a new Idx object replacing specified fields with new values

col_name: Alias for field number 0

idx: Alias for field number 1
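The members listed for Idx (_make, _replace, _asdict, fields aliased by number) are the ones generated by collections.namedtuple. This suggests Idx is a named tuple of (col_name, idx). Assuming that, an equivalent definition is sketched below. The header lookup is purely illustrative, because how craw builds its Idx values is not documented here.

```python
from collections import namedtuple

# Assumed equivalent: Idx pairs a column name with its position in the header.
Idx = namedtuple("Idx", ["col_name", "idx"])

header = ["name", "chromosome", "strand", "position"]  # illustrative header
ref = Idx("position", header.index("position"))
```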

craw.annotation.new_entry_type(name, fields, ref_col, strand_col='strand', chr_col='chromosome', start_col=None, stop_col=None)

From the header of the annotation file, create a new class inheriting from the Entry class.

Parameters

name (str) – The name of the new class of entry.
fields (list of string) – The fields constituting the new type of entry.
ref_col (string) – The name of the column representing the reference position (default is ‘position’).
strand_col (string) – The name of the column representing the strand (default is ‘strand’).
chr_col (string) – The name of the column representing the name of the chromosome (default is ‘chromosome’).
start_col (string) – The name of the column representing the position of the first base to compute the coverage (inclusive).
stop_col (string) – The name of the column representing the position of the last base to compute the coverage (inclusive).

Returns

a new subclass of Entry able to store the information corresponding to the header.
