annotation

The annotation module contains everything needed to parse an annotation file and handle its content.

AnnotationParser

The entry point for parsing an annotation file is craw.annotation.AnnotationParser. An annotation parser has two methods:

  • craw.annotation.AnnotationParser.get_annotations() creates a new type of Entry on the fly, then iterates over the annotation file and, for each line, returns a new instance of that newly created craw.annotation.Entry subclass.

  • the other, craw.annotation.AnnotationParser.max(), is more technical: it gives the maximum number of nucleotides before and after the reference position. It is needed to compute the size of the resulting matrix.

The strength of this approach is that a new type of entry is generated for each parsing, so it is very flexible and fits most annotation files. Within one file, however, every line is parsed with the same Entry class, which ensures the coherence of the data.
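The flow described above (one entry type built per parsing, then one instance per line) can be sketched without craw itself. This is a minimal illustration, not the module's implementation: `iter_rows` is a hypothetical name, `collections.namedtuple` stands in for the generated Entry subclass, and skipping blank lines is an assumption.

```python
import io
from collections import namedtuple


def iter_rows(handle, sep="\t"):
    """Yield one record per data line; the record type is built once per parse."""
    row_type = None
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip comments (and, by assumption, blank lines)
        fields = line.split(sep)
        if row_type is None:
            # The first non-comment line is the header: build the type from it.
            row_type = namedtuple("Row", fields)
            continue
        yield row_type(*fields)


sample = io.StringIO(
    "# a comment line\n"
    "name\tchromosome\tstrand\tposition\n"
    "geneA\tchrI\t+\t100\n"
    "geneB\tchrII\t-\t250\n"
)
rows = list(iter_rows(sample))
```

Every record produced by a single call shares the same type, which is the coherence property the paragraph above refers to.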

new_entry_type

new_entry_type is a factory which generates a new subclass of craw.annotation.Entry, given the fields gathered from the annotation file header (the first line not starting with #) and the column semantics given by the user. The first role of this factory is to check that all parameters given by the user correspond to the header and to do some coherence checking. If everything is OK, it generates a new subclass of craw.annotation.Entry on the fly.
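A class factory of this kind can be sketched with the built-in `type()`. The names `BaseEntry` and `make_entry_type` are hypothetical stand-ins, and the start/stop pairing rule is an assumed example of a coherence check; the real checks performed by craw may differ.

```python
class BaseEntry:
    """Hypothetical stand-in for craw.annotation.Entry."""

    _fields = ()

    def __init__(self, values):
        if len(values) != len(self._fields):
            raise RuntimeError(
                "expected {} values, got {}".format(len(self._fields), len(values))
            )
        self._values = dict(zip(self._fields, values))


def make_entry_type(name, fields, ref_col, strand_col="strand",
                    chr_col="chromosome", start_col=None, stop_col=None):
    """Check the user's column names against the header, then build a subclass."""
    for col in (ref_col, strand_col, chr_col, start_col, stop_col):
        if col is not None and col not in fields:
            raise RuntimeError(
                "column {!r} is not in the header {}".format(col, fields)
            )
    # Assumed coherence rule: start and stop only make sense as a pair.
    if (start_col is None) != (stop_col is None):
        raise RuntimeError("start_col and stop_col must be given together")
    attributes = {
        "_fields": tuple(fields),
        "_ref_col": ref_col,
        "_strand_col": strand_col,
        "_chr_col": chr_col,
        "_start_col": start_col,
        "_stop_col": stop_col,
    }
    # type(name, bases, namespace) creates the new class at run time.
    return type(name, (BaseEntry,), attributes)


header = ["name", "chromosome", "strand", "position"]
GeneEntry = make_entry_type("GeneEntry", header, ref_col="position")
entry = GeneEntry(["geneA", "chrI", "+", "100"])
```

Asking for a column that is absent from the header, for example `ref_col="missing"`, raises `RuntimeError` in this sketch before any class is created.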

Entry Class

An Entry corresponds to one line of the annotation file.

The Entry converts values when necessary (strand into an internal +/- representation, positions into integers, …). It also exposes a generic API to access some fields whatever the names of the columns.
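The idea of converting values and exposing them under fixed attribute names, independent of the column names in the file, can be illustrated as follows. `SketchEntry` and `STRAND_ALIASES` are hypothetical, and the accepted strand spellings are an assumption rather than the list craw actually supports.

```python
# Assumed spellings of the strand; the real module may accept others.
STRAND_ALIASES = {"+": "+", "1": "+", "-": "-", "-1": "-"}


class SketchEntry:
    """Hypothetical entry: generic attributes mapped onto user-named columns."""

    def __init__(self, header, values, ref_col, strand_col):
        self._raw = dict(zip(header, values))
        self._ref_col = ref_col
        self._strand_col = strand_col

    @property
    def ref(self):
        # int() raises ValueError if the column does not hold an integer.
        return int(self._raw[self._ref_col])

    @property
    def strand(self):
        raw = self._raw[self._strand_col]
        try:
            return STRAND_ALIASES[raw]
        except KeyError:
            raise RuntimeError("unknown strand {!r}".format(raw)) from None


# Two files with different column names are read through the same attributes.
a = SketchEntry(["gene", "pos", "sense"], ["geneA", "1250", "-1"],
                ref_col="pos", strand_col="sense")
b = SketchEntry(["Position", "Strand"], ["42", "+"],
                ref_col="Position", strand_col="Strand")
```

Calling code reads `a.ref` and `b.ref` the same way, although the underlying columns are named `pos` and `Position`.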

annotation API reference

class craw.annotation.AnnotationParser(path, ref_col, strand_col='strand', chr_col='chromosome', start_col=None, stop_col=None, sep='\t')[source]
Parse the annotation file:
  • create a new type of Entry according to the header

  • create one Entry object for each line of the file

__init__(path, ref_col, strand_col='strand', chr_col='chromosome', start_col=None, stop_col=None, sep='\t')[source]
Parameters
  • path (string) – the path to the annotation file to parse.

  • ref_col (string) – the name of the column for the reference position

  • chr_col (string) – the name of the column for the chromosome

  • strand_col (string) – the name of the column for the strand

  • start_col (string) – the name of the column for start position

  • stop_col (string) – the name of the column for the stop position

  • sep (string) – the separator to use to split fields

__weakref__

list of weak references to the object (if defined)

get_annotations()[source]

Parse an annotation file and yield an Entry for each line of the file.

Returns

a generator over the entries of the annotation file.

max()[source]
Returns

the maximum number of bases to take into account before and after the reference position.

Return type

tuple of 2 int
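One plausible way to obtain such a pair is sketched below. It is an assumption, not the documented algorithm: "before" and "after" are taken relative to the strand, so they are swapped for entries on the reverse strand, and the matrix width is assumed to include one column for the reference position itself. `max_before_after` is a hypothetical helper.

```python
def max_before_after(entries):
    """entries: iterable of (strand, ref, start, stop) with integer positions."""
    max_before = max_after = 0
    for strand, ref, start, stop in entries:
        if strand == "+":
            before, after = ref - start, stop - ref
        else:
            # Assumed: on the reverse strand, "before" lies at higher coordinates.
            before, after = stop - ref, ref - start
        max_before = max(max_before, before)
        max_after = max(max_after, after)
    return max_before, max_after


sample = [
    ("+", 100, 90, 150),   # 10 bases before, 50 after
    ("-", 200, 170, 230),  # 30 bases before, 30 after
    ("+", 50, 45, 60),     # 5 bases before, 10 after
]
before, after = max_before_after(sample)
# Assumed layout: one column per position, reference included.
width = before + after + 1
```

With this sample the result is `(30, 50)`, giving a width of 81 columns under the stated assumption.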

class craw.annotation.Entry(values)[source]

Handle one entry (one line) of the annotation file.

__eq__(other)[source]

Return self==value.

__init__(values)[source]
Parameters

values (list of string) – the values parsed from one line of the annotation file

__str__()[source]

Return str(self).

__weakref__

list of weak references to the object (if defined)

_convert(field, value)[source]

Convert a field parsed from the annotation file into the Entry internal value.

Parameters
  • field (string) – the field name associated to the value.

  • value (string) – the value to convert

Returns

the converted value

Return type

any

Raise

RuntimeError or ValueError if a value cannot be converted

_switch_start_stop()[source]

Switch the start and stop values if self.start > self.stop. This situation can occur if the annotation is on the reverse strand.

chromosome

The name of the chromosome

header

The header of the annotation file

ref

The reference position

start

The position at which to start the coverage computation

stop

The position to end the coverage computation (included)

strand

The strand, + or -

class craw.annotation.Idx(col_name, idx)
__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.

static __new__(_cls, col_name, idx)

Create new instance of Idx(col_name, idx)

__repr__()

Return a nicely formatted representation string

_asdict()

Return a new OrderedDict which maps field names to their values.

classmethod _make(iterable, new=<built-in method __new__ of type object>, len=<built-in function len>)

Make a new Idx object from a sequence or iterable

_replace(**kwds)

Return a new Idx object replacing specified fields with new values

col_name

Alias for field number 0

idx

Alias for field number 1
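The methods listed for Idx are those of a `collections.namedtuple` with the fields `col_name` and `idx`. The sketch below recreates an equivalent type locally to show how such a pair can link a column name to its position in a split line; how craw uses Idx internally is not stated here, so the usage shown is an assumption.

```python
from collections import namedtuple

# Local equivalent of the documented type, for illustration only.
Idx = namedtuple("Idx", ["col_name", "idx"])

header = ["name", "chromosome", "strand", "position"]
values = ["geneA", "chrI", "+", "100"]

# Remember both the user's column name and where it sits in the header.
ref = Idx("position", header.index("position"))

raw_ref = values[ref.idx]                # look the value up by index
renamed = ref._replace(col_name="pos")   # returns a new Idx, ref is unchanged
as_dict = dict(ref._asdict())
```

Because an Idx is a tuple, it can also be unpacked directly: `col_name, idx = ref`.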

craw.annotation.new_entry_type(name, fields, ref_col, strand_col='strand', chr_col='chromosome', start_col=None, stop_col=None)[source]

From the header of the annotation file, create a new Entry class inheriting from the Entry class.

Parameters
  • name (str) – The name of the new class of entry.

  • fields (list of string) – The fields constituting the new type of entry.

  • ref_col (string) – The name of the column representing the position of reference.

  • strand_col (string) – The name of the column representing the strand (default is ‘strand’).

  • chr_col (string) – The name of the column representing the name of chromosome (default is ‘chromosome’).

  • start_col (string) – The name of the column representing the position of the first base to compute the coverage (inclusive).

  • stop_col (string) – The name of the column representing the position of the last base to compute the coverage (inclusive).

Returns

a new subclass of Entry which is able to store the information corresponding to the header.