heatmap

craw.heatmap.split_data() and craw.heatmap.sort() work on data freshly parsed from a coverage file. That means the data still contain the metadata (all columns that are not coverage scores, such as chromosome, position, strand, and so on).

The other functions, such as the normalization functions, work on 2D pandas DataFrames or numpy arrays containing only coverage scores. That means all metadata must have been removed first (see craw.heatmap.remove_metadata()).

sort

There is one public sort function, which acts as a proxy for several private sorting functions.
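The proxy pattern described above can be sketched as a lookup table from criteria name to sorting function. This is only an illustration: `sort_proxy`, `_by_gene_size`, `_using_col` and the column names `start`, `stop` and `name` are hypothetical and are not taken from the craw source.

```python
import pandas as pd


def _by_gene_size(data, start_col, stop_col, ascending=True):
    # Gene size is taken as |stop - start| (an assumption of this sketch).
    size = (data[stop_col] - data[start_col]).abs()
    return data.loc[size.sort_values(ascending=ascending).index]


def _using_col(data, col, ascending=True):
    return data.sort_values(col, ascending=ascending)


# Hypothetical dispatch table: criteria name -> private sorting function.
_SORTERS = {"by_gene_size": _by_gene_size, "using_col": _using_col}


def sort_proxy(data, criteria, **kwargs):
    """Forward the call to the sorting function selected by `criteria`."""
    try:
        sorter = _SORTERS[criteria]
    except KeyError:
        raise ValueError(f"unknown sort criteria: {criteria!r}") from None
    return sorter(data, **kwargs)


genes = pd.DataFrame({"name": ["a", "b", "c"],
                      "start": [100, 500, 50],
                      "stop": [400, 450, 60]})
by_size = sort_proxy(genes, "by_gene_size", start_col="start", stop_col="stop")
by_name = sort_proxy(genes, "using_col", col="name", ascending=False)
```

The keyword arguments are passed through unchanged, so each criteria only needs to document its own parameters.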

normalisation

Several functions are available to normalize the data.

The data can be normalized using the min and max of the whole data set, or the min and max can be recalculated for each row.

In both cases the formula is:

zi = (xi - min(x)) / (max(x) - min(x))

where x=(x1,…,xn) and zi is the i-th normalized value. In the first case x is the whole matrix; in the second case x is a single row.
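To make the difference between the two cases concrete, here is a minimal sketch of min-max scaling applied to a whole matrix and then row by row. The helper `min_max` and the column names are hypothetical; this is not the code of craw.heatmap.lin_norm().

```python
import numpy as np
import pandas as pd


def min_max(values):
    """Rescale an array so that its values lie in [0, 1]."""
    values = np.asarray(values, dtype=float)
    v_min, v_max = values.min(), values.max()
    if v_max == v_min:
        # Constant input: the formula would divide by zero, so return zeros
        # (a choice made for this sketch only).
        return np.zeros_like(values)
    return (values - v_min) / (v_max - v_min)


scores = pd.DataFrame([[0, 5, 10], [2, 3, 4]], columns=["-1", "0", "1"])

# First case: x is the whole matrix.
whole = pd.DataFrame(min_max(scores.to_numpy()),
                     index=scores.index, columns=scores.columns)

# Second case: x is each row in turn.
by_row = scores.apply(lambda row: pd.Series(min_max(row), index=row.index), axis=1)
```

With the whole-matrix scaling the second row keeps its low values (0.2, 0.3, 0.4), whereas row-by-row scaling stretches every row over the full [0, 1] range.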

Normalization can be preceded by a base 10 logarithm transformation.

Note

In this case all 0 values are replaced by 1 (the base 10 logarithm of 0 is not defined).
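A minimal sketch of the log transformation followed by min-max scaling, with the replacement of 0 by 1 described in the note. The function name `log10_min_max` is hypothetical and the handling of constant input is an assumption of this sketch.

```python
import numpy as np


def log10_min_max(values):
    """Apply log10 (with 0 replaced by 1), then rescale to [0, 1]."""
    values = np.asarray(values, dtype=float)
    # Replace 0 by 1 so that log10 yields 0 instead of -inf.
    logged = np.log10(np.where(values == 0, 1.0, values))
    span = logged.max() - logged.min()
    if span == 0:
        # Constant input: return zeros rather than dividing by zero.
        return np.zeros_like(logged)
    return (logged - logged.min()) / span


normalized = log10_min_max([[0, 1, 10], [100, 1000, 0]])
```

Note that after the replacement a score of 0 and a score of 1 both map to the same normalized value.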

drawing heatmap

There are 2 ways to generate figures. The first is to generate a figure containing 2 heatmaps, one for sense and one for antisense, with axes, a legend and so on. In this representation the figure is always scaled in or out, so the information carried by a single pixel is not accessible. This representation is generated by craw.heatmap.draw_heatmap() and uses matplotlib.

The second is to produce a raw image in which one nucleotide (one position for one gene) is represented by exactly one pixel, without any scaling. This representation has no axes, no legend and so on: it is only a raw image. It is generated by craw.heatmap.draw_raw_image().
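The "one value, one pixel" idea can be sketched with numpy alone. The example below is a simplification: it maps a normalized matrix to 8-bit grey levels instead of going through a color map, and it stops before writing the file (an imaging library such as Pillow could then save the array, for instance with `Image.fromarray`).

```python
import numpy as np

# A normalized matrix: one row per gene, one column per position.
normalized = np.array([[0.0, 0.5, 1.0],
                       [1.0, 0.25, 0.0]])

# Each value becomes exactly one pixel (here an 8-bit grey level).
pixels = np.rint(normalized * 255).astype(np.uint8)
```

The pixel array has exactly the shape of the data matrix, which is what makes every single value readable in the resulting image.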

heatmap API reference

class craw.heatmap.Mark(pos, data, color_map, color=None)

A mark is a position and a color tied together. It is used to draw a colored vertical line at the given position on the heatmap.

__init__(pos, data, color_map, color=None)
Parameters
  • pos (int) – the position at which to draw the mark, relative to the reference position (0)

  • data (pandas.DataFrame object) – the coverage matrix

  • color_map (matplotlib.pyplot.ColorMap object) – the color map used to draw the heatmap

  • color – the color of the line. The supported formats are hexadecimal values as #rgb or #rrggbb (for instance #ff0000 is pure red) and common HTML color names.

_color_converter(color, data)
Parameters
  • color (string) – the color of the line. The supported formats are hexadecimal values as #rgb or #rrggbb (for instance #ff0000 is pure red) and common HTML color names.

  • data (pandas.DataFrame object) – the coverage matrix

Returns

rgb color

Return type

tuple with 3 int between 0 and 255
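As an illustration of the hexadecimal formats listed above, here is a minimal sketch that converts #rgb or #rrggbb strings into a tuple of 3 integers. The function `hex_to_rgb` is hypothetical; it ignores HTML color names and the `data` argument, so it is not the actual `_color_converter` implementation.

```python
def hex_to_rgb(color):
    """Convert '#rgb' or '#rrggbb' to a tuple of 3 ints between 0 and 255."""
    digits = color.lstrip("#")
    if len(digits) == 3:
        # Short form: each digit is doubled, so '#f00' means '#ff0000'.
        digits = "".join(d * 2 for d in digits)
    if len(digits) != 6:
        raise ValueError(f"unsupported color format: {color!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


red = hex_to_rgb("#ff0000")
green = hex_to_rgb("#0f0")
```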

_get_matrix_bound(data)
Parameters

data (pandas.DataFrame object) – the coverage matrix

Returns

the rightmost and leftmost positions of the coverage

Return type

tuple of 2 int

to_px()

Translate the position of the mark, relative to the reference, into pixels.

Returns

the position of the mark in pixels.

Return type

positive int
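One way to picture this translation, assuming that the score columns are named after their position relative to the reference and that one column corresponds to one pixel: the pixel index is the offset from the leftmost position. Both helpers below (`matrix_bound`, `position_to_px`) are hypothetical and only illustrate the idea.

```python
import pandas as pd


def matrix_bound(scores):
    """Return (leftmost, rightmost) position, read from the column names."""
    positions = [int(col) for col in scores.columns]
    return min(positions), max(positions)


def position_to_px(pos, scores):
    """Translate a position relative to the reference into a pixel column."""
    left, right = matrix_bound(scores)
    if not left <= pos <= right:
        raise ValueError(f"position {pos} is outside [{left}, {right}]")
    return pos - left


scores = pd.DataFrame([[1, 2, 3, 4, 5]], columns=["-2", "-1", "0", "1", "2"])
px = position_to_px(0, scores)
```

Here the reference position 0 lands on the third pixel column (index 2), because two positions lie to its left.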

craw.heatmap._sort_by_gene_size(data, start_col=None, stop_col=None, ascending=True)

Sort the matrix according to gene size.

Parameters
  • data (pandas.DataFrame.) – the data to sort.

  • start_col (string.) – the name of the column representing the beginning of the gene.

  • stop_col (string) – the name of the column representing the end of the gene.

Returns

sorted data.

Return type

a pandas.DataFrame object.
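A minimal sketch of sorting by gene size with pandas. It assumes that the size is the absolute difference between the stop and start columns; the function `sort_by_size` and the column names `start` and `stop` are hypothetical.

```python
import pandas as pd


def sort_by_size(data, start_col, stop_col, ascending=True):
    """Return the rows of `data` ordered by |stop - start|."""
    # abs() keeps the size positive even if stop < start for some rows.
    size = (data[stop_col] - data[start_col]).abs()
    order = size.sort_values(ascending=ascending, kind="stable").index
    return data.loc[order]


genes = pd.DataFrame({"name": ["a", "b", "c"],
                      "start": [100, 500, 50],
                      "stop": [400, 450, 60]})
smallest_first = sort_by_size(genes, "start", "stop")
largest_first = sort_by_size(genes, "start", "stop", ascending=False)
```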

craw.heatmap._sort_using_col(data, col=None, ascending=True)

Sort the matrix according to the values of the column col.

Parameters
  • data (pandas.DataFrame.) – the data to sort.

  • col (string.) – the name of the column to use for sorting the data.

Returns

sorted data.

Return type

a pandas.DataFrame object.

craw.heatmap._sort_using_file(data, file=None)

Sort the matrix according to the content of file. The file must have the following structure: the first line must be the name of the column, and the following lines must be the values, one per line. Each line starting with ‘#’ is ignored.

Parameters
  • data (pandas.DataFrame.) – the data to sort.

  • file (a file like object.) – The file to use as guide to sort the data.

Returns

sorted data.

Return type

a pandas.DataFrame object.
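The file layout described above can be illustrated with a small sketch that reads the guide from a file-like object and reorders the rows accordingly. The function `sort_with_guide` and the column name `name` are hypothetical, and the sketch assumes that every value listed in the file occurs exactly once in the data.

```python
import io

import pandas as pd


def sort_with_guide(data, guide):
    """Reorder `data` following the values listed in the file-like `guide`."""
    # Skip empty lines and comment lines starting with '#'.
    lines = [line.strip() for line in guide
             if line.strip() and not line.startswith("#")]
    col, wanted = lines[0], lines[1:]
    # Index the rows by the guide column, then pick them in the wanted order.
    indexed = data.set_index(col, drop=False)
    return indexed.loc[wanted].reset_index(drop=True)


guide = io.StringIO("# preferred order\nname\ngene_c\ngene_a\ngene_b\n")
data = pd.DataFrame({"name": ["gene_a", "gene_b", "gene_c"], "0": [1, 2, 3]})
result = sort_with_guide(data, guide)
```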

craw.heatmap.crop_matrix(data, start_col, stop_col)

Crop the matrix (remove columns). The resulting matrix covers the columns [start_col, stop_col], both included.

Parameters
  • data (a 2D pandas.DataFrame object.) – the data to crop.

  • start_col (string.) – The name of the first column to keep.

  • stop_col (string.) – The name of the last column to keep.

Returns

cropped data.

Return type

a 2D pandas.DataFrame object or None if data is None.
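With pandas, keeping a closed range of columns can be written as a label slice, which includes both ends. The column names below are hypothetical, and this is only a sketch of the idea, not the crop_matrix() implementation.

```python
import pandas as pd

scores = pd.DataFrame([[1, 2, 3, 4, 5]], columns=["-2", "-1", "0", "1", "2"])

# Label-based slicing with .loc keeps both the start and the stop column.
cropped = scores.loc[:, "-1":"1"]
```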

craw.heatmap.draw_heatmap(sense, antisense, color_map=<matplotlib.colors.LinearSegmentedColormap object>, title='', sense_on='top', size=None, marks=None)

Create a figure with subplots representing the data as heat maps.

Parameters
  • sense (a pandas.DataFrame object.) – the data normalized (xi in [0,1]) representing coverage on sense.

  • antisense – the data normalized (xi in [0,1]) representing coverage on anti sense.

  • color_map (a matplotlib.pyplot.cm object.) – the color map to use to represent the data.

  • title (string.) – the figure title (by default the same as the coverage file).

  • sense_on (string.) – specify the layout, that is, where to place the heat map representing the sense data. The available values are: ‘left’, ‘right’, ‘top’, ‘bottom’ (default = ‘top’).

  • size (tuple of 2 float.) – the size of the figure in inches (width, height).

  • marks (list of Mark object) – list of vertical marks

Returns

The figure.

Return type

a matplotlib.pyplot.Figure object.

craw.heatmap.draw_one_matrix(mat, ax, cmap=<matplotlib.colors.LinearSegmentedColormap object>, y_label=None, marks=None)

Draw a matrix using matplotlib imshow.

Parameters
  • mat (a pandas.DataFrame object.) – the data to represent graphically.

  • ax (a matplotlib.axis object) – the axis where to represent the data

  • cmap (a matplotlib.pyplot.cm object.) – the color map to use to represent the data.

  • y_label (string) – the label for the data drawn on the y-axis.

  • marks (list of Mark object) – list of vertical marks

Returns

the matplotlib image corresponding to the data

Return type

a matplotlib.image object.

craw.heatmap.draw_raw_image(data, out_name, color_map=<matplotlib.colors.LinearSegmentedColormap object>, format='PNG', marks=None)

Generate an image file with one pixel for each value of the data matrix. The data can be either the coverage on sense or on antisense.

Parameters
  • data (2D pandas.DataFrame or numpy.array object) – a normalized matrix (all values are between 0 and 1).

  • out_name (string) – The name of the generated graphic file.

  • color_map (a matplotlib.pyplot.cm object.) – the color map to use to represent the data.

  • format (string) – the format of the resulting image: png, jpeg, … (see the formats supported by Pillow)

  • marks (a sequence (list, tuple or set) of Mark objects) – the marks (vertical rules) to draw on the resulting heat map

Raises

RuntimeError if the data are not normalized.

craw.heatmap.get_data(coverage_file)
Parameters

coverage_file (str) – the path of the coverage file to parse.

Returns

the data as a 2-dimensional DataFrame

Return type

a pandas.DataFrame object

craw.heatmap.lin_norm(data)

Normalize data with a linear algorithm. The formula applied to obtain the results is:

zi = (xi - min(x)) / (max(x) - min(x))

where x=(x1,…,xn) and zi is the i-th normalized value. This ensures that the resulting values are between 0 and 1. Returns None if data is None, and an empty pd.DataFrame object if data is empty.

Parameters

data (a 2D pandas.DataFrame object.) – the data to normalize. This 2D matrix must contain only coverage scores (no metadata).

Returns

a normalized matrix, where 0 <= zi <= 1 and z=(z1, …, zn)

Return type

a 2D pandas.DataFrame object or None if data is None.

craw.heatmap.lin_norm_row_by_row(data)

Normalize data with the linear algorithm, but instead of normalizing the whole matrix at once, apply the normalization formula (see lin_norm()) row by row. This ensures that all values are between 0 and 1.

Parameters

data (a 2D pandas.DataFrame object.) – the data to normalize. This 2D matrix must contain only coverage scores (no metadata).

Returns

a normalized matrix, where 0 <= zi <= 1 and z=(z1, …, zn)

Return type

a 2D pandas.DataFrame object or None if data is None.

craw.heatmap.log_norm(data)

The base 10 logarithm is computed for all values before normalization (see lin_norm()), which ensures that all values are between 0 and 1.

Note

Coverage scores are integers >= 0, and log10(0) is -inf (or raises a warning on macOS). Before the data are normalized, the 0 values are therefore replaced by 1.

Parameters

data (a 2D pandas.DataFrame object.) – the data to normalize. This 2D matrix must contain only coverage scores (no metadata).

Returns

a normalized matrix, where 0 <= zi <= 1 and z=(z1, …, zn)

Return type

a 2D pandas.DataFrame object or None if data is None.

craw.heatmap.log_norm_row_by_row(data)

Same as lin_norm_row_by_row(), but a base 10 logarithm is applied before normalization.

Note

Coverage scores are integers >= 0, and log10(0) = -inf. To normalize the data, the -inf values are changed to 0.

Parameters

data (a 2D pandas.DataFrame object.) – the data to normalize. This 2D matrix must contain only coverage scores (no metadata).

Returns

a normalized matrix, where 0 <= zi <= 1 and z=(z1, …, zn)

Return type

a 2D pandas.DataFrame object or None if data is None.

craw.heatmap.remove_metadata(data)

Remove all information that is not a coverage value (such as chromosome, strand, name, …).

Parameters

data (pandas.DataFrame.) – the data coming from the parsing of a coverage file, containing coverage information and metadata (chromosome, gene name, …)

Returns

the coverage data without metadata.

Return type

a 2D pandas.DataFrame object or None if data is None.
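One possible way to separate scores from metadata, assuming that score columns are named after an integer position while metadata columns have non-numeric names. This naming convention and the helpers `is_position` and `keep_scores` are assumptions of this sketch, not a description of how remove_metadata() works.

```python
import pandas as pd


def is_position(name):
    """Tell whether a column name can be read as an integer position."""
    try:
        int(name)
    except ValueError:
        return False
    return True


def keep_scores(data):
    """Keep only the columns whose name is an integer position."""
    return data[[col for col in data.columns if is_position(col)]]


parsed = pd.DataFrame({"chromosome": ["chrI"], "strand": ["+"],
                       "-1": [3], "0": [7], "1": [2]})
scores = keep_scores(parsed)
```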

craw.heatmap.sort(data, criteria, **kwargs)

Sort the matrix according to the given criteria. This function acts as a proxy for several specific sorting functions.

Parameters
  • data (pandas.DataFrame.) – the data to sort.

  • criteria (string.) – which criteria to use to sort the data (by_gene_size, using_col, using_file).

  • kwargs – depends on the criteria: start_col and stop_col for by_gene_size, col for using_col, file for using_file.

Returns

sorted data.

Return type

a pandas.DataFrame object.

craw.heatmap.split_data(data)

Split the matrix into 2 matrices, one for sense and the other for antisense.

Parameters

data (a 2 dimension pandas.DataFrame object) – the coverage data to split

Returns

two matrices

Return type

tuple of two pandas.DataFrame objects (sense pandas.DataFrame, antisense pandas.DataFrame)
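A minimal sketch of such a split with boolean masks. The column name `sense` and its values `S` and `AS` are assumptions made for the example; the actual layout of the parsed coverage file may differ.

```python
import pandas as pd


def split_by_sense(data, sense_col="sense"):
    """Return (sense rows, antisense rows) as two DataFrames."""
    sense = data[data[sense_col] == "S"]
    antisense = data[data[sense_col] == "AS"]
    return sense, antisense


parsed = pd.DataFrame({"name": ["a", "a", "b", "b"],
                       "sense": ["S", "AS", "S", "AS"],
                       "0": [5, 1, 7, 2]})
sense, antisense = split_by_sense(parsed)
```

Both results still carry the metadata columns, which matches the statement that split_data() works on freshly parsed data.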