heatmap¶
craw.heatmap.split_data() and craw.heatmap.sort() work on data freshly parsed from a coverage file.
This means that the data still contain the metadata (all columns that are not coverage scores, such as chromosome, position, strand, and so on).
The other functions, such as the normalization functions, work on 2D pandas DataFrames or numpy arrays containing only coverage scores.
This means that all metadata must have been removed beforehand (see craw.heatmap.remove_metadata()).
sort¶
There is one public sort function, which acts as a proxy for several private sorting functions.
normalisation¶
Several functions are available to normalize data.
The data can be normalized using the min and max of the whole data, or the min and max can be recalculated for each row.
In both cases the formula is
zi = (xi - min(x)) / (max(x) - min(x))
where x = (x1, …, xn) and zi is the i-th normalized value. In the first case x is the whole matrix; in the second case x is a single row.
Normalization can be preceded by a base 10 log transformation.
Note
In this case all 0 values are replaced by 1 (the base 10 log of 0 is not defined).
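The normalization described above can be sketched with numpy as follows. This is only an illustration under stated assumptions, not the code of craw.heatmap: the function names min_max, min_max_row_by_row and log10_then_min_max are hypothetical, and returning zeros for constant input is a choice made for this sketch.

```python
import numpy as np


def min_max(x):
    """Apply zi = (xi - min(x)) / (max(x) - min(x)) to every value of x."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        # Constant input: the formula would be 0 / 0. Returning zeros is an
        # assumption of this sketch, not documented behaviour.
        return np.zeros_like(x)
    return (x - x.min()) / span


def min_max_row_by_row(matrix):
    """Normalize each row independently, with its own min and max."""
    rows = np.asarray(matrix, dtype=float)
    return np.vstack([min_max(row) for row in rows])


def log10_then_min_max(matrix):
    """Replace 0 by 1, take the base 10 log, then normalize the whole matrix."""
    m = np.asarray(matrix, dtype=float)
    m = np.where(m == 0, 1.0, m)  # log10(1) = 0, whereas log10(0) is undefined
    return min_max(np.log10(m))


scores = np.array([[0, 5, 10],
                   [2, 4, 6]])
whole = min_max(scores)              # x is the whole matrix
by_row = min_max_row_by_row(scores)  # x is each row in turn
logged = log10_then_min_max(scores)  # log transformation first
```

With the whole matrix as x, the second row keeps its position relative to the global extremes; row by row, every row spans the full range from 0 to 1.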
drawing heatmap¶
There are 2 ways to generate figures. The first is to generate a figure containing 2 heatmaps, one for sense and one for antisense, with axes, legend and so on. In this representation it is not possible to display the figure without scaling in or out, so the information carried by a single pixel is not accessible. This representation is generated by craw.heatmap.draw_heatmap() and uses matplotlib.
The second representation is a raw image in which one nucleotide (one position for one gene) is represented by one pixel, without any scaling in or out. In this representation there are no axes, legend and so on; it is only a raw image.
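The "one value, one pixel" idea of the raw image can be sketched as below. This is a minimal illustration, not the implementation of craw.heatmap.draw_raw_image(): the function name to_pixels is hypothetical, and a plain grayscale ramp stands in for a real color map.

```python
import numpy as np


def to_pixels(normalized):
    """Map a normalized matrix (values in [0, 1]) to an RGB pixel array.

    Each cell of the matrix becomes exactly one pixel, so the image has the
    same number of rows and columns as the data.
    """
    m = np.asarray(normalized, dtype=float)
    if m.min() < 0 or m.max() > 1:
        raise RuntimeError("data must be normalized (values between 0 and 1)")
    gray = np.round(m * 255).astype(np.uint8)
    # Repeat the gray level on the 3 channels: shape (rows, columns, 3).
    return np.stack([gray, gray, gray], axis=-1)


pixels = to_pixels([[0.0, 0.5, 1.0],
                    [0.2, 0.4, 0.6]])
```

An array of this shape and dtype can then be written to disk with Pillow, for instance with Image.fromarray(pixels).save(...), which is left out here to keep the sketch free of extra dependencies.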
heatmap API reference¶
- class
craw.heatmap.
Mark
(pos, data, color_map, color=None)[source]¶
A mark is a position and a color tied together. It is used to draw a colored vertical line at the given position on the heatmap.
__init__
(pos, data, color_map, color=None)[source]¶
- Parameters
pos (int) – the position where to draw the mark; the position is relative to the reference position (0).
data (pandas.DataFrame object) – the coverage matrix.
color_map (matplotlib.pyplot.ColorMap object) – the color map used to draw the heatmap.
color – the color of the line. The supported formats are:
- hexadecimal values as #rgb or #rrggbb, for instance #ff0000 is pure red.
- common html color names.
__weakref__
¶list of weak references to the object (if defined)
_color_converter
(color, data)[source]¶
- Parameters
color (string) – the color of the line. The supported formats are:
- hexadecimal values as #rgb or #rrggbb, for instance #ff0000 is pure red.
- common html color names.
data (pandas.DataFrame object) – the coverage matrix.
- Returns
the RGB color.
- Return type
tuple of 3 ints between 0 and 255
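As an illustration of the hexadecimal formats listed above, the conversion to a tuple of 3 ints can be sketched in plain Python. The helper name hex_to_rgb is hypothetical and this is not the code of _color_converter; html color names are not handled in this sketch.

```python
def hex_to_rgb(color):
    """Convert '#rgb' or '#rrggbb' to a tuple of 3 ints between 0 and 255."""
    digits = color.lstrip("#")
    if len(digits) == 3:
        # Short form: each digit is doubled, so '#f00' means '#ff0000'.
        digits = "".join(d * 2 for d in digits)
    if len(digits) != 6:
        raise ValueError(f"unsupported color format: {color!r}")
    # Read the red, green and blue pairs as base 16 numbers.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


red = hex_to_rgb("#ff0000")
short_red = hex_to_rgb("#f00")
```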
craw.heatmap.
_sort_by_gene_size
(data, start_col=None, stop_col=None, ascending=True)[source]¶
Sort the matrix according to the gene size.
- Parameters
data (pandas.DataFrame) – the data to sort.
start_col (string) – the name of the column representing the beginning of the gene.
stop_col (string) – the name of the column representing the end of the gene.
- Returns
sorted data.
- Return type
a pandas.DataFrame object.
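Sorting by gene size can be sketched with pandas as follows. This is an assumption-laden illustration rather than the body of _sort_by_gene_size: the helper name sort_by_size and the column names "name", "start" and "stop" are hypothetical, and taking the absolute difference (so that genes whose stop is smaller than their start are handled) is a choice made for this sketch.

```python
import pandas as pd


def sort_by_size(data, start_col, stop_col, ascending=True):
    """Return the rows of data ordered by |stop - start|."""
    size = (data[stop_col] - data[start_col]).abs()
    # A stable sort keeps the original order of genes that have the same size.
    order = size.sort_values(ascending=ascending, kind="stable").index
    return data.loc[order]


genes = pd.DataFrame({
    "name": ["a", "b", "c"],
    "start": [100, 500, 30],
    "stop": [400, 550, 10],
})
by_size = sort_by_size(genes, "start", "stop")
by_size_desc = sort_by_size(genes, "start", "stop", ascending=False)
```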
craw.heatmap.
_sort_using_col
(data, col=None, ascending=True)[source]¶
Sort the matrix according to the column col.
- Parameters
data (pandas.DataFrame) – the data to sort.
col (string) – the name of the column to use for sorting the data.
- Returns
sorted data.
- Return type
a pandas.DataFrame object.
craw.heatmap.
_sort_using_file
(data, file=None)[source]¶
Sort the matrix according to file. The file must have the following structure: the first line must be the name of the column, and the following lines must be the values, one per line. Each line starting with ‘#’ is ignored.
- Parameters
data (pandas.DataFrame) – the data to sort.
file (a file-like object) – the file to use as a guide to sort the data.
- Returns
sorted data.
- Return type
a pandas.DataFrame object.
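The file structure described above (column name on the first line, one value per line, ‘#’ lines ignored) can be read and applied as in the sketch below. It is not the code of _sort_using_file: the helper name reorder_with_file is hypothetical, the column is assumed to hold strings, and dropping rows that are not listed in the file is an assumption of this sketch.

```python
import io

import pandas as pd


def reorder_with_file(data, file):
    """Order the rows of data following the values listed in file."""
    lines = [ln.strip() for ln in file
             if ln.strip() and not ln.startswith("#")]
    col, wanted = lines[0], lines[1:]  # first line: column name, then values
    rank = {value: i for i, value in enumerate(wanted)}
    # Assumption: rows whose value is not listed in the file are dropped.
    listed = data[data[col].isin(wanted)]
    order = listed[col].map(rank).sort_values(kind="stable").index
    return listed.loc[order]


genes = pd.DataFrame({"name": ["a", "b", "c"], "score": [1, 2, 3]})
guide = io.StringIO("name\n# this line is ignored\nc\na\n")
ordered = reorder_with_file(genes, guide)
```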
craw.heatmap.
crop_matrix
(data, start_col, stop_col)[source]¶
Crop the matrix (remove columns). The resulting matrix spans [start_col, stop_col].
- Parameters
data (a 2D pandas.DataFrame object) – the data to crop.
start_col (string) – the name of the first column to keep.
stop_col (string) – the name of the last column to keep.
- Returns
the cropped data.
- Return type
a 2D pandas.DataFrame object or None if data is None.
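Cropping to the closed interval [start_col, stop_col] maps naturally onto label-based slicing in pandas, which includes both ends. The sketch below is only an illustration, not the code of crop_matrix: the helper name crop and the column names used in the example are hypothetical.

```python
import pandas as pd


def crop(data, start_col, stop_col):
    """Keep the columns from start_col to stop_col, both included."""
    if data is None:
        return None
    # .loc slices by label and, unlike positional slicing, keeps the stop label.
    return data.loc[:, start_col:stop_col]


coverage = pd.DataFrame([[1, 2, 3, 4, 5]],
                        columns=["-2", "-1", "0", "1", "2"])
cropped = crop(coverage, "-1", "1")
```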
craw.heatmap.
draw_heatmap
(sense, antisense, color_map=<matplotlib.colors.LinearSegmentedColormap object>, title='', sense_on='top', size=None, marks=None)[source]¶
Create a figure with subplots to represent the data as heat maps.
- Parameters
sense (a pandas.DataFrame object) – the normalized data (xi in [0,1]) representing coverage on sense.
antisense – the normalized data (xi in [0,1]) representing coverage on antisense.
color_map (a matplotlib.pyplot.cm object) – the color map to use to represent the data.
title (string) – the figure title (by default the same as the coverage file).
sense_on (string) – specifies the layout: where to place the heat map representing the sense data. The available values are ‘left’, ‘right’, ‘top’, ‘bottom’ (default = ‘top’).
size (tuple of 2 floats) – the size of the figure in inches (width, height).
marks (list of Mark objects) – list of vertical marks.
- Returns
The figure.
- Return type
a matplotlib.pyplot.Figure object.
craw.heatmap.
draw_one_matrix
(mat, ax, cmap=<matplotlib.colors.LinearSegmentedColormap object>, y_label=None, marks=None)[source]¶
Draw a matrix using matplotlib imshow.
- Parameters
mat (a pandas.DataFrame object) – the data to represent graphically.
ax (a matplotlib.axis object) – the axis where to represent the data.
cmap (a matplotlib.pyplot.cm object) – the color map to use to represent the data.
y_label (string) – the label for the data drawn on the y-axis.
marks (list of Mark objects) – list of vertical marks.
- Returns
the matplotlib image corresponding to the data.
- Return type
a matplotlib.image object.
craw.heatmap.
draw_raw_image
(data, out_name, color_map=<matplotlib.colors.LinearSegmentedColormap object>, format='PNG', marks=None)[source]¶
Generate an image file with one pixel for each value of the data matrix. The data can be either the coverage on sense or on antisense.
- Parameters
data (2D pandas.DataFrame or numpy.array object) – a normalized matrix (where all values are between 0 and 1).
out_name (string) – the name of the generated graphic file.
color_map –
format (string) – the format of the result: png, jpeg, … (see pillow supported formats).
marks (a sequence (list, tuple or set) of Mark objects) – the marks (vertical rules) to draw on the resulting heat map.
- Raise
RuntimeError if data are not normalized.
craw.heatmap.
get_data
(coverage_file)[source]¶
- Parameters
coverage_file (str) – the path of the coverage file to parse.
- Returns
the data as a 2-dimensional DataFrame.
- Return type
a pandas.DataFrame object
craw.heatmap.
lin_norm
(data)[source]¶
Normalize data with a linear algorithm. The formula applied to obtain the results is:
zi = (xi - min(x)) / (max(x) - min(x))
where x = (x1, …, xn) and zi is the i-th normalized value. This ensures that the resulting values lie between 0 and 1. Returns None if data is None, and an empty pd.DataFrame object if data is empty.
- Parameters
data (a 2D pandas.DataFrame object) – the data to normalize; this 2D matrix must contain only coverage scores (no metadata).
- Returns
a normalized matrix, where 0 <= zi <= 1 and z = (z1, …, zn).
- Return type
a 2D pandas.DataFrame object or None if data is None.
craw.heatmap.
lin_norm_row_by_row
(data)[source]¶
Normalize data with a linear algorithm, but instead of normalizing the whole matrix, the normalization formula (see lin_norm()) is applied row by row. This ensures that all values are between 0 and 1.
- Parameters
data (a 2D pandas.DataFrame object) – the data to normalize; this 2D matrix must contain only coverage scores (no metadata).
- Returns
a normalized matrix, where 0 <= zi <= 1 and z = (z1, …, zn).
- Return type
a 2D pandas.DataFrame object or None if data is None.
craw.heatmap.
log_norm
(data)[source]¶
The base 10 logarithm is computed for all values before a normalization (see lin_norm()) to ensure that all values lie between 0 and 1.
Note
coverage scores are integers >= 0. log10(0) = -inf (or raises a warning on macOS), so prior to normalizing the data the 0 values are replaced by 1.
- Parameters
data (a 2D pandas.DataFrame object) – the data to normalize; this 2D matrix must contain only coverage scores (no metadata).
- Returns
a normalized matrix, where 0 <= zi <= 1 and z = (z1, …, zn).
- Return type
a 2D pandas.DataFrame object or None if data is None.
craw.heatmap.
log_norm_row_by_row
(data)[source]¶
As lin_norm_row_by_row(), but prior to normalization a base 10 logarithm is applied.
Note
coverage scores are integers >= 0. log10(0) = -inf, so to normalize the data the -inf values are changed to 0.
- Parameters
data (a 2D pandas.DataFrame object) – the data to normalize; this 2D matrix must contain only coverage scores (no metadata).
- Returns
a normalized matrix, where 0 <= zi <= 1 and z = (z1, …, zn).
- Return type
a 2D pandas.DataFrame object or None if data is None.
craw.heatmap.
remove_metadata
(data)[source]¶
Remove all information which is not a coverage value (such as chromosome, strand, name, …).
- Parameters
data (pandas.DataFrame) – the data coming from the parsing of a coverage file, containing coverage information and metadata (chromosome, gene name, …).
- Returns
the data without metadata.
- Return type
a 2D pandas.DataFrame object or None if data is None.
craw.heatmap.
sort
(data, criteria, **kwargs)[source]¶
Sort the matrix according to criteria. This function acts as a proxy for several specific sorting functions.
- Parameters
data (pandas.DataFrame) – the data to sort.
criteria (string) – which criterion to use to sort the data (by_gene_size, using_col, using_file).
kwargs – depending on the criteria:
- start_col, stop_col for by_gene_size
- col for using_col
- file for using_file
- Returns
sorted data.
- Return type
a pandas.DataFrame object.
craw.heatmap.
split_data
(data)[source]¶
Split the matrix into 2 matrices, one for sense and the other for antisense.
- Parameters
data (a 2-dimensional pandas.DataFrame object) – the coverage data to split.
- Returns
two matrices.
- Return type
tuple of two pandas.DataFrame objects (sense pandas.DataFrame, antisense pandas.DataFrame)