In the past I had to compare time series and other data across groups, for example to find spots where the groups differ strongly. For this it was very useful to visualize the data before doing detailed statistics. Surprisingly, in the well-known Python libraries matplotlib, pandas and seaborn I could not find a simple function that does exactly what I wanted: take a pandas.DataFrame as input, plus some predefined group labeling, and plot the means/medians of those groups/conditions across columns together with some error measure. Of course, any of these libraries can plot points and error bars, or even nice area plots with matplotlib’s fill_between() function, but I found them not so handy if you do this frequently. The method that came closest to what I was looking for was seaborn.tsplot(), but it is primarily written for time series and has reduced functionality for other sorts of data. My solution was to wrap the existing functionality in an easier interface function. I hope that makes your life easier as well!
Here I share my code with you, including an example of how to apply it.
import matplotlib
import matplotlib.pyplot as plt
import numpy as np


def group_boxplot(df, conditions, colors=None, ax=None, alpha=0):
    """
    :param df: pandas.DataFrame object
        Data to plot
    :param conditions: array-like
        List of group assignments, same length as df
    :param colors: optional
        Matplotlib colors for the individual groups
    :param ax: PyPlot axis, optional
        Predefined axis on which to display the plot
    :param alpha: float, optional
        Opacity of the filled boxes
    :return: figure, axes

    Published on querdanker.de by Johannes Nagele
    """
    conditions = np.array(conditions)
    if ax is None:
        ax = plt.subplot(111)
    if colors is None:
        # Fall back to matplotlib's named colors, skipping white
        colors = [cname for cname in matplotlib.colors.cnames if cname != 'white']

    groups = np.unique(conditions)
    handles = []
    for i, cond in enumerate(groups):
        # Select the rows of this group by position
        rows = df.iloc[np.flatnonzero(conditions == cond)]
        _, bplot = rows.plot.box(
            return_type='both',
            color=dict(boxes=colors[i], whiskers=colors[i],
                       medians=colors[i], caps=colors[i]),
            ax=ax, patch_artist=True)
        for bbox in bplot['boxes']:
            bbox.set_facecolor(colors[i])
            bbox.set_alpha(alpha)
        for bflier in bplot['fliers']:
            bflier.set_markeredgecolor(colors[i])
        # One box per group is enough for the legend
        handles.append(bplot['boxes'][0])

    ax.legend(handles, groups)
    return ax.figure, ax


def group_line_plot(df, conditions, center='mean', err='std', colors=None, ax=None, alpha=.5):
    """
    :param df: pandas.DataFrame object
        Data to plot
    :param conditions: array-like
        List of group assignments, same length as df
    :param center: String
        'mean' or 'median' - This controls which central value is drawn as a line
    :param err: String
        'std' (Standard deviation) or 'sem' (Standard error of the mean) -
        This controls how the errors for each x-value are computed
    :param colors: optional
        Matplotlib colors for the individual groups
    :param ax: PyPlot axis, optional
        Predefined axis on which to display the plot
    :param alpha: float, optional
        Opacity of the shaded error band
    :return: figure, axes

    Published on querdanker.de by Johannes Nagele
    """
    conditions = np.array(conditions)
    if ax is None:
        ax = plt.subplot(111)
    if colors is None:
        colors = [cname for cname in matplotlib.colors.cnames if cname != 'white']

    # The column labels are used as x-values, so they must be numeric
    x = np.asarray(df.columns, dtype=float)
    for i, cond in enumerate(np.unique(conditions)):
        rows = df.iloc[np.flatnonzero(conditions == cond)]
        # Look up the pandas method by name instead of building code with exec
        c = getattr(rows, center)(axis=0)
        e = getattr(rows, err)(axis=0)
        c.plot(lw=2, color=colors[i], label=cond, ax=ax)
        # Shade the error band around the same center that is drawn as a line
        ax.fill_between(x, c - e, c + e, alpha=alpha, color=colors[i])

    ax.legend()
    return ax.figure, ax
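To make the `center` and `err` options more concrete, here is a minimal sketch of the numbers that end up behind the line and the shaded band. The helper name `group_stats` and the toy values are made up for illustration and are not part of the functions above; the sketch only assumes that `center` and `err` are names of pandas methods such as `mean`, `median`, `std` or `sem`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 4 observations (rows) at 3 x-values (columns).
df = pd.DataFrame(
    [[1.0, 2.0, 3.0],
     [3.0, 4.0, 5.0],
     [10.0, 20.0, 30.0],
     [14.0, 20.0, 26.0]],
    columns=[0, 1, 2],
)
conditions = np.array(['a', 'a', 'b', 'b'])


def group_stats(df, conditions, center='mean', err='std'):
    """Return {group: (center, error)}, both computed column-wise per group.

    Hypothetical helper for illustration only.
    """
    conditions = np.asarray(conditions)
    stats = {}
    for cond in np.unique(conditions):
        rows = df.iloc[np.flatnonzero(conditions == cond)]
        # getattr looks up e.g. rows.mean or rows.sem by its name
        stats[cond] = (getattr(rows, center)(axis=0),
                       getattr(rows, err)(axis=0))
    return stats


stats = group_stats(df, conditions, center='mean', err='sem')
mean_a, sem_a = stats['a']
# The band for group 'a' would span mean_a - sem_a to mean_a + sem_a
lower_a, upper_a = mean_a - sem_a, mean_a + sem_a
```

With 'sem' the band is narrower than with 'std' by a factor of the square root of the group size, so 'std' shows the spread of the data while 'sem' shows the uncertainty of the mean.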
So, how do you use it? You might already guess from reading the code; it is quite straightforward.
The data I was using for the following example looks like this (a pandas DataFrame):
Note that the column labels of the DataFrame will be used as xticks.
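If you want to try the functions without data of your own, a DataFrame with this kind of layout could be generated roughly as follows. This is synthetic data, not the data used in this post: the group names, time points, sample sizes and random seed are all made up, and only the `Type` column name is taken from the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible
time_points = [0, 1, 2, 3, 4]    # hypothetical numeric column labels (used as xticks)
n_per_group = 10                 # hypothetical number of rows per group

frames = []
for label, offset in [('control', 0.0), ('treatment', 2.0)]:
    # One row per observation, one column per time point
    values = rng.normal(loc=offset, scale=1.0,
                        size=(n_per_group, len(time_points)))
    frame = pd.DataFrame(values, columns=time_points)
    frame['Type'] = label        # group assignment stored next to the data
    frames.append(frame)

df = pd.concat(frames, ignore_index=True)

# Only the numeric columns are passed to the plotting functions
numeric = df.select_dtypes(include='number')
```

The numeric column labels matter for the line plot, which converts them to floats for the x-axis.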
import pandas
import matplotlib.pyplot as plt

# Specify the input data:
df = pandas.DataFrame(data=...)

# In this example the group assignments are stored within the DataFrame object in column df.Type as shown above.
# select_dtypes(include='number') keeps only the numeric columns (public counterpart of df._get_numeric_data()).
# The following line calls our newly defined function to create a beautiful box-plot:
fig, ax = group_boxplot(df.select_dtypes(include='number'), df.Type, colors=None, alpha=.2)

# In case you want to modify the plot, e.g. add some labels, you can access the figure object as well as the axes easily:
ax.set_xlabel('Time')
ax.set_ylabel('Score')

# And the same for the line plot (start a new figure first so it is not drawn over the box-plot):
plt.figure()
fig, ax = group_line_plot(df.select_dtypes(include='number'), df.Type, colors=None, alpha=.2)
ax.set_xlabel('Time')
ax.set_ylabel('Score')

plt.show()
Sure enough, there are plenty of things to add, but as a quick solution this might help. Enjoy!
Questions? Just send me an email or leave a comment, and I will try to answer asap.