Descriptive Statistics - Some More Visualization Tools
07 Aug 2008, Rob Slazas
This is the third of three posts in Descriptive Statistics.
Since we finished up descriptive statistics last time, it might be useful to briefly mention a few ways of visualizing datasets. If you’re like me, you don’t get much inspiration from a list of numbers on the page, but seeing them plotted really helps to tell their story. Here are a few examples that I use frequently. If you use others, please share them in the comments below so the rest of us can try them out.
Boxplot
We already dissected the boxplot and the meaning of its parts in the first post on descriptive statistics, but I still think boxplots are cool. Of note here: if you pass boxplot a matrix, it draws a separate box for each column.
load fudge.mat;                                          % loads the matrix fudgedata
h1 = figure('Position',[100 100 600 400],'Color','w');   % 600x400 window, white background
boxplot(fudgedata,'notch','on');                         % one notched box per column
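If you don't have fudge.mat handy, here is a minimal sketch using synthetic data instead. The matrix `demo`, its four shifted normal columns, and the column labels are all made up for illustration. It shows that each column gets its own box and computes the per-column medians and quartiles the boxes summarize. boxplot's exact quartile interpolation may differ slightly from prctile, so treat the printed box edges as approximate.

```matlab
% Synthetic stand-in for fudgedata: 100 observations x 4 columns (hypothetical)
rng(0);                                      % reproducible random numbers
demo = randn(100,4) + repmat(1:4, 100, 1);   % column k is centered near k
figure('Color','w');
boxplot(demo, 'notch', 'on', 'labels', {'A','B','C','D'});   % one box per column
% What the boxes summarize, column by column
medians = median(demo);                      % center line of each box (1x4)
quartiles = prctile(demo, [25 75]);          % approx. bottom/top box edges (2x4)
boxIQR = quartiles(2,:) - quartiles(1,:);    % box heights
disp([medians; quartiles; boxIQR]);
```

Shifting each column by a different amount makes it easy to confirm that the boxes line up with the columns in order.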

Scatterhist
There is good ol' hist, which we used to look at the shape of a distribution, and then there is scatterhist for bivariate data (points in 2D space). Scatterhist draws a traditional scatter plot and adds a separate histogram along each axis. This makes it easy to see how the data behave along each dimension on its own.
Here you can see one of Quan's targets from the archery range (he is a man of many talents). The up/down accuracy looks OK, but he seems to lean a tiny bit to the left, no?
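load QuanArchery.mat;                                    % loads archery_x and archery_y
h2 = figure('Position',[100 100 600 400],'Color','w');
scatterhist(archery_x,archery_y);                        % scatter plot plus marginal histograms
axis off;                                                % hide axis lines and ticks on the current axes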
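To see that the bonus histograms really are independent one-dimensional histograms, here is a rough hand-built imitation of the scatterhist layout using subplot, hist, bar, and barh. The shot coordinates are synthetic (a small leftward offset is baked in on purpose), not Quan's actual data, and the panel arrangement only approximates what scatterhist draws.

```matlab
% Synthetic shots (hypothetical stand-in for archery_x / archery_y)
rng(1);                                  % reproducible draws
x = 0.3*randn(200,1) - 0.1;              % small leftward offset, added on purpose
y = 0.3*randn(200,1);                    % no vertical offset
[nx, cx] = hist(x, 20);                  % x marginal: counts and bin centers
[ny, cy] = hist(y, 20);                  % y marginal: counts and bin centers
figure('Color','w');
hs = subplot(2,2,3); plot(x, y, '.');    % scatter plot in the bottom-left panel
xlabel('x'); ylabel('y');
hx = subplot(2,2,1); bar(cx, nx, 1);     % x histogram above the scatter
title('x marginal');
hy = subplot(2,2,4); barh(cy, ny, 1);    % y histogram to the right, lying sideways
title('y marginal');
xlim(hx, xlim(hs));                      % line up the shared x range
ylim(hy, ylim(hs));                      % line up the shared y range
fprintf('mean x = %.3f, mean y = %.3f\n', mean(x), mean(y));
```

Assuming the center of the target sits at (0, 0), a negative mean x is the numeric version of "leaning a bit to the left."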

Cdfplot
And even though distribution plots lie on the border between descriptive and inferential statistics, they are useful to look at even when you are not making an inference. In particular, I like the empirical cumulative distribution plot, cdfplot, because it does not compare the dataset to an assumed distribution (the other cdf functions, such as normcdf, evaluate the CDF of an assumed parametric distribution). What it does show is the fraction of the dataset (vertical axis, from 0 to 1) that lies at or below each value (horizontal axis). Think about that for a second while looking at the breath-hold data in this new way.
load RobPracticeHolds.mat;                               % loads the vector breathholds
h3 = figure('Position',[100 100 600 400],'Color','w');
cdfplot(breathholds);                                    % empirical CDF of the breath-hold times
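The idea behind cdfplot fits in a few lines: sort the data, and at the i-th smallest value the empirical CDF equals i/n, the fraction of observations at or below it. The sketch below uses made-up breath-hold times (the vector `holds` is hypothetical, not the contents of RobPracticeHolds.mat) and draws the step curve with stairs. Ties are ignored for simplicity; with repeated values, only the last copy of each value carries the correct fraction.

```matlab
% Hypothetical breath-hold times in seconds (not the RobPracticeHolds.mat data)
holds = [62 75 58 90 81 70 66 95 73 84];
n = numel(holds);
x = sort(holds(:));            % data values in ascending order
F = (1:n)' / n;                % fraction of data at or below each sorted value
figure('Color','w');
stairs(x, F);                  % empirical CDF drawn as a step function
xlabel('breath hold (s)'); ylabel('F(x)'); grid on;
% Example reading: what fraction of holds lasted 75 s or less?
fracAt75 = mean(holds <= 75);
fprintf('Fraction of holds <= 75 s: %.2f\n', fracAt75);
```

Reading the curve works the same way on the real plot: pick a time on the horizontal axis, go up to the curve, and the height is the share of holds that were no longer than that.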

Hist & Boxplot Combo
OK, so I made this one up, because in MATLAB we can do that. This is a combo I use frequently, especially as a first look at Monte Carlo simulation output. It uses subplot to stack the two plots, one above the other. I encourage you to play with the graphics functions to find an arrangement that helps you visualize your data. See how tightly the IQR sits within the center bar of the histogram?
load MCsimdata.mat;
h4 = figure('Color','w');
h5 = subplot(4,1,1:3);                   % partition the figure over/under, 3/4 and 1/4
hist(s);
title('MC sim - skewness of t(3)');
h6 = subplot(4,1,4);                     % give the boxplot the bottom 1/4
boxplot(s,'orientation','horizontal');   % lay the boxplot down horizontally
linkaxes([h5 h6],'x');                   % make the x-axes line up
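MCsimdata.mat isn't reproduced here, but a vector like `s` can be generated in a few lines. This sketch assumes `s` holds the sample skewness of many independent t(3) samples; the sample size (50) and the number of replications (5000) are guesses, not the settings behind the original file. It also reports the quartiles and how much of the simulated output falls inside the IQR, which is what the box in the lower panel summarizes.

```matlab
% Assumed settings: these are NOT the parameters used to create MCsimdata.mat
rng(2);                          % reproducible draws
n = 50;                          % observations per simulated sample (assumed)
reps = 5000;                     % number of Monte Carlo replications (assumed)
samples = trnd(3, n, reps);      % each column is one sample from t(3)
s = skewness(samples);           % 1 x reps vector of sample skewness values
q = prctile(s, [25 50 75]);      % quartiles: box edges and median line
fprintf('median %.3f, IQR %.3f, range %.3f\n', q(2), q(3) - q(1), max(s) - min(s));
% About half of the replications should land inside the box
insideIQR = mean(s >= q(1) & s <= q(3));
fprintf('fraction inside IQR: %.3f\n', insideIQR);
```

Because a t(3) variable has no finite third moment, the sample skewness has long tails, so an IQR that is narrow compared with the full range of `s` is what you would expect from this kind of simulation.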

Wrapping up
So that's it for our brief review of visualizations commonly associated with descriptive statistics. From here on we'll discuss inferential statistics. As usual, questions and comments are welcome below. And again, if you've landed on a good ad-lib visualization like the hist/boxplot combo, share the wealth!
Hint: when you post MATLAB code in your comments here at Blinkdagger, remember to spruce it up with the <pre lang="MATLAB"> and </pre> tags as shown below. It really helps make the code readable and neat.
7 Responses to “Descriptive Statistics - Some More Visualization Tools”
Include MATLAB code in your comment by doing the following:
<pre lang="MATLAB">
%insert code here
</pre>


Nice post.
you guys are ugly
hehehe
very cool
Talking of combining MATLAB's statistical functions and plotting routines, I was wondering how to go about plotting a scatter plot and then overlaying boxplots, e.g., splitting each axis into 10 and giving each part a boxplot to illustrate the mean and spread…
@ Michael,
Sounds interesting. Do you mean like using scatterhist, except you want to see boxplots on the outer edges instead of histogram bars? Help me visualize exactly what you’re thinking about. I’m a big fan of combining plots to get a better visual representation of data - graphical analysis is a pretty cool area.
Rob