Intro to Hypothesis Testing with MATLAB
24 Sep 2008 Rob Slazas 15 comments 2,876 views
This is the first post on Hypothesis Testing. Click here to see the full list of statistics posts.
Hypothesis testing is one of the fundamental tools of experimentation. It is also sometimes called difference testing, because the question underlying most experiments is whether (or not) there is a difference between the behavior of two groups. In this post we’ll get familiar with the moving parts of a hypothesis test, and survey the tests that MATLAB and the Statistics Toolbox have to offer.
Contents
- Contemplating Quan
- The Basic Setup of A Hypothesis Test
- Picking the Right Test
- Am I Really Better Than Quan In Long Jumping?
- Wrapping Up
Contemplating Quan

So I was thinking about challenging Quan to another contest, but this time in the standing long jump. I’ve heard he is good, but I’m much better at long jumping than holding my breath. I have some inside information from Dan that Quan averages 180cm per jump and that his jumps are normally distributed. I took some practice jumps and found that my jumps are normally distributed too. They look like this (download the data file here).
load robpracticejumps.mat;   % provides the jump data in robjumps
h1 = figure('Position',[100 100 300 300],'Color','w');
h2 = scatter(ones(numel(robjumps),1),robjumps);   % plot every jump at x = 1
hold all;
plot([0 2],[180 180],'g-','linewidth',2);   % Quan's mean of 180 cm
set(gca,'ylim',[140 220],'xtick',[]);
grid on;
title('Rob''s jumps vs. Quan''s mean','fontweight','bold');
xlabel('Jumps');
ylabel('distance (cm)');

Looking at this data, can I conclude that I’m better than Quan? Well, a simple analysis might be to compare the two means. In this case it looks like my mean is higher. But, by reaching the conclusion based upon the means alone I wouldn’t know how confident I can be about it. How much higher than Quan’s mean should mine be to conclude that it really is higher, and not due to random chance in our sample jumps? So, to build in some scientific rigor, I need to use another method. Yes, you guessed right. It’s time for a hypothesis test.
The Basic Setup of A Hypothesis Test
When you want to test your hypothesis that there is (or isn’t) a difference between the behavior of 2 groups, a standardized method to do that has already been devised. Here are the players:
| H0 = the null hypothesis | H0 is the assumed conclusion unless the test shows otherwise. For most tests, H0 is that the two groups are equal. If the data give strong enough evidence that they are not equal, then you get to “reject the null hypothesis”. In our example, failing to reject H0 means that my average jump is not statistically different from Quan’s. |
| H1 = the alternative hypothesis | H1 is what you conclude to be true IF you can reject H0. For most tests, H1 is that the two groups are not equal. H1 can also be made more specific, such as group1 > group2 or group1 < group2. In our example, rejecting H0 and accepting H1 means that my average jump IS statistically different from Quan's. |
| alpha = the risk of incorrectly rejecting H0 | Before performing the test, you need to decide how much risk of being wrong you can accept. alpha describes that risk and ranges from 0 to 1. The closer to 0 alpha is, the less chance there is of rejecting H0 when you should not reject it. The trade-off is that lower values of alpha require more samples to test, so they are more expensive experiments. The default for most tests is 0.05, or 5% chance of this error. |
| Side note on risks | There is a second, complementary risk described by beta. It is taken care of when determining sample size, not a direct input into the hypothesis test. We’ll cover alpha, beta, and sample size more deeply in a later post. |
| The assumptions | Many hypothesis tests make assumptions about the groups being tested, such as what their underlying distribution is. This is both valuable and limiting. Valuable since the test can be more powerful with fewer samples if it leverages what is already known about a distribution. Limiting since it only applies to groups from the assumed distribution. Be careful not to apply tests to groups of data that don’t fit the assumptions! |
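To make alpha concrete, here is a small simulation sketch. It is not from the jump data: the population mean of 180, standard deviation of 10, sample size of 20, and 5000 repeated experiments are all made-up values for illustration. Because H0 is true by construction, every rejection is a false alarm, so the fraction of rejections should land close to alpha.

```matlab
% Sketch: estimate the risk of incorrectly rejecting H0 by simulation.
% All numbers below are hypothetical and chosen only for illustration.
rng(0);                      % fix the random seed so the run is repeatable
alpha   = 0.05;              % the risk we say we are willing to accept
nJumps  = 20;                % samples per simulated experiment
nTrials = 5000;              % number of simulated experiments

% Each column is one experiment drawn from a population whose true mean
% really is 180, so H0 holds in every column.
x = 180 + 10*randn(nJumps, nTrials);

% ttest works down the columns of a matrix, so h has one 0/1 entry per trial.
h = ttest(x, 180, 'Alpha', alpha);

falseAlarmRate = mean(h);    % fraction of experiments that rejected H0
fprintf('Fraction of false rejections: %.3f\n', falseAlarmRate);
```

Rerunning with a smaller alpha, such as 0.01, should lower the fraction accordingly.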
Picking the Right Test
MATLAB offers many hypothesis tests (some rather obscure), each designed to answer a different kind of question. Here is a guide to the most commonly used tests so we can select the proper one for our question above: On average, am I better at the standing long jump than Quan?
| Answers this question | Assumptions | Hypothesis Test |
| Do the values in this data group appear in random order? | none | runstest |
| Does this data group come from a Standard Normal distribution? | distribution parameters are known (not estimated) | kstest - Kolmogorov-Smirnov normality test |
| Does this data group come from a Normal distribution? | none | lillietest - Lilliefors normality test |
| Is the mean of this data different than a given mean? | data is Normally distributed, variance of data is unknown | ttest - one sample t-test |
| Does the mean of this data differ before/after a treatment? | data is Normally distributed, variance of data is unknown | ttest - paired t-test |
| Are the means of these two groups of data different? | data is Normally distributed, variance of data is unknown | ttest2 - two sample t-test |
| Is the variance of this data different than a given variance? | data is Normally distributed | vartest - one sample chi-square test |
| Are the variances of these two groups of data different? | data is Normally distributed | vartest2 - two sample F-test |
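As a rough illustration of how the functions in this table are called, here is a sketch on simulated data (not the jump data from this post). The variables a and b, and their means and spreads, are invented for the example. Each call returns h = 0 if the test does not reject its null hypothesis at the default alpha of 0.05, and h = 1 if it does.

```matlab
% Sketch: calling the tests from the table on made-up data.
rng(1);
a = 180 + 10*randn(30,1);    % hypothetical group 1
b = 185 + 10*randn(30,1);    % hypothetical group 2

hRandom  = runstest(a);              % do the values appear in random order?
hStdNorm = kstest((a - 180)/10);     % standard normal, after standardizing
                                     % with the known mean and std
hNormal  = lillietest(a);            % normal, parameters estimated from a
hMean    = ttest(a, 180);            % is the mean of a different from 180?
hPaired  = ttest(a, b);              % paired test; only meaningful when
                                     % a(i) and b(i) come from the same subject
hTwoMean = ttest2(a, b);             % are the two group means different?
hVar     = vartest(a, 100);          % is the variance different from 100?
hTwoVar  = vartest2(a, b);           % are the two variances different?

results = [hRandom hStdNorm hNormal hMean hPaired hTwoMean hVar hTwoVar]
```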
One notable omission is the Anderson-Darling normality test. It is a commonly used test to see if a sample comes from a normally distributed population. File Exchange to the rescue! Credit goes to Antonio Trujillo-Ortiz for supplying this important function.
And for completeness, since I only listed the most common tests above, here is a list of all the hypothesis tests in the Statistics Toolbox. You’ll find it is pretty extensive.
Am I Really Better Than Quan In Long Jumping?
Now that we have reviewed the basics of hypothesis tests, which test did you select to answer our question? If you picked the one sample t-test, nice job! We want to compare the mean of my jump data to the given mean of Quan’s jumps.
To call the ttest function for a one sample t-test, we need our data (got it), the given mean (got it), and the alpha risk we can tolerate. For this test we will accept 10% risk, more than the default of 5%, so alpha will be 0.10 in the input arguments.
[h,p,ci] = ttest(robjumps,180,0.10)
h =
1
p =
0.0692
ci =
180.3339
186.3661
In the output, h = 1 means that we reject the null hypothesis H0 in favor of the alternative H1. The p-value of 0.0692 is below our alpha of 0.10, which is why the test rejects; at the default alpha of 0.05 it would not have. The third output is the confidence interval around my mean: you can say with 90% confidence (1 - alpha) that my average long jump is between 180.3cm and 186.4cm. That interval lies entirely above 180, so, on average, I am slightly better than Quan at the standing long jump! WOO-HOO!
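The pieces of this output can be rebuilt by hand, which shows what ttest is doing under the hood. The sketch below uses simulated jumps as a stand-in for the data file, so the sample size, mean, and spread are assumptions and the printed values will not match the numbers above. It passes alpha in name-value form ('Alpha', 0.10), and it also runs a right-tailed version, which matches the question "is my mean greater than 180?" more directly than the default two-sided test.

```matlab
% Sketch: one-sample t-test by function call and by hand.
rng(2);
jumps = 183 + 8*randn(15,1);   % hypothetical stand-in for robjumps
mu0   = 180;                   % Quan's given mean
alpha = 0.10;

% Two-sided test, as in the post.
[h, p, ci, stats] = ttest(jumps, mu0, 'Alpha', alpha);

% The same quantities computed manually.
n        = numel(jumps);
se       = std(jumps)/sqrt(n);                 % standard error of the mean
tManual  = (mean(jumps) - mu0)/se;             % t statistic
pManual  = 2*tcdf(-abs(tManual), n-1);         % two-sided p-value
ciManual = mean(jumps) + [-1; 1]*tinv(1 - alpha/2, n-1)*se;

% Right-tailed test: H1 is that the mean is greater than mu0.
[hRight, pRight] = ttest(jumps, mu0, 'Alpha', alpha, 'Tail', 'right');

fprintf('t = %.4f, two-sided p = %.4f, right-tailed p = %.4f\n', ...
        stats.tstat, p, pRight);
```

When the sample mean is above 180, the right-tailed p-value is half of the two-sided one, so a one-sided question is easier to answer with the same data.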
Wrapping Up
This overview of Hypothesis Testing should get you started. If you are using a test for the first time, please take a quick look at your stats book or the MATLAB help to check the assumptions and usage. If there are some specifics of the tests you would like to see covered in a future post, please let us know in the comments below.
15 Responses to “Intro to Hypothesis Testing with MATLAB”
There is a typo in your link to the data file.
Antasi, nice catch. Quan beat me to the fix!
[...] before because they write great introductory tutorials on how to use MATLAB in various fields. Their latest one, written by Rob Slazas, takes a look at how you can use some of the functions in the MATLAB [...]
[...] the central tendencies between groups WITHOUT using any hypothesis test functions! I used a t-test in the last post to show that my mean long-jump was farther than Quan’s mean long-jump. Here we will get the [...]
Dear Rob,
I calculated the mean of the data set and it seems to be 183.35, why did you choose 180 as the mean in the ttest?
Hi Munir,
You are correct, the mean of the data set is about 183.35, but that is the mean of *MY* jump data. The input arguments for the one-sample t-test are: my jump data (with mean of 183.35); Quan’s given mean of 180; and our alpha of 0.10. So, the reason I put 180 into the ttest is to have the test compare my dataset to the value of 180. Make sense?
Best Regards,
Rob
Hi Rob,
I have 20 people who labelled 11 textures. I would like to:
A: check if each person fits the overall distribution for all people
B: check each label that it fits the textures distribution
So I used ttests (which i’m thinking after reading this is the wrong test):
A: tested each persons dist. against the mean for each texture
B: tested the dist. of each texture against the mean for that texture
Are these the right ways to use a ttest on this data?
And are there other tests i should look at?
Thanks for the informative article!
James
Hi James,
I love questions about experiments, so I’d be glad to help. I’m not quite visualizing what you had the 20 people do, so please answer a few questions for me and then send me some data through the contact form, OK?
About the experiment:
1. Did you have all 20 ppl look at the same 11 textures?
2. What is a “texture”?
3. When a person “labels” a texture, what is the output? Do they rate it on a scale, measure it with an instrument, or come up with some other value?
About the stats:
4. For test A, is the question you’re trying to answer “Do any of these people’s observations differ greatly from the rest of the group?”
5. For test B, is the question you’re trying to answer “Do any of these texture’s ratings differ greatly from the rest of the textures?”
6. If those aren’t the questions you are trying to answer, please explain a little more exactly what you need to learn / decide about the people and textures.
Then please send me the observations (people / textures) in some data format. Text file, excel spreadsheet, whatever - doesn’t matter. As long as it’s not binary in a format other than .mat, I can probably read it.
– My first thought is that you have done A LOT of ttests! This is probably inflating the alpha error, and is generally discouraged. If you have several groups that you want to examine for differing means, ANOVA is the best place to start. Also, if you must compare them individually, I would look into some of the multi-comparative methods that adjust the alpha error accordingly.
I will probably be more helpful after fully understanding the experiment from the questions above. Talk to you again then.
Regards,
Rob
Hi Rob, thanks for responding so fast.
The background to this is that we used machine learning to process the artificial textures. They have been recorded using an artificial finger and the results of classification are very good (>95% for texture discrimination, and >80% for label discrimination). I’m writing my thesis now and previously ran the label data (from subjects) through a ttest to check for outliers, however i’m pretty sure i used the wrong method (it was fast and lazy, they all looked fine by eye anyway).
Hopefully these answers will shed some light on the experiment:
1. All 20 people looked at the same 11 textures.
2. Each texture is manufactured to look at an individual area of texture using a rapid prototype machine, however some of them are variations on similar shapes etc.
3. The people labelled them the set {Very Rough, Rough, Medium, Smooth, Very Smooth}, and this label is represented as a number (1-5 respectively). I will attach the data below this post as csv.
And the stats:
4. Yes, i’m checking for crazy subjects!
5. On a texture basis only, do any of the applied labels vary from that textures distribution (as each texture has a distinct distribution).
ttest wise I ran two loops:
% person(i,:) is the labels assigned by individual i (vector)
% peoplemean is the mean of all people by texture (vector)
for i = 1:20 % people
testP(i) = ttest(person(i,:),peoplemean);
end
% texture(:,i) is the labels assigned to texture i by every person (vector)
% texturemean is the mean of those labels (singular)
for i = 1:11 % textures
testT(i) = ttest(texture(:,i),texturemean);
end
Thanks for the help!
James
Subject,Tex A,Tex B,Tex C,Tex D,Tex E,Tex F,Tex G,Tex H,Tex I,Tex J,Tex K
1,1,4,1,1,2,5,5,3,5,4,2
2,1,2,2,1,3,5,5,3,4,4,3
3,2,2,1,1,3,5,5,3,4,4,3
4,1,3,3,1,2,5,5,4,4,4,3
5,1,2,2,1,3,5,5,3,4,4,3
6,1,4,1,1,2,5,5,4,4,5,3
7,1,2,1,1,3,5,5,3,5,4,3
8,1,2,2,1,3,5,5,4,4,5,3
9,1,3,3,1,1,5,5,2,4,5,2
10,1,2,2,1,2,5,5,3,3,4,3
11,1,2,1,1,3,5,5,4,4,4,3
12,1,2,1,1,3,5,5,4,4,4,3
13,1,2,2,1,1,5,5,3,3,4,3
14,1,2,2,1,3,5,4,4,4.5,5,3
15,1,3,1,1,2,5,4,3,3,4,3
16,1,3,1,1,2,5,5,4,4,4,2
17,1,3,3,1,2,5,5,5,4,4,2
18,1,4,3,1,2,5,5,3,4,4,3
19,1,3,2,1,2,5,5,4,4,4,3
20,1,2,2,1,4,5,5,3,3,4,3
James,
OK, thanks. I think I better understand your experiment now. If I reiterate things below that you’ve already told me, it’s to check that I really do have it right. So please bear with me. That said, here are some points you might consider. Please take them constructively:
1. Generally speaking I don’t think that t-tests are the right tool for finding an outlier in a crowd. This test is a rather sharp instrument and usually intended for experiments that isolate a specific difference that you want to test. I expect that ttest is often over-used. I have recommendations for a better approach below.
2. Using ttest in a FOR loop is usually trouble. Earlier when I guessed that you were doing lots of ttests, this is what I worried about. The problem is that you are applying the alpha error each time you run the test (the chance of the test finding a difference when there really isn’t one). Since you used the default test parameter of alpha = 0.05 (generally accepted), the overall alpha after running the FOR loop is 1-(1-alpha)^n, or 1-.95^20 = 0.6415! This activity has increased your type I risk to 64%. This is a theoretical number and probably won’t hold for your data (more on that in a minute), but any way you look at it, running multiple t-tests inflates alpha to unacceptable levels.
3. Your data does not meet the underlying assumptions for ttest, which are that the data is continuous and normally distributed. The class of data you collected is known as “Likert” data and is very often seen in surveys and clinical research where there just isn’t a continuous measurement method / instrument.
4. All is not lost though, for a couple of reasons. First, Likert is not as bad as “categorical” data - at least it is ordinal. By that I mean that 1 to 5 corresponding to blue, red, white, green, black would be worse. At least your data is increasing in “goodness” as it increases in value. Secondly, there is some literature to support running statistical procedures on Likert data that would otherwise require normally distributed data. The fallback is to check that your data is at least “hump shaped”.
5. I looked and your data doesn’t really have much “hump” to it. So, for all these reasons so far, I don’t recommend running ttests. Instead, I would select ANOVA. Since you want to compare things to the group mean, this fits exactly what ANOVA is designed to do. You could run an ANOVA on all 20 people’s observations, called a one-way ANOVA by people. You could run an ANOVA on all 11 textures’ labels, called a one-way ANOVA by textures. Or, if it makes sense for the questions you are trying to answer, you could run a 2-way ANOVA by people and textures.
7. After running the ANOVA, if anything was significantly different than the group mean, you wouldn’t know which one. Although you can usually pick out the odd duck by inspecting a boxplot, ANOVA itself just tells you that one of these things is not like the other one - not which one. Then I would use the *multcompare* procedure to ID the dissenters. It is similar to a bunch of t-tests, but with adjusted alpha that does not over-inflate. Only run it if ANOVA rejects the null though.
8. All the stat stuff aside, I have to double-check if what you’re trying to compare here all makes sense or not. For the first loop of 20 you ran, you compared the ith person’s observations to their own mean, correct? Tell me if I’m wrong, but a test of that sort will not find a difference by design. If you were, in fact, comparing each person’s observations to a vector of means by texture, then the test was effectively comparing the person to the grand mean (the mean of the means by texture). This second case is what ANOVA will get at without the alpha problem.
9. Similarly for the second loop, it looks like the singular value for the mean of the labels is the same case as in #8 above. Either it was a self-comparison (destined not to find a diff), or ANOVA will be better.
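The alpha inflation described in point 2 is easy to check numerically. The sketch below evaluates 1-(1-alpha)^n for the two loops (20 people and 11 textures), which assumes the tests are independent. It also shows the Bonferroni adjustment, alpha divided by the number of tests. Bonferroni is included only as one well-known example of an adjusted alpha; it is not necessarily the adjustment that multcompare applies.

```matlab
% Sketch: how repeated t-tests inflate the overall type I risk.
alpha  = 0.05;
nTests = [20 11];                 % the people loop and the texture loop

% Chance of at least one false rejection across n independent tests.
familywise = 1 - (1 - alpha).^nTests;

% Bonferroni: per-test alpha that keeps the overall risk at or below alpha.
bonferroniAlpha = alpha ./ nTests;

fprintf('Overall alpha for %d tests: %.4f\n', nTests(1), familywise(1));
fprintf('Overall alpha for %d tests: %.4f\n', nTests(2), familywise(2));
fprintf('Bonferroni per-test alpha for %d tests: %.4f\n', ...
        nTests(1), bonferroniAlpha(1));
```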
OK James, this very long comment (my longest so far!) verbosely explains what I think will work best here. I also thank you for adding this example to the post. These were actually some of the points that I edited out in the interest of length and readership, so I’m glad we could get them out after all.
I would like to keep working with you offline to make sure you get what you need for your thesis. After you’ve had a chance to digest all this info above, please follow up with me at rslazasnospam(at)gmail(dot)com, without the “nospam”.
Best Regards,
Rob
James (again),
Oh yes, I would be more helpful if I actually included the MATLAB code for the tests I recommended above. I wrote so much that I forgot!
Presume that your data was imported exactly the way you posted it above, with the header row and the first column of 1:20 people deleted. This will leave a 20 row x 11 column matrix of doubles.
To run a one-way ANOVA by people:
To run a one-way ANOVA by textures:
To run a two-way ANOVA by textures and people:
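A minimal sketch of those three calls follows. It assumes the 20 x 11 matrix is stored in a variable named labels (a hypothetical name), typed in here from the CSV posted above with the header row and subject column removed. anova1 treats each column as a group, so the matrix is transposed for the "by people" case. The 'off' argument suppresses the ANOVA table and boxplot figures. The last line runs multcompare on the by-texture result, along the lines of point 7 in the earlier comment.

```matlab
% Sketch: one-way and two-way ANOVA on the posted label data.
% Rows are the 20 people, columns are textures A through K.
labels = [
    1 4 1 1 2 5 5 3 5   4 2
    1 2 2 1 3 5 5 3 4   4 3
    2 2 1 1 3 5 5 3 4   4 3
    1 3 3 1 2 5 5 4 4   4 3
    1 2 2 1 3 5 5 3 4   4 3
    1 4 1 1 2 5 5 4 4   5 3
    1 2 1 1 3 5 5 3 5   4 3
    1 2 2 1 3 5 5 4 4   5 3
    1 3 3 1 1 5 5 2 4   5 2
    1 2 2 1 2 5 5 3 3   4 3
    1 2 1 1 3 5 5 4 4   4 3
    1 2 1 1 3 5 5 4 4   4 3
    1 2 2 1 1 5 5 3 3   4 3
    1 2 2 1 3 5 4 4 4.5 5 3
    1 3 1 1 2 5 4 3 3   4 3
    1 3 1 1 2 5 5 4 4   4 2
    1 3 3 1 2 5 5 5 4   4 2
    1 4 3 1 2 5 5 3 4   4 3
    1 3 2 1 2 5 5 4 4   4 3
    1 2 2 1 4 5 5 3 3   4 3
    ];

% One-way ANOVA by people: transpose so that each person is a column.
pPeople = anova1(labels', [], 'off');

% One-way ANOVA by textures: each texture is already a column.
[pTextures, ~, statsTextures] = anova1(labels, [], 'off');

% Two-way ANOVA with one observation per cell.
% pTwoWay(1) is the column (texture) effect, pTwoWay(2) the row (people) effect.
pTwoWay = anova2(labels, 1, 'off');

% Pairwise comparison of textures, worthwhile only if the ANOVA rejects.
comparisons = multcompare(statsTextures, 'Display', 'off');
```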
Running this myself, I see that “by people” does not show a difference in either anova1 or anova2. This indicates that none of your testers were too crazy.
I see that “by texture” shows a difference with both anova1 and anova2. This would be expected if your textures were made to be quite different from each other. If all the textures were very similar (only subtle differences), then you might not expect this. You tell me!
Hope this helps,
Rob
Hi Guys,
I am working on a project to test normality using a hypothesis test that has unknown parameters. I need to test whether or not the random data generated comes from normally distributed samples. Also, I need to design experiments to know how confident the test be.
Can any one tell me steps and piece of code, if available, on how to generate random variable that does this and what experiment that fits this test.
Thanks,
Hi ElectEng,
For your first question, I think the “lillietest” function is what you’re looking for. It tests a sample for normality while making no assumptions about the parameters. This means that it estimates them from the sample. Check out this function in the table above and in the MATLAB docs.
For your second question, I am a bit unclear about what you intend to do. The “random” function can generate a random variable for you in quite a few different ways. The part that I’m unclear about is why you would want to then test its normality. Since you determine the random variable’s distribution and parameters, it seems redundant to test it. For example, running x = random('norm',…) and then lillietest(x) is really only testing the system made up of the random generator and the Lilliefors hypothesis test. But if that’s what you want to do, then these are the functions I recommend.
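As a sketch of that setup, one option is to generate many samples with random, run lillietest on each, and count how often the test rejects. The sample size, the number of repetitions, and the choice of an exponential distribution as the non-normal comparison are all assumptions made for illustration. On truly normal data the rejection rate should sit near alpha; on clearly non-normal data it estimates how often the test catches the departure.

```matlab
% Sketch: how often does lillietest reject on normal vs. non-normal data?
rng(3);
n       = 50;      % hypothetical sample size
nTrials = 200;     % hypothetical number of repeated experiments

rejectNormal = zeros(nTrials, 1);
rejectExp    = zeros(nTrials, 1);
for k = 1:nTrials
    xNormal = random('Normal', 0, 1, n, 1);      % H0 is true for this sample
    xExp    = random('Exponential', 1, n, 1);    % H0 is false for this sample
    rejectNormal(k) = lillietest(xNormal);       % 1 means "not normal"
    rejectExp(k)    = lillietest(xExp);
end

fprintf('Rejection rate on normal data:      %.3f\n', mean(rejectNormal));
fprintf('Rejection rate on exponential data: %.3f\n', mean(rejectExp));
```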
Hope this helps! Write back if you need more info, or if I have misunderstood your goal.
Rob
The style of writing is very familiar to me. Have you written guest posts for other bloggers?
@Ted, nope! I’ve only posted here so far.