# Hypothesis Testing Question 3: Interpret the p-value

Given our result of p = .171, what would you conclude about the VAST preparation course?

Recall that VAST scores for the population of people who did not take the preparation course are normally distributed with a mean of 500 and a standard deviation of 100. Our null hypothesis is that the population of people who complete the ACE training course has VAST scores no larger, on average, than the population of people who do not complete the course. We set alpha equal to .05 before we collected data.

#### Graduates of the ACE training program do not differ from non-graduates on average VAST scores.

Incorrect. Even though we cannot reject the null hypothesis, we have not shown that the null hypothesis is true. This distinction is very important. Failure to find an effect does not tell us that there is no effect. Perhaps the effect is just too small for us to detect with our limited sample size.

#### We are confident that the training program graduates on average score higher than 500.

Incorrect. Our sample mean of 530 suggests that graduates of the training program may on average score better than 500, but we cannot be confident to the desired level (p < .05). Our calculated p-value of .171 is the probability that we would observe a sample mean of 530 or greater if the graduates of the training program did not differ from non-graduates.

When we set alpha to .05, we specified that we would need a p-value less than .05 to reject the null hypothesis and conclude that the graduates of the training program have larger scores on average. Our observed p-value of .171 is not small enough to meet our criterion (α = .05) for rejecting H0.

#### Graduates of the training program may not differ from non-graduates on average VAST scores.

This is the best answer. Our sample mean of 530 suggests that graduates of the training program may on average score better than 500, but we cannot be confident. Our calculated p-value is the probability that we would observe a sample mean of 530 or greater if the training program had no effect.

When we set alpha to .05, we specified that we would need a p-value less than .05 to reject the null hypothesis and conclude that graduates of the training program have higher scores. Our observed p-value of .171 is not small enough to meet our criterion (α = .05) for rejecting H0. We are left with the conclusion that graduates of the training program may not have greater VAST scores on average.

#### We are confident that training program graduates score lower than the average VAST score.

This is not correct. Our sample mean is 530 and the null population mean is 500. Because the sample mean (for the 10 students from the ACE training course) is actually a little higher than the null population mean, we have no evidence to argue that the average for graduates is less than 500.

#### Show me a hint!

The probability of obtaining a sample mean of 530 or greater (N = 10) is .171 when the population mean is 500. This means that about 17% of the time when we take a random sample of 10 people from a normally distributed population with a mean of 500 and a standard deviation of 100, we will observe a sample mean of 530 or greater.
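The p-value in this hint can be reproduced with a short calculation. The following is a minimal sketch, assuming a one-tailed (upper-tail) z-test with a known population standard deviation; the function name `one_tailed_z_test` is hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_tailed_z_test(sample_mean, n, null_mean, sd):
    """Return (z, p) for an upper-tailed one-sample z-test.

    Assumes the population standard deviation `sd` is known.
    """
    standard_error = sd / sqrt(n)          # SD of the sampling distribution of the mean
    z = (sample_mean - null_mean) / standard_error
    p = 1 - NormalDist().cdf(z)            # area in the upper tail beyond z
    return z, p


# Values from the problem: sample mean 530, N = 10, null mean 500, SD = 100
z, p = one_tailed_z_test(sample_mean=530, n=10, null_mean=500, sd=100)
print(f"z = {z:.2f}, p = {p:.3f}")  # z = 0.95, p = 0.171
```

The standard error here is 100 / √10, or about 31.6, so a sample mean of 530 sits less than one standard error above the null mean.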

The most important issue to understand here is that a sample mean as large as the observed 530 is not very surprising even if we are sampling from a population with a mean of only 500. Because the observed sample mean of 530 could easily be found if the population mean is 500, we cannot be confident that graduates of the program have an average above 500. To have confidence that graduates of the program have higher VAST scores, we would need data that are more convincing.
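One way to see why a sample mean of 530 is unsurprising is to simulate the sampling process. The sketch below repeatedly draws 10 scores from a normal population with a mean of 500 and a standard deviation of 100, then counts how often the sample mean reaches 530 or more. The number of trials, the seed, and the function name `simulate_tail_proportion` are arbitrary choices made for illustration.

```python
import random
from statistics import fmean


def simulate_tail_proportion(trials=20000, n=10, mu=500, sd=100,
                             cutoff=530, seed=1):
    """Estimate how often a random sample mean is at least `cutoff`."""
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sd) for _ in range(n)]
        if fmean(sample) >= cutoff:
            hits += 1
    return hits / trials


proportion = simulate_tail_proportion()
print(f"Proportion of sample means >= 530: {proportion:.3f}")
```

The estimated proportion should land close to the calculated p-value of .171, with small differences from run to run if the seed is changed.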

For example, suppose we randomly selected a sample of 10 people from another VAST preparation program and found a sample mean of 600. Our z-score would be 3.16, p = .001, telling us that a mean this large or larger would occur only about 0.1% of the time if the population mean for the program was actually 500. Since a sample mean as large as 600 has a p-value less than alpha, we would conclude that graduates of this program probably do score higher than 500 on average.
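The decision rule can be applied to both programs side by side. This is a rough sketch under the same assumptions as the problem (upper-tail z-test, known standard deviation of 100, N = 10, α = .05); the labels and the helper `upper_tail_p` are hypothetical names used only for this illustration.

```python
from math import sqrt
from statistics import NormalDist

ALPHA = 0.05                     # criterion set before collecting data
NULL_MEAN, SD, N = 500, 100, 10  # values given in the problem


def upper_tail_p(sample_mean):
    """Return (z, p) for an upper-tailed z-test against NULL_MEAN."""
    z = (sample_mean - NULL_MEAN) / (SD / sqrt(N))
    return z, 1 - NormalDist().cdf(z)


results = {}
for label, mean in [("ACE program", 530), ("other program", 600)]:
    z, p = upper_tail_p(mean)
    reject = p < ALPHA           # reject H0 only when p falls below alpha
    results[label] = (z, p, reject)
    decision = "reject H0" if reject else "fail to reject H0"
    print(f"{label}: mean = {mean}, z = {z:.2f}, p = {p:.4f} -> {decision}")
```

With a sample mean of 600, the p-value is a little under .001, far below α, while the ACE sample mean of 530 gives p = .171 and does not meet the criterion.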