Regression: Module #4–Another Example



If you have a copy of this page, proceed to the applet. If you do not have a copy, print this page before proceeding to the applet.

In this module, you will learn how to use the WISE regression applet to deepen your understanding of the relationship between regression and ANOVA. Using the very small set of data shown below, we will step through the relevant regression values and see how they are calculated and how they are represented graphically. If you completed Modules #1 and #2, this will be review. Answers are provided for most problems at the end of the exercise. In Exercise 2, you will compare regression calculations to ANOVA calculations for these same data.

Set up the applet: From the ‘Select a Lesson:’ menu in the lower right-hand corner of the applet, choose ‘Regression.’ Remove the checks from all boxes except Show Regression Line.

A. Correlation, Slope, and Y-intercept. The applet provides these statistics, which are important for regression analysis. Find these terms in the applet and enter each below.

correlation (r)   = _________

slope (b)           =  _________

intercept (a)  = _________

B. The regression equation. The regression equation is the formula for the straight line that best fits the data. The regression equation can be used to predict scores (called Y´, or Y-prime). For each of our X values, we can predict Y. Below you will complete several calculations, deriving Y´ for each value of X. First, you will need the regression equation. The general form of this equation is Y´ = a + bX, where a is the intercept and b is the slope.
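As a rough check on the values reported by the applet, the intercept and slope can be computed from the usual least-squares formulas. The sketch below assumes the four (X, Y) pairs from the tables in this module; the variable names are illustrative and are not taken from the applet.

```python
# Data from the worksheet tables (four cases).
x = [1, 1, 2, 2]
y = [2, 6, 5, 7]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope: sum of cross-products divided by
# the sum of squared deviations of X from its mean.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)

slope = sxy / sxx                      # b
intercept = mean_y - slope * mean_x    # a

print(f"Y' = {intercept:.1f} + {slope:.1f}X")  # Y' = 2.0 + 2.0X
```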

C. Predicted Scores.  In our example, a=2.0 and b=2.0, so the regression equation is Y´ = 2.0 + 2.0X. Our first X score is 1, which generates a predicted Y score of 4.0, from 2.0 + 2.0(1). Calculate the three remaining Y´ values by hand and enter them into the table below. Make sure that you can do the calculations and interpretations yourself.

Case   X   Y   Y´
1      1   2   4
2      1   6   ____
3      2   5   ____
4      2   7   ____
Sum
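After doing the calculations by hand, you can check them with a few lines of code. This is a minimal sketch that assumes the regression equation Y´ = 2.0 + 2.0X stated in part C; the function name `predict` is hypothetical.

```python
def predict(x, a=2.0, b=2.0):
    """Return the predicted score Y' = a + bX."""
    return a + b * x

# X scores for Cases 1 through 4.
x_scores = [1, 1, 2, 2]
predicted = [predict(x) for x in x_scores]

print(predicted)  # [4.0, 4.0, 6.0, 6.0]
```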

 

D. SS Total (Total Variance). SS Total is the sum of squared deviations of the observed Y scores from the mean of Y. This is an indication of the error we expect if we predict every Y score to be at the mean of Y. (If X is not available or if X is not useful, then the mean of Y is our best prediction of Y scores.)

To calculate SS Total, take each value of Y, subtract the mean, and square the result, then sum all of the values in the column. A general formula for SS Total is Σ(Y – Ȳ)², where Ȳ is the mean of Y. For these data the mean of Y is 5. For the first case, the squared deviation from the mean is 9. Calculate the values for the last three cases, and sum the values for all four cases in the last column to get SS Total.

Case   X   Y   Y´     (Y – Ȳ)       (Y – Ȳ)²
1      1   2   4      2-5 = -3      (-3)² = 9
2      1   6   ____   ____          ____
3      2   5   ____   ____          ____
4      2   7   ____   ____          ____
Sum                                 Σ = ____
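The column arithmetic for SS Total can be sketched in code as a check on your hand calculations. The example assumes the four Y scores from the table; the variable names are illustrative.

```python
# Observed Y scores for Cases 1 through 4.
y = [2, 6, 5, 7]
mean_y = sum(y) / len(y)

# Deviation of each observed score from the mean, then its square.
deviations = [yi - mean_y for yi in y]
squared = [d ** 2 for d in deviations]
ss_total = sum(squared)

print(deviations)  # [-3.0, 1.0, 0.0, 2.0]
print(squared)     # [9.0, 1.0, 0.0, 4.0]
print(ss_total)    # 14.0
```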

 

E. Deviations from the mean. Now in the applet, place a check mark in the boxes titled Show SS Total and Show Mean of Y and remove all other checks. The vertical black lines (note that there are only three, because one deviation is zero) represent the deviations of each case from the mean of Y. Verify the correspondence of the length of these lines with the deviations from the mean, (Y – Ȳ), in the table. Which case has the largest deviation from the mean?

The largest deviation from the mean is _____ for Case ___.

Hint: Look at the graph in the applet and at your calculations in the table.

F. Contribution to SS Total.  Now check the box labeled Show Error as Squares. The sizes of the black squares correspond to the squared deviations from the mean, and the sum of the areas of these squares corresponds to SS Total. Notice how the deviations from the mean for the first and fourth cases are -3 and +2, while the squared deviations are 9 and 4. This shows how points farther from the mean contribute much more to SS Total than points closer to the mean. What is the contribution of the third case to SS Total? Why?

The contribution to SS Total for Case 3 is ______ because ______.

G. SS Total. Now calculate the sum of the squared deviations from the mean, Σ(Y – Ȳ)². You can do this by adding the values in the last column of the table, headed (Y – Ȳ)².

Σ(Y – Ȳ)² = SS Total = ________.   In the applet, SS for Total = ________.

H. SS Error.  SS Error is the sum of squared deviations of observed Y scores from the predicted Y scores when we use information on X to predict Y scores with a regression equation.  SS Error is the part of SS Total that CANNOT be explained by the regression.

Calculations. Complete the calculations below using the predicted scores (Y´) calculated in part C.

Case   X   Y   Y´     (Y – Y´)      (Y – Y´)²
1      1   2   4      2-4 = -2      (-2)² = 4
2      1   6   ____   ____          ____
3      2   5   ____   ____          ____
4      2   7   ____   ____          ____
Sum                                 Σ = ____
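The same kind of check works for SS Error. This sketch assumes the observed Y scores from the table and the regression equation Y´ = 2.0 + 2.0X from part C; the variable names are illustrative.

```python
x = [1, 1, 2, 2]
y = [2, 6, 5, 7]
a, b = 2.0, 2.0  # intercept and slope stated in part C

# Predicted score for each case, then the deviation of the
# observed score from its prediction (the residual).
predicted = [a + b * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]
ss_error = sum(r ** 2 for r in residuals)

print(residuals)  # [-2.0, 2.0, -1.0, 1.0]
print(ss_error)   # 10.0
```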

I. Regression Line and Deviations. Now place check marks in the boxes titled Show Regression Line and Show SS Error, and remove checks from all other boxes. Deviations of the observed points from their predicted values on the regression line are shown in red.

The largest deviations are for Cases ____, and the size of the deviation is ______.

The smallest deviations are for Cases ____, and the size of the deviation is ______.

J. SS Error. Now check the box titled Show Error as Squares. The sizes of the red squares correspond to the squared deviations. In the table for part H, compare the squared deviations shown in the last column for Cases 2 and 3. Observe how the red squares for Cases 2 and 3 correspond to these values. The sum of the squared deviations is the sum of the last column in the table.

Record your calculated value here _________.  This is the Sum of Squares Error.

In the applet under Analysis of Variance find the value for SS Error _____________

K. SS Predicted.  SS Predicted is the part of SS Total that CAN be predicted from the regression.  This corresponds to the sum of squared deviations of predicted values of Y from the mean of Y.

Calculations. Complete the calculations below using the predicted scores (Y´) calculated for each case in part C and the mean of Y (5).

Case   X   Y   Y´     (Y´ – Ȳ)      (Y´ – Ȳ)²
1      1   2   4      4-5 = -1      (-1)² = 1
2      1   6   ____   ____          ____
3      2   5   ____   ____          ____
4      2   7   ____   ____          ____
Sum                                 Σ = ____
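SS Predicted can be checked the same way. The sketch below assumes the predicted scores from Y´ = 2.0 + 2.0X and the mean of Y (5) given in the text; the variable names are illustrative.

```python
x = [1, 1, 2, 2]
a, b = 2.0, 2.0  # intercept and slope stated in part C
mean_y = 5.0     # mean of Y given in part D

# Deviation of each predicted score from the mean of Y.
predicted = [a + b * xi for xi in x]
pred_deviations = [pi - mean_y for pi in predicted]
ss_predicted = sum(d ** 2 for d in pred_deviations)

print(pred_deviations)  # [-1.0, -1.0, 1.0, 1.0]
print(ss_predicted)     # 4.0
```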

L. Regression Line and the Mean. Now click the boxes marked Show Mean of Y and Show Regression Line and remove the checks from all other boxes. Check Show SS Predicted to see the deviations of the regression line from the mean, shown in blue. The blue lines represent the differences between the mean and the predicted scores. If X were not useful in predicting Y, then the best prediction of Y would be simply the mean of Y for any value of X, and the blue lines would be zero in length. If X is useful in predicting Y, then the predicted values differ from the mean. The blue lines give an indication of how well X predicts Y.

M. SS Error Meaning. Click the box marked Show Error as Squares to see the squared deviations of the predicted scores from the mean of Y. Compare these to the red squares for SS Error. (You can click Show SS Error if you would like to be reminded of the size of the red squares.) Is X useful for predicting Y in this plot? How do you know?

N. SS Predicted. The sum of the squared deviations of the predicted scores from the mean is the sum of the last column in the table in part K.

Record the calculated value here _________.  This is the Sum of Squares Predicted.

In the applet under Analysis of Variance find the value for SS Predicted _____________

Explain what SS Predicted means. What would the plot look like if SS Predicted were very small relative to SS Total?

Note that SS Total = SS Predicted + SS Error (14.0 = 4.0 + 10.0). Thus, with the regression model, we split SS Total into two parts, SS Predicted and SS Error. We can compute the proportion of SS Total that is in SS Predicted. In terms of sums of squares, this is the ratio of SS Predicted to SS Total.
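The partition of SS Total can be verified numerically. This is a minimal sketch assuming the four cases from the tables and the equation Y´ = 2.0 + 2.0X; the variable names are illustrative.

```python
import math

x = [1, 1, 2, 2]
y = [2, 6, 5, 7]
a, b = 2.0, 2.0  # intercept and slope stated in part C

mean_y = sum(y) / len(y)
predicted = [a + b * xi for xi in x]

# The three sums of squares defined in parts D, H, and K.
ss_total = sum((yi - mean_y) ** 2 for yi in y)
ss_error = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
ss_predicted = sum((pi - mean_y) ** 2 for pi in predicted)

print(ss_total, ss_predicted, ss_error)  # 14.0 4.0 10.0
print(math.isclose(ss_total, ss_predicted + ss_error))  # True
```

The partition holds exactly only when the predictions come from the least-squares line; with any other line the two parts would generally not add up to SS Total.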

O. r-squared as proportion of variance explained.

Calculate [SS Predicted/ SS Total] = _________ / __________ = ____________.

SS Total is the numerator of the variance of Y (i.e., Σ(Y – Ȳ)²), so the calculated ratio can be interpreted as the proportion of variance in Y that can be predicted from X using the regression model. A useful fact in regression is that this ratio is equal to the correlation squared (r-squared). Thus, the correlation squared (r-squared) represents the proportion of variance in Y that can be explained by X, using the regression model.
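To see that the ratio of sums of squares matches the squared correlation, r can be computed directly from the standard Pearson formula and compared with SS Predicted / SS Total. The sketch assumes the four cases from the tables and the sums of squares stated earlier (4.0 and 14.0); the variable names are illustrative.

```python
import math

x = [1, 1, 2, 2]
y = [2, 6, 5, 7]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Pearson correlation: cross-products over the square root of
# the product of the two sums of squared deviations.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)  # same quantity as SS Total
r = sxy / math.sqrt(sxx * syy)

ratio = 4.0 / 14.0  # SS Predicted / SS Total from the text

print(round(r, 3), round(r ** 2, 3))  # 0.535 0.286
print(math.isclose(r ** 2, ratio))    # True
```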

What does the applet report for the correlation r and r-squared?

r = ______;    r squared = ________

