## Sunday, June 20, 2010

### Variance in values from prediction by regression

In a section about linear regression in *Understanding Statistics in the Behavioral Sciences*, 7th edition (Robert Pagano, Thomson, 2004), we find the following equation on page 119:

$$Y_i - \overline{Y} = \left(Y_i - Y_i'\right) + \left(Y_i' - \overline{Y}\right)$$

After an explanation of the notation, we find that we could then construct $Y_i - \overline{Y}$ for each score. If we squared each $Y_i - \overline{Y}$ and summed over all the scores, we would obtain

$$\sum_{i=1}^{n}\left(Y_i - \overline{Y}\right)^2 = \sum_{i=1}^{n}\left(Y_i - Y_i'\right)^2 + \sum_{i=1}^{n}\left(Y_i' - \overline{Y}\right)^2$$

Obtaining this second equation from the first seems remarkable (squaring the right-hand side produces cross terms $2\left(Y_i - Y_i'\right)\left(Y_i' - \overline{Y}\right)$, and all of them must vanish in the sum), but the textbook offered no insight into how one could see this.
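As a quick sanity check of the identity, here is a small pure-Python sketch that fits a least-squares line to made-up data (the `X` and `Y` values are illustrative, not from the textbook) and verifies that the total sum of squares splits exactly into the two pieces above:

```python
# Illustrative data (not from the textbook).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 1.0, 4.0, 3.0, 6.0]
n = len(X)
xbar = sum(X) / n
ybar = sum(Y) / n

# Least-squares slope and intercept for the regression line Y' = a + b*X.
b = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) \
    / sum((x - xbar) ** 2 for x in X)
a = ybar - b * xbar
Yhat = [a + b * x for x in X]  # predicted scores Y_i'

# The three sums of squares in the identity.
ss_total = sum((y - ybar) ** 2 for y in Y)            # sum (Y_i - Ybar)^2
ss_error = sum((y - yh) ** 2 for y, yh in zip(Y, Yhat))  # sum (Y_i - Y_i')^2
ss_reg   = sum((yh - ybar) ** 2 for yh in Yhat)       # sum (Y_i' - Ybar)^2

print(ss_total, ss_error, ss_reg)
print(abs(ss_total - (ss_error + ss_reg)) < 1e-9)
```

The last line prints `True`: the cross terms cancel, which is exactly what the geometric argument below explains.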

Here's one possibility.

If we think of the vectors $\vec{y}=\left(Y_1, Y_2, \ldots, Y_n\right)$, $\overline{y}=\left(\overline{Y}, \overline{Y}, \ldots, \overline{Y}\right)$, and $\hat{y}=\left(Y_1', Y_2', \ldots, Y_n'\right)$, then the second equation above is the assertion that

$$\left\|\vec{y}-\overline{y}\right\|^2 = \left\|\vec{y}-\hat{y}\right\|^2 + \left\|\hat{y}-\overline{y}\right\|^2$$

Since $\vec{y}-\overline{y} = \left(\vec{y}-\hat{y}\right) + \left(\hat{y}-\overline{y}\right)$, this is just the Pythagorean theorem, and so the equation is true whenever the vectors $\vec{y}-\hat{y}$ and $\hat{y}-\overline{y}$ are orthogonal.

But $\hat{y}$ is precisely the orthogonal projection of $\vec{y}$ onto the space spanned by $\vec{x}=\left(X_1, X_2, \ldots, X_n\right)$ and $\left(1, 1, \ldots, 1\right)$ (see my earlier post), so $\hat{y}-\overline{y}$ lies in that space and $\vec{y}-\hat{y}$ lies in its orthogonal complement.
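The projection argument can also be checked numerically: the residual vector $\vec{y}-\hat{y}$ should have zero dot product with both $\vec{x}$ and $\left(1,1,\ldots,1\right)$ (which together span the space containing $\hat{y}-\overline{y}$). A minimal sketch, again with illustrative data:

```python
# Illustrative data (not from the textbook).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 1.0, 4.0, 3.0, 6.0]
n = len(X)
xbar = sum(X) / n
ybar = sum(Y) / n

# Least-squares fit, as before.
b = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) \
    / sum((x - xbar) ** 2 for x in X)
a = ybar - b * xbar

# Residual vector y - yhat.
resid = [y - (a + b * x) for x, y in zip(X, Y)]

# Orthogonal to (1, 1, ..., 1): the residuals sum to zero.
print(abs(sum(resid)) < 1e-9)
# Orthogonal to x: the residuals are uncorrelated with the predictor.
print(abs(sum(r * x for r, x in zip(resid, X))) < 1e-9)
```

Both lines print `True`; these two orthogonality conditions are exactly the normal equations that define the least-squares fit, which is why the Pythagorean decomposition is automatic.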