## Tuesday, December 15, 2009

### The Correlation Coefficient as cosine theta

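Mathematicians define the dot product between vectors $\mathbf{u} = (u_1, \ldots, u_n)$ and $\mathbf{v} = (v_1, \ldots, v_n)$ as

$$\mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^{n} u_i v_i.$$

On the other hand, the alternate geometric definition for the dot product popular with physicists is

$$\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\| \, \|\mathbf{v}\| \cos\theta,$$

where $\theta$ is the angle between the two vectors.

So

$$\cos\theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \, \|\mathbf{v}\|} = \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^2} \, \sqrt{\sum_{i=1}^{n} v_i^2}}.$$

And statisticians define Pearson's correlation coefficient r so that

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}.$$

Thus if we set $u_i = x_i - \bar{x}$ and $v_i = y_i - \bar{y}$, then $r = \cos\theta$.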
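As a quick numerical check of this identity, here is a minimal sketch in plain Python. The data set and the helper names (`dot`, `centered`, `cosine`, `pearson_r`) are made up for illustration; they are not from any particular library.

```python
import math

def dot(a, b):
    """Algebraic dot product: sum of componentwise products."""
    return sum(ai * bi for ai, bi in zip(a, b))

def centered(values):
    """Subtract the mean from every component."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def cosine(a, b):
    """Cosine of the angle between a and b, from the geometric definition."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def pearson_r(xs, ys):
    """Pearson's r written out as the usual sum formula."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - x_bar) ** 2 for x in xs)) * math.sqrt(
        sum((y - y_bar) ** 2 for y in ys)
    )
    return num / den

# Hypothetical sample data, chosen only to keep the arithmetic simple.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(xs, ys)
cos_theta = cosine(centered(xs), centered(ys))
print(r, cos_theta)  # the two values agree up to rounding
```

For this data both means are 3, the centered vectors are (-2, -1, 0, 1, 2) and (-1, -2, 1, 0, 2), their dot product is 8, and each has squared length 10, so r = 8/10 = 0.8.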

The idea is to think not of n ordered pairs (x1, y1), (x2, y2), ..., (xn, yn), but rather to think of two vectors in n-dimensional space. When the vectors are pointing in the same direction, the angle between them is zero and the correlation coefficient is cos 0 = 1. When the vectors point in opposite directions, the correlation coefficient is the cosine of a straight angle, r = -1. And when the vectors are orthogonal, the correlation coefficient is the cosine of a right angle, r = 0.
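The three special cases can be sketched numerically as well. The snippet below is only an illustration with made-up data: an increasing line, a decreasing line, and a parabola symmetric about the mean of the x values. The helper `angle_degrees` is a hypothetical name.

```python
import math

def centered(values):
    """Subtract the mean from every component."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def angle_degrees(xs, ys):
    """Angle between the centered data vectors, in degrees."""
    u, v = centered(xs), centered(ys)
    dot_uv = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    r = dot_uv / (norm_u * norm_v)
    r = max(-1.0, min(1.0, r))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(r))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]

same_direction = angle_degrees(xs, [2 * x + 1 for x in xs])   # r = 1
opposite = angle_degrees(xs, [10 - 3 * x for x in xs])        # r = -1
right_angle = angle_degrees(xs, [(x - 3) ** 2 for x in xs])   # r = 0

print(same_direction, opposite, right_angle)  # roughly 0, 180 and 90 degrees
```

The last case is worth a second look: y is completely determined by x, yet the centered vectors are orthogonal, so r = 0. The correlation coefficient only measures linear association.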

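The only tricky part is that the two n-dimensional vectors are not the vectors $\mathbf{x}$ and $\mathbf{y}$, the vectors containing all the $x_i$ and $y_i$ respectively. Instead, the necessary two n-dimensional vectors are the centered vectors $\mathbf{u}$ and $\mathbf{v}$ defined above, with components $u_i = x_i - \bar{x}$ and $v_i = y_i - \bar{y}$.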
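To see why the centering matters, the following sketch (again with made-up data and hypothetical helper names) compares the cosine of the angle between the raw data vectors with the cosine of the angle between the centered ones, and then shifts every x value by 100.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def cosine(a, b):
    """Cosine of the angle between a and b."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def centered(values):
    """Subtract the mean from every component."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

raw_cosine = cosine(xs, ys)                  # angle between raw vectors
r = cosine(centered(xs), centered(ys))       # angle between centered vectors

# Shifting the data moves the raw vector but leaves the centered one alone.
shifted = [x + 100 for x in xs]
raw_cosine_shifted = cosine(shifted, ys)
r_shifted = cosine(centered(shifted), centered(ys))

print(raw_cosine, r)
print(raw_cosine_shifted, r_shifted)
```

Here the raw cosine is 53/55, not the correlation 0.8, and it changes when the data are shifted, while the centered cosine does not.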

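And nicely, the least-squares regression line for the data is y = mx + b, where $m = r \, \dfrac{s_y}{s_x}$ and $b = \bar{y} - m\bar{x}$. (Notice that the variance $s_x^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$, and similarly for $s_y^2$, so m can also be written as $m = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$, the dot product of the two centered vectors divided by the squared length of the first.)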
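The two expressions for the slope can be compared directly. This is a small sketch on made-up data; the variable names are illustrative only.

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of products of deviations from the means.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

r = sxy / math.sqrt(sxx * syy)

# Sample standard deviations (n - 1 in the denominator).
s_x = math.sqrt(sxx / (n - 1))
s_y = math.sqrt(syy / (n - 1))

m_from_r = r * s_y / s_x       # slope as r times the ratio of spreads
m_from_sums = sxy / sxx        # slope as a ratio of sums
b = y_bar - m_from_sums * x_bar

print(m_from_r, m_from_sums, b)
```

For this data both slope formulas give 0.8 and the intercept is 3 - 0.8 * 3 = 0.6, up to rounding. The factors of n - 1 cancel, which is why the slope can be written without them.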

One typically derives the least-squares regression line by finding m and b that minimize $\sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2$. But one can alternatively use the n-dimensional vector point of view, where the coefficients m and b correspond to the solution of the vector equation $m\mathbf{x} + b\mathbf{1} = \hat{\mathbf{y}}$. The vector $\mathbf{1}$ is the vector of all 1's and the vector $\hat{\mathbf{y}}$ is the orthogonal projection of the data vector $\mathbf{y}$ onto the space spanned by $\mathbf{x}$ and $\mathbf{1}$.
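The projection point of view can be sketched in a few lines. Requiring the residual to be orthogonal to both spanning vectors gives two linear equations in m and b (the normal equations), which the snippet below solves with Cramer's rule. The data and names are made up for illustration, and a real project would more likely call a linear algebra library.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
ones = [1.0] * len(xs)

# Orthogonality of the residual to xs and to ones gives the 2x2 system
#   [xs.xs    xs.ones  ] [m]   [xs.ys  ]
#   [xs.ones  ones.ones] [b] = [ones.ys]
g_xx = dot(xs, xs)
g_x1 = dot(xs, ones)
g_11 = dot(ones, ones)
rhs_x = dot(xs, ys)
rhs_1 = dot(ones, ys)

# Solve with Cramer's rule.
det = g_xx * g_11 - g_x1 * g_x1
m = (rhs_x * g_11 - g_x1 * rhs_1) / det
b = (g_xx * rhs_1 - g_x1 * rhs_x) / det

# The projection and the leftover residual.
y_hat = [m * x + b * one for x, one in zip(xs, ones)]
residual = [y - yh for y, yh in zip(ys, y_hat)]

print(m, b)
print(dot(residual, xs), dot(residual, ones))  # both zero up to rounding
```

The slope and intercept match the ones from the formulas for m and b (0.8 and 0.6 for this data), and the residual is orthogonal to both spanning vectors, which is exactly what it means for the fitted vector to be the orthogonal projection.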