Mathematicians define the dot product between vectors $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ as
$$\mathbf{u} \cdot \mathbf{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n.$$
On the other hand, the alternate geometric definition of the dot product popular with physicists is
$$\mathbf{u} \cdot \mathbf{v} = |\mathbf{u}|\,|\mathbf{v}| \cos\theta,$$
where $\theta$ is the angle between the two vectors. And statisticians define Pearson's correlation coefficient $r$ so that
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}.$$
Thus if we set $\mathbf{u} = (x_1 - \bar{x},\, x_2 - \bar{x},\, \ldots,\, x_n - \bar{x})$ and $\mathbf{v} = (y_1 - \bar{y},\, y_2 - \bar{y},\, \ldots,\, y_n - \bar{y})$, then
$$r = \frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|} = \cos\theta.$$
The idea is to think not of $n$ ordered pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, but rather to think of two vectors in $n$-dimensional space. When the vectors are pointing in the same direction, the angle between them is zero and the correlation coefficient is $\cos 0 = 1$. When the vectors point in opposite directions, the correlation coefficient is the cosine of a straight angle, $r = -1$. And when the vectors are orthogonal, the correlation coefficient is the cosine of a right angle, $r = 0$.
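Here is a minimal numerical sketch of that correspondence in Python (the data values are made up for illustration), checking that Pearson's $r$ equals the cosine of the angle between the centered vectors:

```python
import numpy as np

# Toy data: n = 5 ordered pairs (values chosen arbitrarily)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

# The centered vectors u and v defined above
u = x - x.mean()
v = y - y.mean()

# Cosine of the angle between u and v
cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Pearson's r as computed by NumPy
r = np.corrcoef(x, y)[0, 1]

print(cos_theta, r)  # both are about 0.822 for this data
```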
The only tricky part is that the two $n$-dimensional vectors are not the vectors $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$, the vectors containing all the $x_i$ and all the $y_i$ respectively. Instead, the necessary two $n$-dimensional vectors are the $\mathbf{u}$ and $\mathbf{v}$ defined above.
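To see that the centering really matters, here is a small self-contained check (same made-up data as above) showing that the raw vectors give a different cosine:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

# Cosine of the angle between the raw, uncentered vectors...
cos_raw = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# ...is not Pearson's r; only the centered vectors give the correlation
print(cos_raw)                  # about 0.963 for this data
print(np.corrcoef(x, y)[0, 1])  # about 0.822, a different number
```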
And nicely, the least-squares regression line for the $(x_i, y_i)$ data is $y = mx + b$, where
$$m = r\,\frac{s_y}{s_x} \qquad \text{and} \qquad b = \bar{y} - m\bar{x}.$$
(Notice that the variance $s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{|\mathbf{u}|^2}{n-1}$, and similarly $s_y^2 = \frac{|\mathbf{v}|^2}{n-1}$, so $m$ can also be written as $m = \dfrac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|^2}$.)
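A quick numerical check of these formulas against NumPy's built-in degree-1 fit, with the same made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

u = x - x.mean()
v = y - y.mean()

# Slope and intercept from the formulas above
m = np.dot(u, v) / np.dot(u, u)  # m = u . v / |u|^2
b = y.mean() - m * x.mean()      # b = ybar - m * xbar

# np.polyfit returns the coefficients highest degree first: [slope, intercept]
m_fit, b_fit = np.polyfit(x, y, 1)

print(m, b)          # 1.0, 0.2 for this data
print(m_fit, b_fit)  # same values up to floating-point error
```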
One typically derives the least-squares regression line by finding the $m$ and $b$ that minimize
$$\sum_{i=1}^{n}\bigl(y_i - (mx_i + b)\bigr)^2.$$
But one can alternatively use the $n$-dimensional vector point of view, where the coefficients $m$ and $b$ correspond to the solution of the vector equation
$$m\mathbf{x} + b\mathbf{1} = \hat{\mathbf{y}}.$$
The vector $\mathbf{1}$ is the vector of all 1's, and the vector $\hat{\mathbf{y}}$ is the orthogonal projection of the vector $\mathbf{y}$ onto the space spanned by $\mathbf{x}$ and $\mathbf{1}$.
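As a sketch of that projection point of view (still with the toy data above), a least-squares solver finds exactly the $(m, b)$ for which $m\mathbf{x} + b\mathbf{1}$ is the orthogonal projection $\hat{\mathbf{y}}$, which we can confirm by checking that the residual $\mathbf{y} - \hat{\mathbf{y}}$ is orthogonal to both spanning vectors:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

# Columns x and 1 span the subspace we project onto
A = np.column_stack([x, np.ones_like(x)])

# Least squares solves m*x + b*1 = y_hat, the projection of y
# onto the column space of A
(m, b), *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = m * x + b

# The residual y - y_hat is orthogonal to both spanning vectors
print(np.dot(y - y_hat, x))                # ~ 0
print(np.dot(y - y_hat, np.ones_like(x)))  # ~ 0
```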