Vector Projections and Orthogonality
Orthogonality
Two vectors \(u, v \in \mathbb{R}^n\) are orthogonal (perpendicular) if their dot product is zero:
\[u \cdot v = 0\]
Geometric interpretation: the angle between \(u\) and \(v\) is 90°.
Example
\(u = [1, 0, 0]^T\) and \(v = [0, 1, 0]^T\)
\(u \cdot v = (1)(0) + (0)(1) + (0)(0) = 0\) ✓
\(u\) and \(v\) are orthogonal.
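A minimal NumPy sketch of this check (the arrays mirror the example above):

```python
import numpy as np

u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])

# The dot product is zero, so u and v are orthogonal.
dot = np.dot(u, v)
```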
Orthonormal Vectors
A set of vectors {\(q_{1}, q_{2}, \ldots, q_{k}\)} is orthonormal if:
- Each vector has unit length: \(\|q_i\|_2 = 1\)
- Vectors are pairwise orthogonal: \(q_i \cdot q_j = 0\) for \(i \neq j\)
Equivalently, \(q_i \cdot q_j = \delta_{ij}\), where \(\delta_{ij}\) is the Kronecker delta: \(\delta_{ij} = 1\) if \(i = j\) and \(0\) otherwise.
Example
q₁ = [1, 0, 0]ᵀ
q₂ = [0, 1, 0]ᵀ
q₃ = [0, 0, 1]ᵀ
These are orthonormal (standard basis vectors in \(\mathbb{R}^{3}\)).
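Both conditions can be verified at once in NumPy by stacking the vectors as columns of a matrix \(Q\) and checking \(Q^TQ = I\):

```python
import numpy as np

# Standard basis of R^3, as columns of Q.
Q = np.eye(3)

# Orthonormal columns: each column has unit length, and distinct
# columns have zero dot product, i.e. Q^T Q = I.
unit_lengths = np.allclose(np.linalg.norm(Q, axis=0), 1.0)
pairwise = np.allclose(Q.T @ Q, np.eye(3))
```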
Orthogonal Matrices
A square matrix \(Q \in \mathbb{R}^{n \times n}\) is orthogonal if its columns are orthonormal:
\[Q^TQ = I\]
Equivalently, \(Q^T = Q^{-1}\).
Properties
- \(Q^TQ = QQ^T = I\) (rows are also orthonormal)
- Preserves lengths: \(\|Qv\|_2 = \|v\|_2\)
- Preserves dot products: \((Qu) \cdot (Qv) = u \cdot v\)
- \(\det(Q) = \pm 1\)
Example
Rotation matrix by angle \(\theta\):
\[Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\]
\[Q^TQ = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta + \sin^2\theta & -\cos\theta\sin\theta + \sin\theta\cos\theta \\ \sin\theta\cos\theta - \cos\theta\sin\theta & \sin^2\theta + \cos^2\theta \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I \;\checkmark\]
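The same verification numerically, along with the length-preservation property (the angle 0.7 is an arbitrary choice for illustration):

```python
import numpy as np

theta = 0.7  # any angle works
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Q^T Q = I, so Q is orthogonal.
is_orthogonal = np.allclose(Q.T @ Q, np.eye(2))

# Orthogonal matrices preserve lengths: ||Qv|| = ||v||.
v = np.array([3.0, 4.0])
length_preserved = np.isclose(np.linalg.norm(Q @ v), np.linalg.norm(v))
```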
Scalar Projection
The scalar projection of \(v\) onto \(u\) is the length of the component of \(v\) in the direction of \(u\):
\[\text{comp}_u(v) = \frac{u \cdot v}{\|u\|_2}\]
This is a scalar (can be negative if \(v\) points opposite to \(u\)).
Example
\(u = [3, 0]^T\), \(v = [2, 2]^T\)
\(\text{comp}_u(v) = \frac{(3)(2) + (0)(2)}{\sqrt{3^2 + 0^2}} = \frac{6}{3} = 2\)
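A small helper reproducing this computation (the function name is a choice for this sketch):

```python
import numpy as np

def scalar_projection(v, u):
    """Length of the component of v in the direction of u."""
    return np.dot(u, v) / np.linalg.norm(u)

comp = scalar_projection(np.array([2.0, 2.0]), np.array([3.0, 0.0]))
```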
Vector Projection
The vector projection of \(v\) onto \(u\) is:
\[\text{proj}_u(v) = \frac{u \cdot v}{u \cdot u}\, u\]
This is a vector in the direction of \(u\).
If \(u\) is a unit vector (\(\|u\|_2 = 1\)), this simplifies to \(\text{proj}_u(v) = (u \cdot v)\, u\).
Example
\(u = [1, 0]^T\), \(v = [3, 4]^T\)
\(\text{proj}_u(v) = \frac{u \cdot v}{u \cdot u}\, u = \frac{(3)(1) + (4)(0)}{1^2 + 0^2}\, [1, 0]^T = 3\,[1, 0]^T = [3, 0]^T\)
Interpretation: the projection of \([3, 4]^T\) onto the x-axis is \([3, 0]^T\).
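The same example as a NumPy sketch:

```python
import numpy as np

def vector_projection(v, u):
    """Projection of v onto the line spanned by u."""
    return (np.dot(u, v) / np.dot(u, u)) * u

p = vector_projection(np.array([3.0, 4.0]), np.array([1.0, 0.0]))
# p lies on the x-axis, in the direction of u.
```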
Projection onto a Subspace
Given an orthonormal basis {\(q_{1}, q_{2}, \ldots, q_{k}\)} for a subspace S, the projection of \(v\) onto S is:
\(\text{proj}_S(v) = (q_1 \cdot v)q_{1} + (q_2 \cdot v)q_{2} + \cdots + (q_k \cdot v)q_{k}\)
In matrix form, if \(Q = [q_1\, q_{2}\, \cdots\, q_{k}] \in \mathbb{R}^{n \times k}\):
\[\text{proj}_S(v) = QQ^Tv\]
The matrix \(P = QQ^T\) is the projection matrix onto S.
Example
Project \(v = [1, 2, 3]^T\) onto the plane spanned by \(q_{1} = [1, 0, 0]^T\) and \(q_{2} = [0, 1, 0]^T\) (the xy-plane):
\(\text{proj}_S(v) = (q_1 \cdot v)q_{1} + (q_2 \cdot v)q_{2} = (1)[1, 0, 0]^T + (2)[0, 1, 0]^T = [1, 2, 0]^T\)
The z-component is removed.
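In NumPy, the same projection via the matrix \(P = QQ^T\):

```python
import numpy as np

# Orthonormal basis for the xy-plane, as columns of Q.
Q = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

v = np.array([1.0, 2.0, 3.0])
P = Q @ Q.T        # projection matrix onto the subspace
proj = P @ v       # the z-component is removed
```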
Orthogonal Complement
The orthogonal complement of a subspace \(S\), denoted \(S^{\perp}\), is the set of all vectors orthogonal to every vector in \(S\):
\[S^{\perp} = \{w : w \cdot v = 0 \text{ for all } v \in S\}\]
Every vector \(v\) can be uniquely decomposed:
\[v = v_{\parallel} + v_{\perp}\]
where \(v_{\parallel} \in S\) and \(v_{\perp} \in S^{\perp}\).
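This decomposition can be computed directly: project to get \(v_{\parallel}\), and subtract to get \(v_{\perp}\). A sketch with \(S\) the xy-plane in \(\mathbb{R}^3\):

```python
import numpy as np

# Orthonormal basis for S (the xy-plane), as columns of Q.
Q = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
P = Q @ Q.T

v = np.array([1.0, 2.0, 3.0])
v_par = P @ v        # component in S
v_perp = v - v_par   # component in S-perp
# The two pieces are orthogonal and sum back to v.
orthogonal = np.isclose(np.dot(v_par, v_perp), 0.0)
```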
Projection Matrix Properties
A projection matrix \(P\) satisfies:
- \(P^2 = P\) (idempotent: projecting twice is the same as once)
- \(P^T = P\) (symmetric, for orthogonal projections)
- Eigenvalues are 0 or 1
The matrix \(I - P\) projects onto the orthogonal complement.
Example
Projection onto the x-axis in \(\mathbb{R}^{2}\):
\[P = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\]
\[P^2 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = P \;\checkmark\]
\[P^T = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = P \;\checkmark\]
Projection onto the orthogonal complement (the y-axis):
\[I - P = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\]
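These properties can be checked numerically for the x-axis projection:

```python
import numpy as np

P = np.array([[1.0, 0.0],
              [0.0, 0.0]])   # projection onto the x-axis

idempotent = np.allclose(P @ P, P)   # P^2 = P
symmetric = np.allclose(P.T, P)      # P^T = P
eigs = np.linalg.eigvalsh(P)         # eigenvalues: 0 and 1
complement = np.eye(2) - P           # projects onto the y-axis
```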
Gram-Schmidt Orthogonalization
Given linearly independent vectors {\(v_{1}\), \(v_{2}\), ..., \(v_{k}\)}, construct orthonormal vectors {\(q_{1}\), \(q_{2}\), ..., \(q_{k}\)} spanning the same subspace:
- \(u_1 = v_1\)
- \(u_2 = v_2 - \text{proj}_{u_1}(v_2)\)
- \(u_3 = v_3 - \text{proj}_{u_1}(v_3) - \text{proj}_{u_2}(v_3)\)
- ...
- \(q_i = u_i / \|u_i\|_2\) (normalize)
Example
Orthogonalize \(v_{1} = [1, 1, 0]^T\), \(v_{2} = [1, 0, 1]^T\):
Step 1:
u₁ = v₁ = [1, 1, 0]ᵀ
Step 2:
proj_u₁(v₂) = ((u₁ · v₂) / (u₁ · u₁)) u₁
= ((1)(1) + (1)(0) + (0)(1)) / ((1)² + (1)² + (0)²) [1, 1, 0]ᵀ
= (1/2)[1, 1, 0]ᵀ
= [1/2, 1/2, 0]ᵀ
u₂ = v₂ - proj_u₁(v₂)
= [1, 0, 1]ᵀ - [1/2, 1/2, 0]ᵀ
= [1/2, -1/2, 1]ᵀ
Step 3: Normalize:
‖u₁‖₂ = √(1² + 1² + 0²) = √2
q₁ = [1/√2, 1/√2, 0]ᵀ
‖u₂‖₂ = √((1/2)² + (-1/2)² + 1²) = √(1/4 + 1/4 + 1) = √(3/2) = √6/2
q₂ = [1/√6, -1/√6, 2/√6]ᵀ
Verify orthogonality: q₁ · q₂ = (1/√2)(1/√6) + (1/√2)(-1/√6) + (0)(2/√6) = 0 ✓
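The procedure above can be sketched in a few lines of NumPy (this variant subtracts projections from the running remainder, the modified Gram-Schmidt ordering, which is equivalent in exact arithmetic and numerically more stable):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors."""
    basis = []
    for v in vectors:
        u = np.array(v, dtype=float)
        for q in basis:
            u -= np.dot(q, u) * q        # remove the component along q
        basis.append(u / np.linalg.norm(u))
    return basis

q1, q2 = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
# q1 = [1/√2, 1/√2, 0], q2 = [1/√6, -1/√6, 2/√6], matching the worked example.
```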
Relevance for Machine Learning
QR Decomposition: Gram-Schmidt is used to compute the QR decomposition \(A = QR\), where \(Q\) is orthogonal and \(R\) is upper triangular. Used in solving least squares problems and computing eigenvalues.
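In practice, the factorization is computed with a library routine rather than hand-rolled Gram-Schmidt; NumPy's `np.linalg.qr` returns \(Q\) with orthonormal columns and upper-triangular \(R\) (the matrix `A` below is an arbitrary example):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# Reduced QR: Q is 3x2 with orthonormal columns, R is 2x2 upper triangular.
Q, R = np.linalg.qr(A)

reconstructed = np.allclose(Q @ R, A)       # A = QR
orthonormal = np.allclose(Q.T @ Q, np.eye(2))
```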
Orthogonal Initialization: Initializing neural network weights with orthogonal matrices helps gradient flow and prevents vanishing/exploding gradients.
Attention Mechanisms: Multi-head attention in transformers uses orthogonal projections to create independent attention heads.
Residual Connections: In ResNets, skip connections can be viewed as orthogonal complements to learned transformations, enabling easier optimization.
Least Squares Regression: The solution \(w = (X^TX)^{-1}X^Ty\) can be interpreted as projecting \(y\) onto the column space of \(X\). The residual \(y - Xw\) is orthogonal to the column space.
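A small simulation with synthetic data (purely illustrative) confirms that the least squares residual is orthogonal to every column of \(X\):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)

# Normal-equations solution: w = (X^T X)^{-1} X^T y.
w = np.linalg.solve(X.T @ X, X.T @ y)
residual = y - X @ w

# X^T (y - Xw) = 0: the residual is orthogonal to the column space of X.
orthogonal = np.allclose(X.T @ residual, 0.0)
```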
Kernel Methods: Kernel ridge regression finds the solution in the span of training samples. The projection onto this space is computed using the kernel matrix.
Feature Orthogonalization: Decorrelating features (e.g., via PCA) creates orthogonal features, simplifying interpretation and improving numerical stability.
Gram Matrix: In style transfer and kernel methods, the Gram matrix \(G = X^TX\) captures feature correlations. A diagonal \(G\) indicates orthogonal features.
Distance Metrics: Mahalanobis distance accounts for feature correlations by transforming data into an orthogonal basis.
Manifold Learning: Locally Linear Embedding (LLE) and other methods project high-dimensional data onto low-dimensional manifolds using orthogonal projections.