Vector Projections and Orthogonality

Orthogonality

Two vectors $u, v \in \mathbb{R}^n$ are orthogonal (perpendicular) if their dot product is zero:

u \cdot v = 0

Geometric interpretation: the angle between $$u$$ and $$v$$ is 90°.

Example

$$u = [1, 0, 0]^T and$$ v = [0, 1, 0]^T

$u \cdot v$ = (1)(0) + (0)(1) + (0)(0) = 0 ✓

$$u$$ and $$v$$ are orthogonal.

Orthonormal Vectors

A set of vectors { $q_{1}, q_{2}, \ldots, q_{k}$ } is orthonormal if:

Each vector has unit length: $\|q_i\|_2 = 1$
Vectors are pairwise orthogonal: $q_i \cdot q_j = 0$ for $i \neq j$

Equivalently, $q_i \cdot q_j = \delta_{ij}$ where $\delta_{ij}$ is the Kronecker delta:

\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}

Example

q₁ = [1, 0, 0]ᵀ
q₂ = [0, 1, 0]ᵀ
q₃ = [0, 0, 1]ᵀ

These are orthonormal (standard basis vectors in $\mathbb{R}^{3}$ ).

Orthogonal Matrices

A square matrix $Q \in \mathbb{R}^{n \times n}$ is orthogonal if its columns are orthonormal:

\[Q^TQ = I\]

Equivalently, $Q^T = Q^{-1}$ .

Properties

$$QQ^T = I$$ (rows are also orthonormal)
Preserves lengths: $\|Qv\|_2 = \|v\|_2$
Preserves dot products: $(Qu) \cdot (Qv) = u \cdot v$
$\det(Q) = \pm 1$

Example

Rotation matrix by angle $\theta$ :

Q = [cos(θ)  -sin(θ)]
    [sin(θ)   cos(θ)]

QᵀQ = [cos(θ)   sin(θ)][ cos(θ)  -sin(θ)]
      [-sin(θ)  cos(θ)][ sin(θ)   cos(θ)]

    = [cos²(θ)+sin²(θ)   -cos(θ)sin(θ)+sin(θ)cos(θ)]   [1  0]
      [sin(θ)cos(θ)-cos(θ)sin(θ)   sin²(θ)+cos²(θ)  ] = [0  1] ✓

Scalar Projection

The scalar projection of $$v$$ onto $$u$$ is the length of the component of $$v$$ in the direction of $$u$$ :

\text{comp}_u(v) = \frac{u \cdot v}{\|u\|_2}

This is a scalar (can be negative if $$v$$ points opposite to $$u$$ ).

Example

$$u = [3, 0]^T,$$ v = [2, 2]^T

comp_u( $$v$$ ) = ((3)(2) + (0)(2)) / $\sqrt$ (3 $$^2$$ + 0 $$^2$$ ) = 6 / 3 = 2

Vector Projection

\text{proj}_u(v) = \left(\frac{u \cdot v}{u \cdot u}\right)u = \left(\frac{u \cdot v}{\|u\|_2^2}\right)u

This is a vector in the direction of $$u$$ .

If $$u$$ is a unit vector ( $\|u\|_2 = 1$ ):

\text{proj}_u(v) = (u \cdot v)u

Example

$$u = [1, 0]^T,$$ v = [3, 4]^T

proj_u( $$v$$ ) = (( $u \cdot v$ ) / ( $u \cdot u$ )) $$u$$ = ((3)(1) + (4)(0)) / (1 $$^2$$ + 0 $$^2$$ ) $$[1, 0]^T = 3$$ [1, 0]^T = $[3, 0]^T

Interpretation: The projection of $$[3, 4]^T onto the x-axis is$$ [3, 0]^T.

Projection onto a Subspace

Given an orthonormal basis { $q_{1}, q_{2}, \ldots, q_{k}$ } for a subspace S, the projection of $$v$$ onto S is:

$\text{proj}_S(v) = (q_1 \cdot v)q_{1} + (q_2 \cdot v)q_{2} + \cdots + (q_k \cdot v)q_{k}$

In matrix form, if $Q = [q_1\, q_{2}\, \cdots\, q_{k}] \in \mathbb{R}^{n \times k}$ :

\text{proj}_S(v) = QQ^Tv

The matrix $$P = QQ^T$$ is the projection matrix onto S.

Example

Project $$v = [1, 2, 3]^T onto the plane spanned by$$ q_{1} = [1, 0, 0]^T and $q_{2} = [0, 1, 0]^T (the xy-plane):

$\text{proj}_S(v) = (q_1 \cdot v)q_{1} + (q_2 \cdot v)q_{2} = (1)[1, 0, 0]^T + (2)[0, 1, 0]^T = [1, 2, 0]^T

The z-component is removed.

Orthogonal Complement

The orthogonal complement of a subspace S, denoted S⊥, is the set of all vectors orthogonal to every vector in S:

S⊥ = { $$w$$ : $w \cdot v$ = 0 for all $v \in$ S}

Every vector $$v$$ can be uniquely decomposed:

v = v_{\parallel} + v_{\perp}

where $v_{\parallel} \in S$ and $v_{\perp} \in S^{\perp}$ .

v_{\parallel} = \text{proj}_S(v)

v_{\perp} = v - \text{proj}_S(v)

Projection Matrix Properties

A projection matrix $$P$$ satisfies:

$$P^2$$ = P$ (idempotent: projecting twice is the same as once)
$$P^T = P$$ (symmetric, for orthogonal projections)
Eigenvalues are 0 or 1

The matrix $$I - P$$ projects onto the orthogonal complement.

Example

Projection onto the x-axis in $\mathbb{R}^{2}$ :

P = [1  0]
    [0  0]

P² = [1  0][1  0]   [1  0]
     [0  0][0  0] = [0  0] = P ✓

Pᵀ = [1  0]
     [0  0] = P ✓

Projection onto orthogonal complement (y-axis):

I - P = [0  0]
        [0  1]

Gram-Schmidt Orthogonalization

Given linearly independent vectors { $v_{1}$ , $v_{2}$ , ..., $v_{k}$ }, construct orthonormal vectors { $q_{1}$ , $q_{2}$ , ..., $q_{k}$ } spanning the same subspace:

$$u_1 = v_1$$
$u_2 = v_2 - \text{proj}_{u_1}(v_2)$
$u_3 = v_3 - \text{proj}_{u_1}(v_3) - \text{proj}_{u_2}(v_3)$
...
$q_i = u_i / \|u_i\|_2$ (normalize)

Example

Orthogonalize $v_{1}$ = $$[1, 1, 0]^T,$$ v_{2} $$=$$ [1, 0, 1]^T:

Step 1:

u₁ = v₁ = [1, 1, 0]ᵀ

Step 2:

 $\[proj_u₁(v₂) = ((u₁ \cdot v₂) / (u₁ \cdot u₁)) u₁\]$ 
            = ((1)(1) + (1)(0) + (0)(1)) / ((1)² + (1)² + (0)²) [1, 1, 0]ᵀ
            = (1/2)[1, 1, 0]ᵀ
            = [1/2, 1/2, 0]ᵀ

 $\[u₂ = v₂ - proj_u₁(v₂)\]$ 
   = [1, 0, 1]ᵀ - [1/2, 1/2, 0]ᵀ
   = [1/2, -1/2, 1]ᵀ

Step 3: Normalize:

\|u₁\|₂ = √(1² + 1² + 0²) = √2
q₁ = [1/√2, 1/√2, 0]ᵀ

\|u₂\|₂ = √((1/2)² + (-1/2)² + 1²) = √(1/4 + 1/4 + 1) = √(3/2) = √6/2
q₂ = [1/√6, -1/√6, 2/√6]ᵀ

Verify orthogonality: $$q_1$$ \cdot $q_{2}$ = (1/ $\sqrt{2})(1/ \sqrt{6}$ ) + (1/ $\sqrt{2})(-1/ \sqrt{6}$ ) + (0)(2/$\sqrt{6}) = 0 ✓

Relevance for Machine Learning

QR Decomposition: Gram-Schmidt is used to compute the QR decomposition $$A = QR$$ , where $$Q$$ is orthogonal and $$R$$ is upper triangular. Used in solving least squares problems and computing eigenvalues.

Orthogonal Initialization: Initializing neural network weights with orthogonal matrices helps gradient flow and prevents vanishing/exploding gradients.

Attention Mechanisms: Multi-head attention in transformers uses orthogonal projections to create independent attention heads.

Residual Connections: In ResNets, skip connections can be viewed as orthogonal complements to learned transformations, enabling easier optimization.

Least Squares Regression: The solution $$w$$ = ( $$X^TX$$ ) $^{-1}$ X^Ty $$can be interpreted as projecting$$ y $$onto the column space of$$ X $$. The residual$$ y - Xw$ is orthogonal to the column space.

Kernel Methods: Kernel ridge regression finds the solution in the span of training samples. The projection onto this space is computed using the kernel matrix.

Feature Orthogonalization: Decorrelating features (e.g., via PCA) creates orthogonal features, simplifying interpretation and improving numerical stability.

Gram Matrix: In style transfer and kernel methods, the Gram matrix $$G = X^T$$ X $$captures feature correlations. Diagonal$$ G$ indicates orthogonal features.

Distance Metrics: Mahalanobis distance accounts for feature correlations by transforming data into an orthogonal basis.

Manifold Learning: Locally Linear Embedding (LLE) and other methods project high-dimensional data onto low-dimensional manifolds using orthogonal projections.

Vector Projections and Orthogonality

Orthogonality

Example

Orthonormal Vectors

Example

Orthogonal Matrices

Properties

Example

Scalar Projection

Example

Vector Projection

Example

Projection onto a Subspace

Example

Orthogonal Complement

Projection Matrix Properties

Example

Gram-Schmidt Orthogonalization

Example

Relevance for Machine Learning

Related Articles