Dot Products and Vector Norms
Dot Product Definition
The dot product (also called inner product or scalar product) of two vectors \(u, v \in \mathbb{R}^n\) is:
\[ u \cdot v = \sum_{i=1}^{n} u_i v_i = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n \]
The result is a scalar, not a vector.
Example
Let \(u = [1, 2, 3]^T\) and \(v = [4, 5, 6]^T\). Then:
\[ u \cdot v = (1)(4) + (2)(5) + (3)(6) = 4 + 10 + 18 = 32 \]
Matrix Notation
The dot product can be written as matrix multiplication:
\[ u \cdot v = u^T v \]
where \(u^T\) is a row vector and \(v\) is a column vector.
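The definition and the matrix form can be checked numerically. A minimal sketch using NumPy (the array values repeat the worked example above; NumPy is an assumption, not required by the text):

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

# Elementwise definition: sum of componentwise products
manual = sum(u_i * v_i for u_i, v_i in zip(u, v))

# Equivalent built-in forms: np.dot and the @ operator (u^T v)
via_dot = np.dot(u, v)
via_matmul = u @ v

print(manual, via_dot, via_matmul)  # all three equal 32
```

All three expressions compute the same scalar, matching the hand calculation.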
Properties of the Dot Product
- Commutativity: \(u \cdot v = v \cdot u\)
- Distributivity: \(u \cdot (v + w) = u \cdot v + u \cdot w\)
- Scalar multiplication: \((\alpha u) \cdot v = \alpha(u \cdot v)\)
- Positive definiteness: \(v \cdot v \geq 0\), with equality if and only if \(v = \mathbf{0}\)
Geometric Interpretation
For vectors \(u\) and \(v\) in \(\mathbb{R}^n\):
\[ u \cdot v = \|u\| \|v\| \cos\theta \]
where \(\theta\) is the angle between \(u\) and \(v\), and \(\|\cdot\|\) denotes the Euclidean norm (defined below).
Implications
- If \(u \cdot v > 0\), then \(\theta < 90°\) (vectors point in similar directions)
- If \(u \cdot v = 0\), then \(\theta = 90°\) (vectors are orthogonal/perpendicular)
- If \(u \cdot v < 0\), then \(\theta > 90°\) (vectors point in opposite directions)
Example
Let \(u = [1, 0]^T\) and \(v = [0, 1]^T\). Then:
\[ u \cdot v = (1)(0) + (0)(1) = 0 \]
These vectors are orthogonal.
Vector Norms
A norm is a function that assigns a non-negative length to vectors. A norm \(\|\cdot\|\) must satisfy:
- Non-negativity: \(\|v\| \geq 0\), with \(\|v\| = 0\) if and only if \(v = \mathbf{0}\)
- Homogeneity: \(\|\alpha v\| = |\alpha| \|v\|\)
- Triangle inequality: \(\|u + v\| \leq \|u\| + \|v\|\)
The L₁, L₂, and L∞ norms defined below are all special cases of the general Lₚ norm, and as \(p \to \infty\) the Lₚ norm approaches the L∞ norm.
Euclidean Norm (L₂ Norm)
The Euclidean norm or L₂ norm is:
\[ \|v\|_2 = \sqrt{\sum_{i=1}^{n} v_i^2} = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} \]
This measures the straight-line distance from the origin to the point represented by \(v\).
Example
For \(v = [3, 4]^T\):
\[ \|v\|_2 = \sqrt{3^2 + 4^2} = \sqrt{25} = 5 \]
Manhattan Norm (L₁ Norm)
The L₁ norm is the sum of absolute values:
\[ \|v\|_1 = \sum_{i=1}^{n} |v_i| \]
This measures the distance traveling along axes (grid distance).
Example
For \(v = [3, -4]^T\):
\[ \|v\|_1 = |3| + |-4| = 7 \]
Maximum Norm (L∞ Norm)
The L∞ norm is the maximum absolute value:
\[ \|v\|_\infty = \max_{i} |v_i| \]
Example
For \(v = [3, -7, 2]^T\):
\[ \|v\|_\infty = \max(|3|, |-7|, |2|) = 7 \]
General Lₚ Norm
The Lₚ norm for \(p \geq 1\) is:
\[ \|v\|_p = \left( \sum_{i=1}^{n} |v_i|^p \right)^{1/p} \]
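The three named norms, and the general Lₚ case, can be computed with NumPy's `np.linalg.norm` via its `ord` parameter. A minimal sketch (the vector \([3, -4]^T\) follows the L₁ example above; NumPy is an assumption):

```python
import numpy as np

v = np.array([3.0, -4.0])

l2 = np.linalg.norm(v)                # Euclidean: sqrt(9 + 16) = 5
l1 = np.linalg.norm(v, ord=1)         # Manhattan: |3| + |-4| = 7
linf = np.linalg.norm(v, ord=np.inf)  # Maximum: max(3, 4) = 4

# As p grows, the Lp norm approaches the L-infinity norm
l10 = np.linalg.norm(v, ord=10)  # already close to 4
```

Note that for the same vector the three norms generally differ, with \(\|v\|_\infty \leq \|v\|_2 \leq \|v\|_1\).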
Unit Vectors and Normalization
A unit vector has norm equal to 1. Any non-zero vector \(v\) can be normalized:
\[ \hat{u} = \frac{v}{\|v\|} \]
This produces a unit vector in the same direction as \(v\).
Example
For \(v = [3, 4]^T\) with \(\|v\|_2 = 5\):
\[ \hat{u} = \frac{1}{5}[3, 4]^T = [0.6, 0.8]^T \]
Verify: \(\|\hat{u}\|_2 = \sqrt{0.6^2 + 0.8^2} = \sqrt{0.36 + 0.64} = 1\)
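Normalization is a one-line operation in code. A minimal sketch reproducing the example above (NumPy assumed):

```python
import numpy as np

v = np.array([3.0, 4.0])

# Divide by the L2 norm to get a unit vector in the same direction
u_hat = v / np.linalg.norm(v)

print(u_hat)  # [0.6 0.8]
```

A guard for the zero vector (which cannot be normalized) is worth adding in practice, since dividing by a zero norm produces NaNs rather than an error.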
Distance Between Vectors
The distance between \(u\) and \(v\) using the Lₚ norm is:
\[ d_p(u, v) = \|u - v\|_p \]
For L₂ (Euclidean distance):
\[ d_2(u, v) = \|u - v\|_2 = \sqrt{\sum_{i=1}^{n} (u_i - v_i)^2} \]
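Distance is just the norm of the difference vector, so the same `np.linalg.norm` call applies. A minimal sketch (the example points are illustrative, not from the text):

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([4.0, 6.0])

# Distance = norm of the difference vector u - v
euclidean = np.linalg.norm(u - v)         # sqrt(3^2 + 4^2) = 5
manhattan = np.linalg.norm(u - v, ord=1)  # |3| + |4| = 7
```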
Relevance for Machine Learning
Similarity Measurement: The dot product measures similarity between vectors. In recommendation systems, user-item preferences are vectors, and dot products compute compatibility scores.
Cosine Similarity: Normalized dot product measures angular similarity:
\[ \text{sim}(u, v) = \frac{u \cdot v}{\|u\|_2 \|v\|_2} = \cos\theta \]
This is used in document similarity, image retrieval, and clustering.
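A minimal cosine-similarity sketch, combining the dot product and L₂ norms as in the formula above (the helper name and test vectors are illustrative):

```python
import numpy as np

def cosine_similarity(u, v):
    # Dot product normalized by both vector lengths; equals cos(theta)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])

sim = cosine_similarity(a, b)  # cos(45°) ≈ 0.7071
```

The result is always in \([-1, 1]\): 1 for parallel vectors, 0 for orthogonal ones, and -1 for opposite directions.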
Distance Metrics: K-nearest neighbors (KNN) classifies samples based on distance. Euclidean distance (L₂) is most common, but L₁ (Manhattan) is robust to outliers.
Loss Functions: Mean squared error (MSE) for regression is the squared L₂ norm of the residual, scaled by \(1/n\):
\[ \text{MSE} = \frac{1}{n} \|y - \hat{y}\|_2^2 \]
Mean absolute error (MAE) uses L₁:
\[ \text{MAE} = \frac{1}{n} \|y - \hat{y}\|_1 \]
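Both losses reduce to norms of the residual vector. A minimal sketch (the target/prediction values are illustrative):

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

residual = y_true - y_pred

mse = np.mean(residual ** 2)     # (1/n) * squared L2 norm
mae = np.mean(np.abs(residual))  # (1/n) * L1 norm
```

Because MSE squares each residual, large errors dominate it; MAE weights all errors linearly, which is why it is the more outlier-robust choice.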
Regularization: Ridge regression (L₂ regularization) adds \(\|w\|_2^2\) to penalize large weights. Lasso regression (L₁ regularization) adds \(\|w\|_1\) to encourage sparsity (many weights become exactly zero).
Gradient Computation: The gradient of \(\|w\|_2^2\) is \(2w\), used in weight decay. A subgradient of \(\|w\|_1\) is \(\text{sign}(w)\), used in sparse optimization.
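Both expressions are simple elementwise operations. A minimal sketch (the weight vector is illustrative):

```python
import numpy as np

w = np.array([0.5, -2.0, 0.0])

# Gradient of the squared L2 norm ||w||_2^2 is 2w (weight decay)
grad_l2 = 2 * w

# A subgradient of the L1 norm ||w||_1 is sign(w); np.sign returns 0
# where w_i == 0, one valid choice from the interval [-1, 1]
subgrad_l1 = np.sign(w)
```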
Attention Mechanisms: In transformers, attention scores are computed using scaled dot products:
\[ \text{score}(q, k) = \frac{q \cdot k}{\sqrt{d}} \]
where \(q\) is a query vector, \(k\) is a key vector, and \(d\) is their dimensionality.
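A minimal single-pair sketch of the scaled dot-product score (the query/key values are illustrative; full attention would additionally apply a softmax over many such scores):

```python
import numpy as np

d = 4  # dimensionality of the query and key vectors
q = np.array([1.0, 0.0, 1.0, 0.0])
k = np.array([1.0, 1.0, 0.0, 0.0])

# Dividing by sqrt(d) keeps score magnitudes stable as d grows,
# since the raw dot product tends to scale with dimensionality
score = np.dot(q, k) / np.sqrt(d)
```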