Systems of Linear Equations
Definition
A system of m linear equations in n unknowns has the form:
a₁₁x₁ + a₁₂x₂ + ... + a₁ₙxₙ = b₁
a₂₁x₁ + a₂₂x₂ + ... + a₂ₙxₙ = b₂
...
aₘ₁x₁ + aₘ₂x₂ + ... + aₘₙxₙ = bₘ
This can be written compactly as a matrix equation:
\(Ax = b\)
where:
- \(A \in \mathbb{R}^{m \times n}\) is the coefficient matrix
- \(x \in \mathbb{R}^{n}\) is the vector of unknowns
- \(b \in \mathbb{R}^{m}\) is the vector of constants
Example: \(2 \times 2\) System
2x₁ + 3x₂ = 8
4x₁ + 1x₂ = 10
Matrix form:
[2 3][x₁] [8 ]
[4 1][x₂] = [10]
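As a quick numerical check, this system can be solved with NumPy (assuming NumPy is available), a minimal sketch:

```python
import numpy as np

# Coefficient matrix and right-hand side of the 2x2 example above
A = np.array([[2.0, 3.0],
              [4.0, 1.0]])
b = np.array([8.0, 10.0])

# Valid because A is square and invertible (det = 2*1 - 3*4 = -10)
x = np.linalg.solve(A, b)
print(x)  # approximately [2.2, 1.2]
```

Substituting back: \(2(2.2) + 3(1.2) = 8\) and \(4(2.2) + 1.2 = 10\), as required.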
Solution Types
A system \(Ax = b\) can have:
- Unique solution: Exactly one \(x\) satisfies the equation
- Infinite solutions: Multiple (infinitely many) \(x\) satisfy the equation
- No solution: No \(x\) satisfies the equation (inconsistent system)
Example: Unique Solution
[2 1][x₁] [5]
[1 3][x₂] = [8]
From equation 1: \(2x_1 + x_2 = 5 \implies x_2 = 5 - 2x_1\)
Substitute into equation 2: \(x_1 + 3(5 - 2x_1) = 8 \implies 15 - 5x_1 = 8 \implies x_1 = 1.4\).
Then \(x_2 = 5 - 2(1.4) = 5 - 2.8 = 2.2\).
Solution: \(x = [1.4, 2.2]^T\)
Example: Infinite Solutions
[1 2][x₁] [3]
[2 4][x₂] = [6]
The second equation is \(2 \times\) the first, so both equations represent the same constraint. Solutions: \(x_2 = (3 - x_1)/2\) for any \(x_1\). Example solutions: \([1, 1]^T\), \([3, 0]^T\), \([-1, 2]^T\).
Example: No Solution
[1 2][x₁] [3]
[2 4][x₂] = [7]
The second equation requires \(2x_1\) + \(4x_2\) = 7, but doubling the first gives \(2x_1\) + \(4x_2\) = 6. Contradiction: 6 \(\neq\) 7. No solution exists.
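The three cases can be told apart mechanically by comparing the rank of \(A\) with the rank of the augmented matrix \([A \mid b]\) (the Rouché–Capelli theorem). A sketch, assuming NumPy is available:

```python
import numpy as np

def solution_type(A, b):
    """Classify Ax = b by comparing rank(A) with rank([A | b])."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.hstack([A, b]))
    if rank_A < rank_Ab:
        return "no solution"          # b adds an inconsistent constraint
    if rank_A < A.shape[1]:
        return "infinite solutions"   # free variables remain
    return "unique solution"

# The three examples above:
print(solution_type([[2, 1], [1, 3]], [5, 8]))   # unique solution
print(solution_type([[1, 2], [2, 4]], [3, 6]))   # infinite solutions
print(solution_type([[1, 2], [2, 4]], [3, 7]))   # no solution
```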
Gaussian Elimination
Gaussian elimination transforms \(Ax = b\) to row echelon form using elementary row operations:
- Swap two rows
- Multiply a row by a non-zero scalar
- Add a multiple of one row to another
Example
Solve:
x₁ + 2x₂ + x₃ = 4
2x₁ + x₂ + 3x₃ = 7
x₁ - x₂ + 2x₃ = 3
Augmented matrix:
[1 2 1 | 4]
[2 1 3 | 7]
[1 -1 2 | 3]
Step 1: Eliminate \(x_{1}\) from rows 2 and 3:
- \(R_{2}\) ← \(R_{2}\) - \(2R_1\)
- \(R_{3}\) ← \(R_{3}\) - \(R_{1}\)
[1 2 1 | 4]
[0 -3 1 |-1]
[0 -3 1 |-1]
Step 2: Eliminate \(x_{2}\) from row 3:
- \(R_{3}\) ← \(R_{3}\) - \(R_{2}\)
[1 2 1 | 4]
[0 -3 1 |-1]
[0 0 0 | 0]
Step 3: Back substitution:
- From row 2: \(-3x_2 + x_3 = -1 \implies x_3 = 3x_2 - 1\)
- From row 1: \(x_1 + 2x_2 + x_3 = 4 \implies x_1 = 4 - 2x_2 - (3x_2 - 1) = 5 - 5x_2\)
Solution (parametrized by \(x_2 = t\)): \(x = [5 - 5t,\ t,\ 3t - 1]^T\)
Example: \(t = 1\) gives \(x = [0, 1, 2]^T\).
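The forward-elimination phase above can be sketched in NumPy. This minimal version performs no row swaps, so it mirrors the hand computation exactly; robust implementations use partial pivoting for numerical stability:

```python
import numpy as np

def forward_eliminate(A, b):
    """Reduce the augmented matrix [A | b] to row echelon form.
    No pivoting, to match the hand computation in the text."""
    M = np.hstack([np.asarray(A, dtype=float),
                   np.asarray(b, dtype=float).reshape(-1, 1)])
    m, n = M.shape
    row = 0
    for col in range(n - 1):                    # last column is the RHS
        if row >= m or abs(M[row, col]) < 1e-12:
            continue                            # no usable pivot here
        for r in range(row + 1, m):
            M[r] -= (M[r, col] / M[row, col]) * M[row]
        row += 1
    return M

M = forward_eliminate([[1, 2, 1], [2, 1, 3], [1, -1, 2]], [4, 7, 3])
print(M)   # bottom row is all zeros: rank 2, infinitely many solutions
```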
Row Echelon Form
A matrix is in row echelon form if:
- All zero rows are at the bottom
- The first non-zero entry (pivot) of each row is to the right of the pivot above it
- All entries below a pivot are zero
Reduced Row Echelon Form (RREF)
Additionally requires:
- All pivots equal 1
- All entries above and below pivots are zero
Example RREF:
[1 0 3 | 2]
[0 1 -1 | 4]
[0 0 0 | 0]
Homogeneous Systems
A system is homogeneous if \(b = \mathbf{0}\):
\(Ax = \mathbf{0}\)
Homogeneous systems always have at least one solution: \(x = \mathbf{0}\) (trivial solution).
If \(A\) has more columns than rows (\(n > m\)), or if columns are linearly dependent, non-trivial solutions exist.
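Non-trivial solutions can be computed numerically: the right singular vectors of \(A\) associated with zero singular values span the null space. A sketch for the rank-deficient matrix from the earlier examples:

```python
import numpy as np

# Homogeneous system Ax = 0 whose columns are linearly dependent
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# Right singular vectors for (near-)zero singular values span the
# null space, i.e. the set of non-trivial solutions.
_, s, Vt = np.linalg.svd(A)
x = Vt[-1]        # unit vector for the smallest singular value
print(A @ x)      # approximately [0, 0]: a non-trivial solution
```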
Relevance for Machine Learning
Linear Regression (Normal Equation): The optimal weights for linear regression satisfy:
\(X^T X w = X^T y\)
This is a system of linear equations. When \(X^TX\) is invertible, the solution is:
\(w = (X^T X)^{-1} X^T y\)
Least Squares Problems: Many ML problems involve finding \(x\) that minimizes \(\|Ax - b\|_2^2\). The solution satisfies the normal equation:
\(A^T A x = A^T b\)
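A sketch on hypothetical toy data, solving the normal equation directly and comparing against NumPy's least-squares routine (the numerically preferred approach, since forming \(X^TX\) squares the condition number):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                    # toy design matrix
true_w = np.array([1.0, -2.0, 0.5])             # assumed ground truth
y = X @ true_w + 0.01 * rng.normal(size=20)     # targets, small noise

# Solve the normal equation X^T X w = X^T y directly...
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# ...and compare with NumPy's least-squares solver.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```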
Solving Network Equations: In graph neural networks, node representations satisfy systems of equations based on message passing.
Kernel Ridge Regression: The prediction function involves solving:
\((K + \lambda I)\alpha = y\)
where \(K\) is the kernel matrix, \(\lambda\) is the regularization strength, and \(\alpha\) is the vector of dual coefficients.
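A minimal sketch on hypothetical 1-D data; `rbf_kernel`, the choice of Gaussian kernel, and the values of `gamma` and `lam` are all illustrative assumptions:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Illustrative helper: Gaussian (RBF) kernel matrix."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

# Toy 1-D regression data
X = np.linspace(0.0, 3.0, 20).reshape(-1, 1)
y = np.sin(2.0 * X[:, 0])

lam = 1e-3                                       # regularization strength
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)   # (K + lambda I) alpha = y

# Prediction at a new point: f(x) = sum_i alpha_i k(x, x_i)
X_test = np.array([[1.5]])
y_pred = rbf_kernel(X_test, X) @ alpha
```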
System Identification: Inferring system dynamics from observations leads to solving \(Ax = b\) where \(A\) contains observed states and \(b\) contains outcomes.
Interpolation: Fitting polynomials or splines through data points formulates as a linear system where coefficients are unknowns.
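For example, the unique quadratic through three points comes from solving \(Vc = y\), where \(V\) is the Vandermonde matrix of the x-coordinates; a sketch with made-up points:

```python
import numpy as np

# Three data points; the interpolating quadratic is x^2 + x + 1
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([1.0, 3.0, 7.0])

V = np.vander(xs, 3)            # columns: x^2, x, 1
c = np.linalg.solve(V, ys)      # coefficients, highest degree first
print(np.polyval(c, 1.5))       # evaluate the interpolant at x = 1.5
```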
Computational Efficiency: Understanding when systems have unique solutions guides algorithm design. Iterative methods (conjugate gradient, GMRES) solve large systems efficiently when direct methods are infeasible.
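The conjugate gradient idea can be sketched in a few lines; this is a textbook version for symmetric positive definite systems, not a production implementation:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Minimal CG for symmetric positive definite A. Each iteration
    needs only one matrix-vector product, which is why CG scales to
    large sparse systems where direct factorizations do not."""
    x = np.zeros_like(b)
    r = b - A @ x                 # residual
    p = r.copy()                  # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)     # optimal step along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p # new A-conjugate direction
        rs = rs_new
    return x

# Small SPD test system
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
```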