Varol Cagdas Tok

Personal notes and articles.

Linear Algebra for Machine Learning

Linear algebra is the mathematical language of machine learning. From data representation to the mechanics of deep neural networks, its concepts are ubiquitous. Understanding vectors, matrices, and their operations is required for the field. We review these building blocks.

Vectors: The Foundation of Data Representation

At its core, a vector is an ordered list of numbers. In machine learning, we almost always use vectors to represent our data points. For example, a single data point with M features (e.g., the age, height, and weight of a person) can be represented as an M-dimensional vector. By convention, we typically work with column vectors.

\[c = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_M \end{bmatrix}\]

The transpose of a column vector, denoted \(c^T\), is a row vector. This distinction becomes important when we start performing operations that involve both vectors and matrices.

Key vector operations include:

\[a \cdot c = a^T c = \sum_{j} a_j c_j\]

The dot product is a fundamental operation, often used to measure the similarity or projection of one vector onto another. The dot product of a vector with itself gives its squared Euclidean norm or length, \(\|a\|^2\).

Matrices: Organizing Data and Transformations

A matrix is a rectangular, two-dimensional array of numbers. Matrices serve two primary purposes in machine learning:

  1. Data Representation: An entire dataset of N data points, each with M features, can be organized into a single N x M matrix, often called the design matrix. Each row of the matrix represents a single data point (a vector), and each column represents a feature.
    1. Linear Transformations: A matrix can be seen as a function that performs a linear transformation on a vector. When we multiply a matrix by a vector, the vector is transformed—rotated, scaled, or sheared—into a new vector in a potentially different dimensional space.
    2. Matrix Operations

      • Matrix-Vector Product: If \(A\) is an \(N \times M\) matrix and \(c\) is an \(M\)-dimensional column vector, their product \(d = Ac\) is an \(N\)-dimensional column vector. Each component \(d_i\) of the resulting vector is the dot product of the \(i\)-th row of the matrix \(A\) with the vector \(c\). This is the mathematical representation of applying a linear transformation to a vector.
      • Matrix-Matrix Product: The product of an \(N \times M\) matrix \(A\) and an \(M \times K\) matrix \(C\) is an \(N \times K\) matrix \(D = AC\). Each element \(d_{i,k}\) is the dot product of the \(i\)-th row of \(A\) and the \(k\)-th column of \(C\). Matrix multiplication can be viewed as a composition of linear transformations.
      • Transpose: The transpose of a matrix \(A\), denoted \(A^T\), is obtained by swapping its rows and columns. A key property is that the transpose of a product is the product of the transposes in reverse order: \((AC)^T = C^T A^T\).
      • Outer Product: The product of an \(N\)-dimensional column vector \(d\) and an \(M\)-dimensional row vector \(c^T\) results in an \(N \times M\) matrix \(A = dc^T\). This is known as the outer product.

      Special Matrices and the Concept of Inverse

      Certain types of matrices have special properties that are important in machine learning:

      • Square Matrix: A matrix with the same number of rows and columns (N x N).
      • Identity Matrix (\(I\)): A square matrix with ones on the main diagonal and zeros everywhere else. It is the matrix equivalent of the number 1; multiplying any matrix by the identity matrix leaves it unchanged (\(AI = IA = A\)).
      • Diagonal Matrix: A square matrix where all off-diagonal elements are zero.

      The inverse of a square matrix \(A\), denoted \(A^{-1}\), is a matrix such that \(A^{-1}A = AA^{-1} = I\). The inverse "undoes" the transformation performed by \(A\). The inverse is a crucial concept because it allows us to solve systems of linear equations. For example, the equation \(Ac = d\) can be solved for \(c\) by \(c = A^{-1}d\). This concept underpins the closed-form solution for linear regression. Not all square matrices have an inverse; those that do not are called singular.

      • Orthogonal Matrix: A square matrix \(R\) whose columns (and rows) are orthonormal (unit vectors that are mutually perpendicular). Orthogonal matrices represent transformations that are pure rotations (or reflections) and preserve the lengths and angles of vectors. A key property is that their inverse is simply their transpose: \(R^{-1} = R^T\).

      Conclusion: The Language of Data

      Linear algebra provides the tools for representing and manipulating data in machine learning. Vectors and matrices give us a compact way to handle datasets and to define the mathematical operations at the heart of learning algorithms. Whether solving for weights in a linear regression model, representing connections in a neural network, or performing dimensionality reduction, the principles of linear algebra are the foundation upon which these methods are built.