2026-02-26

matrix world

(w.i.p.) Purposeful re-learnings in linear algebra.

#work-in-progress#math#long

I watched a math video series.

Proving grounds for math typesetting and core competencies in AI.

Matrix world

i love matrices, they're facing me

song: LEARNING TIME - slowerpace 音楽


Introduction: Why re-learn linear algebra for ai safety

I took linear algebra in university. The professor was writing the book at the time of the course. I was allowed to fail one class, so it worked out in the end.

My preference is to know ahead of time what I will be learning, and why I am learning it. How does this thing pertain to the ends that I am pursuing, and how do all the items of the syllabus pertain to this thing?

If someone had told me that linear algebra is the key to making AI work, I would have paid more attention. Nah, who am I kidding, I probably would have missed that detail.

It's also the key to having more spatial awareness, as well as making cool art, with the help of open source libraries and AI for the tedious work.

I am (re-)learning linear algebra in order to have a deeper understanding of artificial intelligence; that is, specifically Large Language Models (LLMs).

This knowledge yields a deeper understanding, because the Tensor object in PyTorch generalizes vectors and matrices, enabling researchers to perform large-scale matrix operations efficiently on GPUs. Under the hood, the forward pass of a neural network is largely a chain of matrix multiplications, and backpropagation and gradient descent are computed from the derivatives of those operations.

PyTorch is widely used in the space of "research", which includes AI safety research; thus, it is important to learn PyTorch; thus, it is important to (re-)learn linear algebra.

The "Essence of linear algebra" video series by 3Blue1Brown is highly recommended for the intuitive and visual approach to learning that it provides.

If you want to learn properly, go to my sources. These are simply my notes from the video series.

Hence, the very loose language and informal style.

0. Study tips

Pomodoro technique. 3Blue1Brown's videos are roughly 10 minutes each.

That is half a pomodoro. The other half can be spent writing notes and solidifying personal interpretation.

Then niksen. Then repeat.

Let's begin.

1. Vectors

Vectors — to me — are ordered lists of numbers, because I align with the computer science perspective, a concrete perspective for our purposes.

Vectors represent, from the origin of the $x$-$y$(-$z$) axes, a direction and magnitude. You'll often hear the terms vectors and matrices intermixed, at least from me, and when people are talking like human beings. Vectors typically imply motion, and the Matrix typically implies Keanu Reeves.

Since we are only working up to the third dimension, it is easy to represent this using the Cartesian coordinate system.

The origin has zero coordinates:

\begin{pmatrix} 0 \\ 0 \end{pmatrix}

Vectors in two-dimensional space are represented with a pair of numbers $[x, y]$ where each number represents the displacement along the respective axes from the origin. The positive direction for $x$ is right. The positive direction for $y$ is up.

Here is a 2-D vector:

\begin{bmatrix} x \\ y \end{bmatrix}

Vectors in three-dimensional space are represented with a triplet written $[x, y, z]$ where $z$ is orthogonal (perpendicular) to the other two axes.

Here is a 3-D vector:

\begin{bmatrix} x \\ y \\ z \end{bmatrix}

Note: points are written $(x, y)$ in order to differentiate from vectors.

See my point?

\begin{pmatrix}x \\ y \end{pmatrix}

Where $x$ and $y$ can be any real number. $z$ too, so we may say that $x, y, z \in \mathbb{R}$ .

(Note: $\in$ means "in" as in... these variables can be ANY real number, so, anything EXCEPT FOR a number with an imaginary part)

If we square $\mathbb{R}$ , which represents one-dimensional space, we arrive in two-dimensional space, represented by $\mathbb{R}^2$ .

If we cube $\mathbb{R}$ , we arrive in three-dimensional space, represented by $\mathbb{R}^3$ .

Thus, $\mathbb{R}^3$ may be pictured as a cube that extends infinitely in all directions from the origin.

We won't get into higher dimensions than this, but mathematically, it may be said that, a vector in $\mathbb{R}^n$ is an ordered list of $n$ real numbers.

We use the term "ordered list" of numbers because "set" of numbers implies uniqueness. A vector is a list of components (numbers), where order matters.

Vector Addition

Adding two vectors combines them component-wise:

\vec{u} + \vec{v} = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \end{bmatrix}

For example, $\vec{u} = [2, 3]$ and $\vec{v} = [1, -1]$ :

\vec{u} + \vec{v} = \begin{bmatrix} 2 \\ 3 \end{bmatrix} + \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}

Vector addition is commutative $\vec{u} + \vec{v} = \vec{v} + \vec{u}$ and associative $(\vec{u} + \vec{v}) + \vec{w} = \vec{u} + (\vec{v} + \vec{w})$ .
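
Since I lean on the computer science perspective, here is a minimal sketch of vector addition in plain Python, treating vectors as ordered lists. The function name `add` and the third vector `w` are my own choices for illustration, not something from the video series.

```python
def add(u, v):
    """Add two vectors component-wise; both must have the same length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return [a + b for a, b in zip(u, v)]


u, v = [2, 3], [1, -1]
w = [0, 5]  # hypothetical third vector, only used to check associativity

print(add(u, v))                                 # [3, 2]
print(add(u, v) == add(v, u))                    # True (commutative)
print(add(add(u, v), w) == add(u, add(v, w)))    # True (associative)
```

The length check is there because component-wise addition only makes sense when both vectors live in the same $\mathbb{R}^n$ .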

Scalar Multiplication

Scalar multiplication magnifies a vector by a constant. These constants are called scalars because they scale vectors. In linear algebra, "scalar" is simply another word for number; the two terms are used interchangeably.

Multiplying a vector by a scalar $c \in \mathbb{R}$ scales each component:

c \cdot \vec{v} = c \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} c v_1 \\ c v_2 \end{bmatrix}

Scaling by $c = 2$ when $\vec{v} = [1, 3]$ :

c \cdot \vec{v} = 2 \begin{bmatrix} 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \cdot 1 \\ 2 \cdot 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 6 \end{bmatrix}

Negative scalars reverse direction:

-2 \begin{bmatrix} 1 \\ 3 \end{bmatrix} = \begin{bmatrix} -2 \\ -6 \end{bmatrix}

Finally, $c = 0$ collapses the vector to the zero vector:

0 \begin{bmatrix} 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} = \vec{0}

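Thus, there are only three outcomes from scalar multiplication:

Magnification of the vector (stretching or squishing, when $c > 0$ )
Inversion of the vector (reversing direction and scaling by $|c|$ , when $c < 0$ )
$\vec{0}$ (when $c = 0$ )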
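
The three outcomes can be sketched in a few lines of Python. The helper name `scale` is hypothetical, and the $c = 0.5$ case is an extra example of my own to show that "magnification" includes squishing.

```python
def scale(c, v):
    """Multiply every component of the vector v by the scalar c."""
    return [c * x for x in v]


v = [1, 3]

print(scale(2, v))    # [2, 6]      stretched, same direction
print(scale(0.5, v))  # [0.5, 1.5]  squished, same direction
print(scale(-2, v))   # [-2, -6]    direction reversed
print(scale(0, v))    # [0, 0]      collapsed to the zero vector
```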

2. Linear combinations, span, and basis (unit) vectors

Vectors are stretched or squished by scalars.

A unit is a very basic thing. A unit vector is a very special thing. It is like the number one, but for vectors.

The unit vector $\hat{i}$ and the unit vector $\hat{j}$ .

The unit vector $\hat{k}$ is the unit vector in three-dimensional space; that is, it is the unit vector that represents $z$ .

Here is what they look like:

\hat{i} = \begin{bmatrix}1 \\ 0\end{bmatrix} ;\; \hat{j} = \begin{bmatrix}0 \\ 1\end{bmatrix}

The unit vectors $\hat{i}$ and $\hat{j}$ thus represent one positive step in each dimension available in two-dimensional space.

Taking this a step further, we see an un-interesting pattern emerge:

\hat{i} = \begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix} ;\; \hat{j} = \begin{bmatrix}0 \\ 1 \\ 0\end{bmatrix} ;\; \hat{k} = \begin{bmatrix}0 \\ 0 \\ 1\end{bmatrix}

No further steps than that. This is enough to see that the unit vector of any given dimension should — in best practice and textbooks — represent a single step into that dimension.

We may scale that single step; that is to say, we may scale unit vectors, in order to arrive at any point in $\mathbb{R}^n$ for however many $n$ dimensions are available.

The unit vectors are special because they allow us to represent vectors in either of these ways:

4 \hat{i} + 2 \hat{j} \iff \begin{bmatrix}4 \\ 2\end{bmatrix}

Together, $\hat{i}$ and $\hat{j}$ are called the basis, because everything else is based on these two vectors, in two-dimensional space.

Every vector in two-dimensional space may be formed by scaling $\hat{i}$ and $\hat{j}$ and adding the results.

The span is formed through all possible linear combinations of the two vectors:

\vec{v} = a \hat{i} + b \hat{j}
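
A linear combination is just "scale, then add", so it fits in one small function. This is a sketch under my own naming (`linear_combination`, `I_HAT`, `J_HAT`); the second example uses two non-unit vectors that I picked arbitrarily to show the same idea works for any pair.

```python
I_HAT = [1, 0]
J_HAT = [0, 1]


def linear_combination(a, v, b, w):
    """Return a*v + b*w, computed component by component."""
    return [a * vi + b * wi for vi, wi in zip(v, w)]


# 4 steps along i-hat plus 2 steps along j-hat lands on [4, 2].
print(linear_combination(4, I_HAT, 2, J_HAT))      # [4, 2]

# The same recipe with two other (arbitrary) vectors.
print(linear_combination(2, [1, 1], -1, [0, 3]))   # [2, -1]
```

Sweeping `a` and `b` over every real number is what traces out the span.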

A span in $\mathbb{R}^2$ may be thought of as area.

A span in $\mathbb{R}^3$ may be thought of as volume.

Tune the span — tune the space.

I should also note that the unit vectors are unit vectors because they have magnitude one ( $\|\hat{i}\| = \|\hat{j}\| = 1$ ). They are also linearly independent; that is to say, you cannot find a scalar to magnify one vector onto the other:

\forall a \in \mathbb{R}, \; \hat{j} \neq a \hat{i} \;\text{ and }\; \hat{i} \neq a \hat{j}

Which is why they form the basis of separate dimensions.

3. Linear transformations and matrices

Transformations are movement. Movement from one space to another space.

That other space is stretchy by comparison, because in actuality, we are transforming the grid-space.

Or at least, that is just one perspective, that the grid itself is moving — the points are simply along for the ride.

Linear transformations must let lines be lines, and must not move the origin.

Grid lines remain parallel and evenly spaced as an effect; that is to say, the grid lines are never curved or unevenly stretched. The whole grid is transformed uniformly.

Here is an example of a linear transformation applied against $[-1, 2]$ , when $\text{Transformed } \hat{i} = [1, -2]$ and $\text{Transformed } \hat{j} = [3, 0]$ :

\begin{aligned} \text{Trans}&\text{formed } \vec{v} \\\\ &= -1 (\text{Transformed } \hat{i}) + 2 (\text{Transformed } \hat{j}) \\\\ &= -1 \begin{bmatrix}1 \\ -2\end{bmatrix} + 2 \begin{bmatrix}3 \\ 0\end{bmatrix} \\\\ &= \begin{bmatrix}-1 \\ 2\end{bmatrix} + \begin{bmatrix}6 \\ 0\end{bmatrix} \\\\ &= \begin{bmatrix} 5 \\ 2\end{bmatrix} \end{aligned}

So given a $\text{Transformed } \hat{i}$ and $\text{Transformed } \hat{j}$ , we are able to construct the formula:

\begin{bmatrix}x \\ y\end{bmatrix} \implies x \begin{bmatrix}1 \\ -2\end{bmatrix} + y \begin{bmatrix}3 \\ 0\end{bmatrix} = \begin{bmatrix}1 x + 3 y \\ -2 x + 0 y\end{bmatrix}

This means that, given some $x$ and $y$ in one space, we have a formula to find to where $x$ and $y$ have been moved. A succinct method of encoding these two vectors involves a 2x2 ("two-by-two") matrix:

\begin{bmatrix}1 & 3 \\ -2 & 0\end{bmatrix}

Allowing us to rewrite the previous formula:

\begin{bmatrix}x \\ y\end{bmatrix} \implies \begin{bmatrix}1 & 3 \\ -2 & 0\end{bmatrix}\begin{bmatrix}x \\ y\end{bmatrix} = \begin{bmatrix}1 x + 3 y \\ -2 x + 0 y\end{bmatrix}

Yielding the same effect. Thus, a linear transformation in two-dimensional space may be mathematically described as:

\begin{bmatrix}a & b \\ c & d\end{bmatrix}\begin{bmatrix}x \\ y\end{bmatrix} = \begin{bmatrix}a x + b y \\ c x + d y\end{bmatrix}
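
To check the arithmetic, here is a minimal Python sketch of a matrix applied to a vector. I am assuming the matrix is stored as a list of rows, and the name `transform` is hypothetical.

```python
def transform(matrix, vector):
    """Apply a matrix (given as a list of rows) to a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


# Columns are where i-hat and j-hat land: [1, -2] and [3, 0].
A = [[1, 3],
     [-2, 0]]

print(transform(A, [1, 0]))   # [1, -2]  transformed i-hat (first column)
print(transform(A, [0, 1]))   # [3, 0]   transformed j-hat (second column)
print(transform(A, [-1, 2]))  # [5, 2]   matches the worked example
```

Feeding in $\hat{i}$ and $\hat{j}$ simply reads the columns back out, which is the whole point of the encoding.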

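If the columns of the matrix, $\text{Transformed } \hat{i}$ and $\text{Transformed } \hat{j}$ , are linearly dependent; that is to say:

\exists \; k \in \mathbb{R} \mid \begin{bmatrix}b \\ d\end{bmatrix} = k \begin{bmatrix}a \\ c\end{bmatrix} \;\text{ or }\; \begin{bmatrix}a \\ c\end{bmatrix} = k \begin{bmatrix}b \\ d\end{bmatrix}

In other words, if $\text{Transformed } \hat{i}$ can be magnified onto $\text{Transformed } \hat{j}$ , or the other way around; then the span is at most one-dimensional, whereby all of planar space is collapsed (onto a line).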
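
As a small worked example (the matrix is my own choice, not one from the series), take a matrix whose second column is twice its first:

\begin{bmatrix}1 & 2 \\ 2 & 4\end{bmatrix}\begin{bmatrix}x \\ y\end{bmatrix} = x \begin{bmatrix}1 \\ 2\end{bmatrix} + y \begin{bmatrix}2 \\ 4\end{bmatrix} = (x + 2 y) \begin{bmatrix}1 \\ 2\end{bmatrix}

Whatever $x$ and $y$ are, the output is a multiple of $[1, 2]$ , so the whole plane lands on a single line.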

4. Matrix multiplication as composition

Continuing the idea of linear transformations: we may chain together linear transformations via composition. The product of the matrices, the composition matrix, combines two or more separate linear transformations in the form of a single matrix, as the name implies.

\begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}\left(\begin{bmatrix}0 & -1 \\ 1 & 0\end{bmatrix}\begin{bmatrix}x \\ y\end{bmatrix}\right)

Thankfully I am also studying Semitic languages at this moment; it helps with reading, because linear transformations in linear compositions are applied right-to-left.

This website is very useful for visualizing matrix multiplication.

For my own reference, here are some examples of common linear transformations: scaling, rotations, shears, and reflections.

Scaling

Preserves the directions of the axes, changes scale.

\begin{bmatrix}s_x & 0 \\ 0 & s_y\end{bmatrix}

Rotations

Preserves scale, changes direction.

90° Clockwise Rotation:

\begin{bmatrix}0 & 1 \\ -1 & 0\end{bmatrix}

180° Clockwise Rotation:

\begin{bmatrix}-1 & 0 \\ 0 & -1\end{bmatrix}

270° Clockwise Rotation:

\begin{bmatrix}0 & -1 \\ 1 & 0\end{bmatrix}

Shears

Slides one of the unit vectors, shearing the area.

Horizontal shear:

\begin{bmatrix}1 & k \\ 0 & 1\end{bmatrix}

Vertical shear:

\begin{bmatrix}1 & 0 \\ k & 1\end{bmatrix}

Reflections

Useful for flipping the area, and self-improvement.

Reflection across the x-axis:

\begin{bmatrix}1 & 0 \\ 0 & -1\end{bmatrix}

Reflection across the y-axis:

\begin{bmatrix}-1 & 0 \\ 0 & 1\end{bmatrix}

Therefore, the above linear composition applies a 270° clockwise rotation, then a horizontal shear.

Is a composition matrix a chain of linear transformations applied to a vector? Applied to a grid-space?

You decide.

When applying these linear transformations, we simply apply matrix multiplication:

\begin{bmatrix} a & b \\ c & d\end{bmatrix}\begin{bmatrix}e & f \\ g & h\end{bmatrix} = \begin{bmatrix}ae + bg & af + bh \\ ce + dg & cf + dh \end{bmatrix}

Simply is an understatement. A mnemonic that I have devised to remember the row/column business (apart from the phrase row/column itself, which tells us to multiply a row of the first matrix by a column of the second matrix):

First marches forward
Second slides downward

Each row of the first matrix is multiplied, entry by entry, by each column of the second matrix, and the products are summed; the result is the single entry at the intersection of that row and that column in the output matrix. For two $2 \times 2$ matrices, the output is again a $2 \times 2$ matrix.

The inner dimensions of two matrices must match in order to perform matrix multiplication; that is to say, matrix multiplication can only take place between an $m \times n$ matrix and an $n \times p$ matrix, because the inner dimensions match on $n$ . The result is an $m \times p$ matrix.

Matrix multiplication is always applied this way. This is imperative to understand because, in general:

\mathbb{M}_1 \mathbb{M}_2 \neq \mathbb{M}_2 \mathbb{M}_1

Changing the order of the matrices in matrix multiplication, changes the overall composition of the transformations.
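
Here is a minimal sketch of matrix multiplication in plain Python, using the shear and the 270° clockwise rotation from the composition earlier in this section. The name `matmul` and the list-of-rows layout are my assumptions.

```python
def matmul(a, b):
    """Multiply matrix a (m x n) by matrix b (n x p); both are lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


SHEAR = [[1, 1],
         [0, 1]]
ROTATION = [[0, -1],
            [1, 0]]

# Read right-to-left: rotate first, then shear.
print(matmul(SHEAR, ROTATION))  # [[1, -1], [1, 0]]

# Swap the order: shear first, then rotate. A different transformation.
print(matmul(ROTATION, SHEAR))  # [[0, -1], [1, 1]]
```

The loop over `k` is the "first marches forward, second slides downward" step: it walks along a row of `a` and down a column of `b` at the same time.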

5. Three-dimensional linear transformations

Now for extending the composition of linear transformations into three-dimensional space.

Let's not take longer than we have to — this place hurts my head.

Simply adding one extra direction introduces much complexity, but introduces no more complication than before, when we were working in two-dimensional space.

Behold, a linear transformation in three-dimensional space:

\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} ax + by + cz \\ dx + ey + fz \\ gx + hy + iz \end{bmatrix}
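
The same row-times-vector recipe carries over unchanged to three dimensions. As a sketch, here is a quarter turn counterclockwise about the $z$ axis; this particular matrix is my own example (it is the 2-D 270° clockwise rotation with $z$ left alone), and `transform` is a hypothetical helper name.

```python
def transform(matrix, vector):
    """Apply a matrix (given as a list of rows) to a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


# Columns are where i-hat, j-hat and k-hat land.
ROTATE_ABOUT_Z = [[0, -1, 0],
                  [1, 0, 0],
                  [0, 0, 1]]

print(transform(ROTATE_ABOUT_Z, [1, 0, 0]))  # [0, 1, 0]   i-hat turns onto j-hat
print(transform(ROTATE_ABOUT_Z, [0, 0, 1]))  # [0, 0, 1]   k-hat stays put
print(transform(ROTATE_ABOUT_Z, [1, 2, 3]))  # [-2, 1, 3]
```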

6. The determinant

All this talk of scaling an area, all this talk of scaling a space — but by what factor?

And what if we're interested in the factor after applying a linear transformation which does more than just scale? (e.g. shearing)

The determinant of a transformation tells us by what factor an area is scaled as result of said transformation.

The determinant of a $2 \times 2$ matrix:

\text{det}\left(\begin{bmatrix}a & b \\ c & d\end{bmatrix}\right) = a d - b c

The determinant of a $3 \times 3$ matrix:

\text{det}\left(\begin{bmatrix}a & b & c \\ d & e & f \\ g & h & i\end{bmatrix}\right) = a e i + b f g + c d h - c e g - b d i - a f h

These determinant formulas are themselves determined from the Leibniz formula for determinants.

When the output of the determinant function is zero, then the given linear transformation will collapse a space below the present dimension.

When the output of the determinant function is negative, then the given linear transformation will invert a space, flipping its orientation; the area is still scaled by the absolute value of the determinant.

The determinant of a transformation in three-dimensional space tells us by what factor a volume is scaled as a result of said transformation.

Multiplying the amount that one matrix scales (the determinant) by the amount that another matrix scales is mathematically the same as calculating the amount scaled as a result of sequentially applying the two linear transformations (matrices):

\text{det}\left(\mathbb{M_1}\mathbb{M_2}\right) = \text{det}\left(\mathbb{M_1}\right) \text{det}\left(\mathbb{M_2}\right)
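
Both determinant formulas translate directly into Python. This is a sketch with hypothetical helper names (`det2`, `det3`, `matmul2`); the scaling matrix `M2` is my own pick, chosen only to check the product rule.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as a list of rows."""
    (a, b), (c, d) = m
    return a * d - b * c


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * e * i + b * f * g + c * d * h - c * e * g - b * d * i - a * f * h


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[r][k] * q[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


M1 = [[1, 3], [-2, 0]]   # the transformation from section 3
M2 = [[2, 0], [0, 3]]    # hypothetical scaling: x by 2, y by 3

print(det2(M1))                    # 6
print(det2([[1, 1], [0, 1]]))      # 1   a shear preserves area
print(det2([[1, 0], [0, -1]]))     # -1  a reflection flips orientation
print(det2([[1, 2], [2, 4]]))      # 0   dependent columns collapse the plane
print(det2(matmul2(M1, M2)) == det2(M1) * det2(M2))  # True
print(det3([[2, 0, 0], [0, 3, 0], [0, 0, 4]]))       # 24
```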

The glob, the shape of a matrix in my mind — is transformation, space, movement.

7. Inverse matrices, column space and null space

Systems of linear equations can be represented as a single linear transformation applied to a vector of unknowns.

This system of equations:

2 x + 5 y + 3 z = -3 \\ 4 x + 8 z = 0 \\ x + 3 y = 2

Can also be written this way:

2 x + 5 y + 3 z = -3 \\ 4 x + 0 y + 8 z = 0 \\ 1 x + 3 y + 0 z = 2

Can be expressed as a matrix equation:

\begin{bmatrix}2 & 5 & 3 \\ 4 & 0 & 8 \\ 1 & 3 & 0\end{bmatrix} \begin{bmatrix}x \\ y \\ z\end{bmatrix} = \begin{bmatrix}-3 \\ 0 \\ 2\end{bmatrix}

I should add that this is intended for systems of linear equations.

You won't see anything like $x^2$ , $y^2$ , or $z^2$ in linear equations.

No functions like $\cos(x)$ either.

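Linear equations are useful in the form of $A \vec{x} = \vec{v}$ , where $A$ holds the known coefficients, $\vec{x}$ holds the unknowns, and $\vec{v}$ holds the known outputs.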

If we could play this computation in reverse, we could retrieve $x$ , $y$ , and $z$ — and we can.

Such that — by morphing space, and remembering that linear transformations in linear compositions are applied right-to-left, we may deduce $\vec{x}$ from the knowns.

Morphing backwards in space, otherwise known as applying the inverse transformation, looks something like this:

A^{-1} A

For a more concrete example, let's reference the 90° Clockwise Rotation example from before, and the application of the inverse transformation:

\begin{bmatrix}0 & 1 \\ -1 & 0\end{bmatrix}^{-1} = \begin{bmatrix}0 & -1 \\ 1 & 0\end{bmatrix}

But how do we know the inverse for any given linear transformation? By doing nothing. I'm serious.

The matrix that does nothing is the identity matrix, much like the number $1$ in multiplication. When the identity matrix:

\begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix}

— is used, it leaves $\hat{i}$ and $\hat{j}$ where they are.

Once you find the inverse of $A$ , it may be used to find $\vec{x}$ :

A^{-1} A \vec{x} = A^{-1} \vec{v} \implies \vec{x} = A^{-1} \vec{v}

This is because the inverse undoes the transformation, so $A^{-1} A$ is the identity matrix, and anything multiplied by the identity is itself, much like multiplying a number by $1$ . The inverse transformation works like the reciprocal, or the exponent of $-1$ , in regular maths; that is to say, if we have the number, say $3$ , and multiply $3$ by $3^{-1}$ , or $\frac{1}{3}$ , then we end up with one.

I'm quite bad at this. Behold, a very loose analogy:

1 = \frac{1}{3} 3 = 3^{-1} 3 \rightsquigarrow A^{-1} A = \begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix}

To recap, we take known variables in the matrix $A$ , utilize a computer/keen intuition to find $A^{-1}$ , which is then utilized to find $\vec{x}$ .
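
For the "utilize a computer" part, here is one possible sketch that solves the system from earlier in this section. It uses Gauss-Jordan elimination, which is a standard textbook technique and my own substitution, not necessarily what the video series or any particular library does. The function name `solve` is hypothetical, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction


def solve(a, v):
    """Solve A x = v by Gauss-Jordan elimination on the augmented matrix [A | v]."""
    n = len(a)
    rows = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(a, v)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("det(A) = 0, so there is no inverse")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row so that the pivot entry becomes 1.
        p = rows[col][col]
        rows[col] = [x / p for x in rows[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    # The left block is now the identity; the last column holds x.
    return [row[n] for row in rows]


A = [[2, 5, 3],
     [4, 0, 8],
     [1, 3, 0]]
v = [-3, 0, 2]

x = solve(A, v)
print(x)  # [Fraction(38, 7), Fraction(-8, 7), Fraction(-19, 7)]
```

So $x = \frac{38}{7}$ , $y = -\frac{8}{7}$ , and $z = -\frac{19}{7}$ ; plugging these back into the three equations recovers $-3$ , $0$ , and $2$ .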

People don't understand — there's a lot that goes into doing nothing.

A loss of information occurs when $\text{det}(A) = 0$ . Once space has been squished into a lower dimension, there exists no inverse, because no linear transformation can unsquish a line back into a plane.

In the case of three-dimensional space, we have the luxury of losing information twice without knowing, when $\text{det}(A) = 0$ . This is because the determinant alone does not signify the number of dimensions, or rank, of the transformation.

Predictably, $\text{Rank} \; 0$ is a point, and represents nothingness, total collapse; $\text{Rank} \; 1$ is a line, and represents one-dimensional space; $\text{Rank} \; 2$ is a plane, and represents two-dimensional space; $\text{Rank} \; 3$ is a volume-spanning subspace, or simply, space, and represents three-dimensional space.

Rank is the number of dimensions in the column space, or the span of the columns of the matrix $A$ .

The zero vector, $[0, 0]$ , is always in the column space.

Full rank means that the output space still has all the dimensions it started with.

The null space, also called kernel, is the space of all vectors that become null, or zero, through the applied linear transformation.

Concretely: just as setting $y = 0$ in $y = m x + b$ gives $0 = m x + b$ , whose solution is a single point, the null space in the following, familiar equation:

A \vec{v} = \begin{bmatrix}0 \\ 0\end{bmatrix}

— forms a span at the solution(s).

Column space helps us know when a solution exists.

Kernel helps us know the set of all possible solutions.
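
To make column space and kernel a little more tangible, here is a sketch with a rank 1 matrix of my own choosing (its second column is twice the first). The names `transform` and `KERNEL_DIRECTION` are hypothetical.

```python
def transform(matrix, vector):
    """Apply a matrix (given as a list of rows) to a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


A = [[1, 2],
     [2, 4]]

# Kernel: every multiple of [2, -1] is sent to the zero vector.
KERNEL_DIRECTION = [2, -1]
for t in (-3, 0, 1, 5):
    v = [t * c for c in KERNEL_DIRECTION]
    print(v, "->", transform(A, v))  # always [0, 0]

# Column space: every output lands on the line spanned by [1, 2].
for v in ([1, 0], [0, 1], [3, -7]):
    x, y = transform(A, v)
    print(v, "->", [x, y], "on the line:", y == 2 * x)  # always True
```

A target such as $[1, 0]$ is off that line, so $A \vec{x} = [1, 0]$ has no solution; a target on the line has a whole line of solutions, shifted copies of the kernel.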

8. Non-square matrices as transformations between dimensions

Non-square matrices are portals between real-space, speaking loosely.

For our purposes, a non-square matrix is any $m \times n$ matrix where $m \neq n$ ; since we stay within two and three dimensions, that leaves only two shapes: $2 \times 3$ and $3 \times 2$ .

The number of columns is the dimension of the input space, and the number of rows is the dimension of the output space. Thus, a $2 \times 3$ non-square matrix represents a transformation from three-dimensional space to two-dimensional space, and a $3 \times 2$ non-square matrix represents a transformation from two-dimensional space to three-dimensional space.
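
A quick sketch of both shapes in Python. The two matrices are my own simple examples (drop the $z$ coordinate, and place the plane flat inside 3-D space), and `transform` is a hypothetical helper name.

```python
def transform(matrix, vector):
    """Apply a matrix (given as a list of rows) to a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


# 2 rows, 3 columns: takes a 3-D vector, returns a 2-D vector.
FLATTEN = [[1, 0, 0],
           [0, 1, 0]]

# 3 rows, 2 columns: takes a 2-D vector, returns a 3-D vector.
EMBED = [[1, 0],
         [0, 1],
         [0, 0]]

print(transform(FLATTEN, [4, 2, 7]))  # [4, 2]
print(transform(EMBED, [4, 2]))       # [4, 2, 0]
```

Going 3-D to 2-D and back again loses the $z$ component: the round trip turns $[4, 2, 7]$ into $[4, 2, 0]$ .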

9. Dot products and duality

The dot product produces a scalar from two vectors:

\vec{u} \cdot \vec{v} = \sum_{i=1}^{n} u_i v_i = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n

Geometrically it encodes the angle $\theta$ between the vectors:

\vec{u} \cdot \vec{v} = \|\vec{u}\| \, \|\vec{v}\| \cos\theta

When $\vec{u} \cdot \vec{v} = 0$ the vectors are orthogonal (perpendicular), because the dot product measures how much the two vectors point in the same direction. No overlap yields a zero value.

Duality in mathematics describes the phenomenon that one concept often has dual, or multiple, different applications. Like taijitu, mathematical duality reveals how opposing concepts reflect and sustain one another, exposing an underlying symmetry that unites what first appears divided.

10. Cross products

11. Cross products in the light of linear transformations

12. Cramer's rule, explained geometrically

13. Change of basis

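Given basis vectors $\begin{bmatrix}2 \\ 1\end{bmatrix}$ and $\begin{bmatrix}-1 \\ 1\end{bmatrix}$ , and a target vector whose coordinates in that basis are $\begin{bmatrix}-1 \\ 2\end{bmatrix}$ , translation between coordinate systems is possible. Scaling each basis vector by the matching coordinate gives the same vector in standard coordinates: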

-1 \begin{bmatrix}2 \\ 1\end{bmatrix} + 2 \begin{bmatrix}-1 \\ 1\end{bmatrix} = \begin{bmatrix}-4 \\ 1\end{bmatrix}
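
The translation works in both directions. Here is a sketch under my own naming (`to_standard`, `from_standard`); the reverse direction uses the usual $2 \times 2$ inverse formula, which this section has not derived, so treat it as an assumption brought in from outside these notes.

```python
from fractions import Fraction

B1 = [2, 1]    # first basis vector, in standard coordinates
B2 = [-1, 1]   # second basis vector, in standard coordinates


def to_standard(coords):
    """Coordinates in the (B1, B2) basis -> standard coordinates."""
    a, b = coords
    return [a * B1[0] + b * B2[0], a * B1[1] + b * B2[1]]


def from_standard(v):
    """Standard coordinates -> coordinates in the (B1, B2) basis."""
    # Invert the matrix whose columns are B1 and B2.
    det = B1[0] * B2[1] - B2[0] * B1[1]
    if det == 0:
        raise ValueError("B1 and B2 are linearly dependent, so they are not a basis")
    x, y = v
    return [Fraction(B2[1] * x - B2[0] * y, det),
            Fraction(-B1[1] * x + B1[0] * y, det)]


print(to_standard([-1, 2]))                   # [-4, 1]
print(from_standard([-4, 1]) == [-1, 2])      # True, the round trip agrees
```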

14. Eigenvectors and eigenvalues

15. A quick trick for computing eigenvalues

16. Abstract vector spaces

Trivia

The title of this article refers to the Wikipedia article entitled "Matrix decomposition", which references an image entitled "Matrix World", which was useful to me — a visual thinker. The canvas artwork is also entitled "Matrix world", but with sentence casing to comply with corporate style guidelines.

I thought the grids in the video "Inverse matrices, column space and null space | Chapter 7, Essence of linear algebra" looked pretty cool, so I decided to create my own rendition of it here using Three.js drawn on an html canvas.

The color scheme was inspired, roughly, by the logo for the musical, Joseph and the Amazing Technicolor Dreamcoat, for which I played "left-hand piano" (the bass part (which inspired me to learn the ACTUAL upright bass + bass guitar [Thanks, Pat])) during freshman year in public high school. I don't know how that happened outside a private school. Anyways, I hope you've enjoyed.