Mathblr! I need your help.
What are your favorite texts about sharing mathematical research/giving good math talks???

The way they teach Gram Schmidt Orthonormalisation (referred to as GS henceforth) is actually criminal. It’s not a plug and chug algorithm, it’s actually very intuitive. Also apologies to anyone who doesn’t know what GS is or does, but that is beyond the scope of a single tumblr post. I would point towards my blog, but that would doxx me, so I won’t do that.
As an EE student, I’ve used a lot of GS. Every single semester, there’s at least one subject that requires it (after the general first year, though Lin Alg was the first place I learnt it), especially my comms classes. While I’m not sure, I’d assume it’s used all throughout engineering, since so much of engg can be reduced to linalg + probability.
In first year—when I first learnt it—I hated it. It felt confusing and kinda useless? One of the problems with learning these concepts devoid of application I guess. It came up in numerical methods with QR decomposition, but it still felt like fluff. It wasn’t until 3rd year ADC (Analog and Digital Comms) that it actually clicked in my head. It only did that because we were finally using real signals versus abstract quantities, so we were forced to see what it was doing. However, it is entirely possible to gain the required intuition with vectors, so I’ll be doing that because I don’t want to explain the inner product in detail.
***
The point of Gram Schmidt *Orthonormalisation* is pretty self evident if you know what orthogonal and normalisation mean (in terms of vectors). I will continue with the assumption that you do know what those are, and what projection is.
So the main idea behind GS is to take a set of vectors and produce an orthonormal basis for the space they span. That's a lot of word salad. What does it actually mean? Well, what we have are a bunch of arbitrary vectors. That means they look something like {[1,2,3], [-3, 2, 5], …}, vectors that point every which direction. GS gives you a basis for those vectors that is a) orthogonal, so that each basis vector is perpendicular to all the others and no two of them overlap, and b) normalised (unit length), so that projections are easy: the coordinate along a basis vector is just a dot product.
I think the bit that never got into my head was why it’s so important. Can’t we just use the trivial basis (i.e. {[1,0,0], [0,1,0], [0,0,1]})? That’s already orthonormal, and our vectors are already in that form.
The issue is that the trivial basis is often *not aligned super well with the problem we’re trying to solve.*
Imagine we have some vectors that represent velocity of a body. It’s meant to go in a straight line, but due to wind or other disturbances, it can deviate a bit from the line.
It’s important to remember that our trivial basis *can* describe this. GS doesn’t make any new information, it just makes it easier to spot.
Let’s take it in 2D, where our velocity vectors are {[0.5, 0.5], [0.4, 0.6], [0.3, 0.7]}. We can see that the velocity is aligned with that first vector, with some deviations. This could be because of air resistance, irregularities in the barrel we’re shooting a projectile out of, or any number of external forces/disturbances. However, we don’t really know how much each of those forces is actually contributing to the motion of the body.
How can we quantify any of this? Well, an approach might be to look at the difference between velocity vectors.
Taking the first vector as our reference, we get {[0,0], [-0.1, 0.1], [-0.2, 0.2]}. From this, we can see that there *seems* to be a pattern.
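If you want to check that subtraction yourself, here's a tiny sketch in plain Python. The names `velocities`, `reference` and `diffs` are just mine for illustration, and the rounding is only there to hide floating point noise.

```python
# Velocity measurements from the example above.
velocities = [[0.5, 0.5], [0.4, 0.6], [0.3, 0.7]]

# Use the first measurement as the reference.
reference = velocities[0]

# Component-wise difference of each measurement from the reference.
# round() only hides floating point noise such as -0.09999999999999998.
diffs = [[round(x - r, 10) for x, r in zip(v, reference)] for v in velocities]

print(diffs)  # [[0.0, 0.0], [-0.1, 0.1], [-0.2, 0.2]]
```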
But what does that pattern represent? Our measurement vectors are aligned to *some* frame, but that frame does not necessarily show us what actual directions are important.
The issue right now is that our measurements have two components: the main trend and the disturbances. Our standard basis doesn’t separate these well, because both show up in both coordinates. What we want to do is remove the overlap.
To remove the main trend and see what else is going on, GS comes pretty handy.
Our first step is to define the main trend. Again, we’ll take our first vector as reference (calling the three vectors u1, u2, and u3 from now on). Its direction is the normalised vector in that direction (calling the direction vectors v).
v1 = u1/||u1|| = [1/sqrt(2), 1/sqrt(2)] = [0.707, 0.707]
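As a rough sketch of that normalisation step in plain Python (the helper name `normalise` is my own, and it assumes the input isn't the zero vector):

```python
import math

def normalise(vec):
    """Scale vec to unit length (assumes vec is not the zero vector)."""
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec]

u1 = [0.5, 0.5]
v1 = normalise(u1)

print([round(x, 3) for x in v1])  # [0.707, 0.707]
```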
Now what we do is take our reference direction and see what new direction our next vector introduces.
u2 -> u2 - (u2•v1)v1
Let me explain this step in a little more detail. u2•v1 is the scalar projection of u2 onto v1, that is, how much of u2 points in the same direction as v1. Multiplying it by v1 turns that number back into a vector along v1. Now, if we subtract that portion from u2, we get the component of u2 that *isn’t* pointing in the direction of v1. It’s pointing in a *new direction*.
u2 -> [0.4,0.6] - 0.707[0.707, 0.707]
= [-0.1, 0.1]
Now normalising it, we get:
v2 = [-0.707, 0.707]
That’s the new direction introduced by the second measurement.
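Here's the same step as a small Python sketch, in case seeing the numbers fall out helps. The helpers `dot` and `normalise` and the names `along_v1` and `residual` are my own, not from any library.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalise(vec):
    """Scale vec to unit length (assumes vec is not the zero vector)."""
    length = math.sqrt(dot(vec, vec))
    return [x / length for x in vec]

u1 = [0.5, 0.5]
u2 = [0.4, 0.6]

v1 = normalise(u1)

# How much of u2 lies along v1 (a single number).
along_v1 = dot(u2, v1)

# Subtract that part; what's left is perpendicular to v1.
residual = [x - along_v1 * d for x, d in zip(u2, v1)]
v2 = normalise(residual)

print(round(along_v1, 3))                # 0.707
print([round(x, 3) for x in residual])   # [-0.1, 0.1]
print([round(x, 3) for x in v2])         # [-0.707, 0.707]
```

The last check worth doing by hand is `dot(v1, v2)`, which should come out as (very nearly) zero.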
Now let’s look at the third vector. We’re going to do the same thing. See u3, and remove the components of u3 that align with v1 and now v2.
u3 -> u3 - (u3•v1)v1 - (u3•v2)v2
= [0.3, 0.7] - 0.707[0.707, 0.707] - 0.283[-0.707, 0.707]
≈ 0
This means that u3 doesn’t contribute any new information about directions. Its information is completely explained by the other two directions, which is what we’d expect, since two orthonormal directions already cover the whole 2D plane.
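Putting the steps so far into a loop gives something like the sketch below. It follows the subtraction formula used above; the function name `gram_schmidt` and the cutoff `tol` are my own assumptions. The cutoff matters because on a computer the leftover for u3 usually comes out as a tiny number rather than exactly zero, so we need some threshold for "this adds no new direction".

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors, tol=1e-9):
    """Return an orthonormal basis for the span of `vectors`.

    `tol` is an assumed cutoff: leftovers shorter than this are
    treated as zero and skipped.
    """
    basis = []
    for u in vectors:
        residual = list(u)
        # Remove the part of u along each direction found so far.
        for v in basis:
            coeff = dot(u, v)
            residual = [r - coeff * x for r, x in zip(residual, v)]
        length = math.sqrt(dot(residual, residual))
        if length > tol:
            # u introduced a new direction, so keep it (normalised).
            basis.append([r / length for r in residual])
    return basis

basis = gram_schmidt([[0.5, 0.5], [0.4, 0.6], [0.3, 0.7]])

print(len(basis))  # 2, because the third vector adds nothing new
for v in basis:
    print([round(x, 3) for x in v])
```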
So our new basis looks like {[0.707,0.707], [-0.707, 0.707]}
The two vectors are orthogonal and normalised. So, finding the original vectors in this new basis is as easy as adding the projections in each direction.
So:
ui = (ui•v1)v1 + (ui•v2)v2 =>
u1 = 0.707v1 + 0v2
u2 = 0.707v1 + 0.141v2
u3 = 0.707v1 + 0.283v2
Now it’s much easier to see how the vectors are changing. The v1 component stays the same, while the v2 component increases linearly. *This strongly suggests it’s not random and we should try to figure out what it is*.
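To see those coordinates fall out numerically, here's a minimal sketch. It hard-codes v1 and v2 from the worked example, and the names `measurements` and `coords` are mine. Because the basis is orthonormal, each coordinate is just one dot product.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Orthonormal basis from the worked example, written exactly rather than rounded.
s = 1 / math.sqrt(2)
v1 = [s, s]
v2 = [-s, s]

measurements = [[0.5, 0.5], [0.4, 0.6], [0.3, 0.7]]

# Coordinates of each measurement in the (v1, v2) basis.
coords = [(dot(u, v1), dot(u, v2)) for u in measurements]

for c1, c2 in coords:
    print(round(c1, 3), round(c2, 3))
# 0.707 0.0
# 0.707 0.141
# 0.707 0.283
```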
This is exactly why we shouldn’t use the standard basis. It hid the structure behind the system in the coordinates, since they were coupled. Using the basis we got from GS, it becomes immediately obvious what’s going on.
Like I said earlier, Gram Schmidt doesn’t create any new information, it just makes information easier to understand by giving us a coordinate system that’s tailored to the structure of the system. It stops different pieces of information from talking over each other.
***
I hope you now understand the intuition behind Gram Schmidt Orthonormalisation. It’s about finding all the directions in which data changes wrt a reference so that the numbers are more aligned with the problem at hand. I’ll get down to writing a more comprehensive blog post about it someday. Maybe we can go into least-squares or QR decomposition, since those are used *everywhere*. But for now, that’s it. Thank you for reading!
PS. I’m very new to this, and doing it on mobile, so I apologise if the formatting is off. I don’t think the markdown will render, but oh well. I’m too lazy to go back and correct it all now