Image Differencing, Part 2: Matrix Subtraction
In my last Image Differencing post, I went over what a pixel is and how every digital image, no matter how smooth and natural it looks, is actually an orderly and discrete grid of pixels.
I ended with the remark that since each pixel holds a numeric value, we can also think of pixel images as matrices of numbers with which we can do all kinds of mathematical operations.
I gave the example of the green screen technique used by newscasters and movie stars to cut themselves out of a video and place their performance on top of a new background. In fact, you don’t need to be in front of a solid green screen to do that: you can achieve the same effect with Background Subtraction.
Imagine that you have a picture of yourself that you want to use in an art project. You’d like to edit that picture and make it look like you’re standing on the moon next to Neil Armstrong. The first thing you have to do is cut yourself out from the rest of the room surrounding you, and that usually means a ton of zooming in and carefully erasing around yourself in an image editing program.
However, you also happen to have a picture of the exact same room, from the exact same angle, but without you in the shot.
Consider what the pixels look like in those images: remember that each pixel has a numeric value representing its color. That bookshelf on the left is the same in both pictures, and the bookshelf hasn’t moved or changed color, so that part of the image should have the exact same values for its pixels. This also holds true for the walls, furniture, everything in the room that didn’t move.
Remember also that the mathematical way to measure difference between two quantities is subtraction. (How far away is 10 feet from 3 feet? 10-3 = 7, so it is 7 feet away.)
Therefore, if we just wanted to take two pictures and ask “what is different between them?” all we have to do is a matrix subtraction with the two images. Like so:
The upper left image is a picture of the room without me in it. (The background in this case.) The upper middle image is a picture of me giving the peace sign in the same, unchanged room. The upper right image is the result of the matrix subtraction: the difference between the two pictures.
Anything that didn’t change ends up with a value of 0 (solid black) because any number minus itself is zero (that’s the additive inverse property).
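Any pixels that did change between the two pictures are left with nonzero values and show up in the black-and-white difference image as shades of gray. (A subtraction can come out negative when the new pixel is darker than the old one, so in practice you take the absolute value of each result: what matters is how much a pixel changed, not in which direction.)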
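To make the subtraction concrete, here is a minimal sketch in plain Python. It assumes each image is a grayscale grid stored as a list of rows, with values from 0 (black) to 255 (white). The function name `difference_image` and the tiny 3x3 "photos" are made up for illustration; a real project would more likely load actual images with an imaging library.

```python
def difference_image(background, frame):
    """Per-pixel absolute difference of two same-sized grayscale images.

    Both images are lists of rows; each row is a list of ints in 0..255.
    """
    same_shape = len(background) == len(frame) and all(
        len(row_b) == len(row_f) for row_b, row_f in zip(background, frame)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    return [
        [abs(f - b) for b, f in zip(row_b, row_f)]
        for row_b, row_f in zip(background, frame)
    ]


# Hypothetical 3x3 "photos": the empty room, then the room with
# something new covering part of the middle column.
background = [
    [200, 200, 50],
    [200, 200, 50],
    [120, 120, 120],
]
frame = [
    [200, 90, 50],
    [200, 80, 50],
    [120, 120, 120],
]

diff = difference_image(background, frame)
for row in diff:
    print(row)
```

Every pixel that is identical in both grids comes out as 0, and only the two pixels that changed keep a nonzero (gray) value.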
Notice that the only things in the difference image are me and the chair, because I wasn’t in the picture of the background, and the chair moved a little from me sitting in it. Everything else (the bookshelves, the ceiling, the lights) is gone. We’re left with a pretty neat cutout of just me. That’s a very specific type of Image Differencing called Background Subtraction.
In the lower left, I’ve taken the difference image and made it into a binary image: either something did change and the corresponding pixel is solid white, or the pixel value was the same between pictures and it is solid black to indicate no change. The lower right is that binary image cleaned up a little with an Open operation, which clears away small stray specks of noise. (Its counterpart, the Close operation, is the one that fills in holes.) Don’t worry about those two images for now: we’ll get into them later.
You can also use this same image differencing technique as a simple way to detect motion.
In this set, we’ve got a picture of me giving the peace sign, and another picture of me a second later having slightly moved my hand and kept still otherwise. You can see in the difference image that now the only pixels that changed were the pixels around the edges of my hand and a little bit of my shoulders. Therefore, since those parts of the image changed, you can say that movement took place in those pixels of the image! Neat, huh?
In this set of pictures, I started to open my hand from a peace sign into an open palm. You can see in the difference image that my thumb, my ring finger and my pinky finger moved a lot, but the other fingers and the rest of me moved only a little.
In this last set, I’ve shifted my entire body slightly to the left. Notice how the movement is more easily detected on the right side of my hair because it stands out against the more varied background of the bookshelf, while the movement of the left side of my hair wasn’t noticed because it’s almost the same color as the shadows of the room behind it. Image differencing isn’t a perfect method to detect motion: you have to be well lit without shining any lights into the camera itself, and you also have to stand out from your background for it to work. Still, it’s a shockingly simple application of matrix arithmetic that lets you do some really neat things!
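The motion check described above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the code behind the pictures: the frames are grayscale grids stored as lists of rows, the names `changed_pixels` and `motion_detected` are hypothetical, and the threshold of 30 is an arbitrary value chosen so that tiny flickers don’t count as movement.

```python
def changed_pixels(previous, current, threshold=30):
    """Binary mask: 1 where a pixel changed by more than threshold, else 0."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(row_p, row_c)]
        for row_p, row_c in zip(previous, current)
    ]


def motion_detected(mask, min_changed=1):
    """True if at least min_changed pixels are flagged in the mask."""
    return sum(sum(row) for row in mask) >= min_changed


# A bright 2x2 object on a dark background shifts one pixel to the right.
previous = [
    [40, 40, 40, 40],
    [40, 220, 220, 40],
    [40, 220, 220, 40],
]
current = [
    [40, 40, 40, 40],
    [40, 40, 220, 220],
    [40, 40, 220, 220],
]
mask = changed_pixels(previous, current)
for row in mask:
    print(row)
print("motion:", motion_detected(mask))

# The same shift with an object almost as dark as its background:
# the change stays under the threshold, so nothing is flagged.
faint_previous = [[40, 50, 40]]
faint_current = [[40, 40, 50]]
faint_mask = changed_pixels(faint_previous, faint_current)
print("faint motion:", motion_detected(faint_mask))
```

In the first pair only the leading and trailing edges of the object are flagged, because the middle column is bright in both frames. The second pair shows the hair-against-shadow problem: the object moved, but it does not stand out enough from the background for the difference to register.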
So now we have an understanding of what makes up digital images, and how we can use the awesome power of math to calculate differences between two images. Nothing fancy going on here so far, but wait! Even if we know what’s moving in the picture, how do we apply that to what’s going on in a video game environment?
The answer to that is also shockingly simple, and is no different than figuring out how many feet are in a yard, or how hot a Fahrenheit temperature is in Celsius.
The pixels in an image have their own two-dimensional coordinate system, like a battleship grid. We can say that the lower-leftmost pixel in the image is (0,0), and the pixel to the right of it is (1,0), and the pixel above the lower-leftmost is (0,1), and so on.
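As a small sketch of that grid, here is one way to translate between how an image is commonly stored and the (x, y) convention above. The assumption, which is mine and not something the pictures in this post depend on, is that the image is a list of rows with the first row at the top, so a pixel is looked up as `image[row][col]`. The function names are hypothetical.

```python
def row_col_to_xy(row, col, height):
    """Convert a top-down (row, col) index to (x, y) with (0, 0) at the lower left."""
    return col, height - 1 - row


def xy_to_row_col(x, y, height):
    """Convert lower-left-origin (x, y) back to a top-down (row, col) index."""
    return height - 1 - y, x


height = 3  # a hypothetical image that is 3 pixels tall

print(row_col_to_xy(2, 0, height))  # bottom row, first column
print(row_col_to_xy(2, 1, height))  # one pixel to the right of that
print(row_col_to_xy(1, 0, height))  # one pixel above the lower-left corner
```

The only real work is flipping the vertical axis. The horizontal coordinate is the column index either way.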
The game world has its own coordinate system too. If you’ve done any work with rendering 3D models you may already be familiar with this, but all 3D games have their own system for taking coordinates of game objects in 3D space and figuring out where they’ll show up on the flat plane of pixels that is your computer screen. That, in essence, is the process of rendering!
We’re just going to need to reverse the normal rendering calculations and instead of going from 3D space coordinates to the flat pixel coordinates of your screen, we’ll be going from the flat pixel coordinates of our difference image to the 3D space coordinates of the game. That will be our topic for next time: the glamorous world of Unit Conversion and Change of Coordinate Systems!