Unit Conversion and Change of Coordinate Systems: A 2D Example
Last time, we learned how to use image differencing as a simple way to detect motion in a video feed. Now that we can identify which pixels in the picture contain motion, there is one more step we have to take in order to translate the locations of those pixels into the spatial coordinates of the game world.
Just as when we convert temperatures from Fahrenheit to Celsius, we have to establish a relationship between the two coordinate systems that lets us convert from one to the other with a simple calculation. The tricky part with coordinate systems is that we're now dealing with multidimensional coordinates instead of a single numeric value.
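Here is the same idea in one dimension. If we know two matching reference points on each scale, such as water freezing at 32 °F / 0 °C and boiling at 212 °F / 100 °C, one linear formula converts any value. This is a minimal Python sketch; the `remap` and `fahrenheit_to_celsius` helper names are my own, not from any library.

```python
def remap(value, src_min, src_max, dst_min, dst_max):
    """Linearly map value from the range [src_min, src_max] onto [dst_min, dst_max]."""
    fraction = (value - src_min) / (src_max - src_min)  # 0.0 at src_min, 1.0 at src_max
    return fraction * (dst_max - dst_min) + dst_min


def fahrenheit_to_celsius(f):
    # Reference points: water freezes at 32 F / 0 C and boils at 212 F / 100 C.
    return remap(f, 32.0, 212.0, 0.0, 100.0)


print(fahrenheit_to_celsius(212.0))  # 100.0
```

The 2D conversion below is this same formula applied once per axis.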
Let's say we have an image that is 160 pixels wide and 90 pixels high. Most data structures store an image as a 2D array of pixels, so we can use the 2D indices of the pixels as if they were x and y coordinates. (If you are not familiar with 2D coordinate systems, think of the grid in the board game Battleship: x tells us which column we're in and y tells us which row.) We'll say the lower-left pixel is (0, 0) and the upper-right pixel is (159, 89).
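One caveat when treating array indices as coordinates: many image formats and libraries put the origin at the top-left, and 2D arrays are usually indexed as [row][column], so row 0 is the top row. To get the bottom-left origin used in this example, x is the column and y is the row counted up from the bottom. A minimal sketch, where `array_index_to_xy` and `pixel_at` are hypothetical helper names:

```python
WIDTH, HEIGHT = 160, 90

# A blank grayscale image stored row-major: image[row][col], row 0 = top row.
image = [[0] * WIDTH for _ in range(HEIGHT)]


def array_index_to_xy(row, col, height=HEIGHT):
    """Convert a top-down [row][col] index into bottom-left-origin (x, y)."""
    x = col                 # columns already run left to right
    y = (height - 1) - row  # flip rows so y increases upward
    return x, y


def pixel_at(x, y):
    """Read the pixel at bottom-left-origin (x, y) from the row-major image."""
    return image[(HEIGHT - 1) - y][x]


print(array_index_to_xy(HEIGHT - 1, 0))  # (0, 0): bottom-left pixel
print(array_index_to_xy(0, WIDTH - 1))   # (159, 89): top-right pixel
```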
We'd like to map these pixel coordinates into the world of our video game. I mentioned in my last post that for 3D games we can use some fundamental rendering concepts to get from a pixel on our screen to a spot in the 3D world, but for now let's stick to a simpler, more general example: a 2D game whose camera doesn't move and can always see the full span of the game world. Let's also say that, so the game world fits a widescreen (16:9) image, the game world was made 160 units wide and 90 units high. Unlike the image's coordinate system, however, the game world's coordinates put (0, 0) at the center of the world, with the lower-left point at (-80, -45) and the upper-right point at (80, 45).
As with any problem, there are multiple ways to conceptualize and approach this one, but I like to think of the method as using each coordinate's distance from the lower-left corner as an intermediate step between the two systems.
To do this, the first thing we must take into account is the range of each coordinate system. This feels somewhat strange: even though the pixel image is 160 pixels wide, its range in coordinates is only 159, because the distance from the leftmost coordinate to the rightmost coordinate is 159 - 0 = 159 units. Similarly, the y coordinate range is 89 - 0 = 89.
For the game world coordinate system, the x range is 80 - (-80) = 160, and the y range is 45 - (-45) = 90.
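To make the range arithmetic concrete, here is a small sketch that describes each coordinate system by its minimum and maximum corners and computes the ranges from them. The `CoordSystem` class is a hypothetical helper for illustration, built on Python's standard `dataclasses` module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoordSystem:
    """Axis-aligned 2D coordinate system described by its min and max corners."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def x_range(self):
        # Distance from the leftmost coordinate to the rightmost one.
        return self.x_max - self.x_min

    @property
    def y_range(self):
        # Distance from the bottom coordinate to the top one.
        return self.y_max - self.y_min


pixels = CoordSystem(x_min=0, y_min=0, x_max=159, y_max=89)
game = CoordSystem(x_min=-80, y_min=-45, x_max=80, y_max=45)

print(pixels.x_range, pixels.y_range)  # 159 89
print(game.x_range, game.y_range)      # 160 90
```

Note that the pixel ranges come out one less than the pixel counts, exactly as described above.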
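Now we have everything we need for the conversion. Here, pixelXMin and pixelYMin are the image's minimum coordinates (0 and 0), pixelXRange and pixelYRange are its ranges (159 and 89), and gameXMin, gameYMin, gameXRange, and gameYRange are the matching values for the game world (-80, -45, 160, and 90). Let's do the x coordinate first. If a pixel in the image has the coordinates (pixelX, pixelY), then the x coordinate of its corresponding game world position (gameX, gameY) is calculated as follows: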
gameX = ((pixelX - pixelXMin) / pixelXRange) * gameXRange + gameXMin
The first step, subtracting the minimum x value from pixelX, gives the distance along the x axis between our pixel and the leftmost possible pixel position. Dividing that distance by the pixel x range expresses it as a fraction of the whole pixel x range, from 0 at the left edge to 1 at the right edge. Multiplying by the game x range turns that fraction back into a distance, now measured in game units, and adding the game's minimum x value shifts the result so it starts from the game world's left edge instead of from zero.
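The steps in that paragraph map one-to-one onto lines of code. This sketch hard-codes this example's x bounds as hypothetical module constants, and names each intermediate value after the step it represents.

```python
PIXEL_X_MIN, PIXEL_X_MAX = 0, 159
GAME_X_MIN, GAME_X_MAX = -80, 80


def to_game_x(pixel_x):
    pixel_x_range = PIXEL_X_MAX - PIXEL_X_MIN  # 159
    game_x_range = GAME_X_MAX - GAME_X_MIN     # 160
    offset = pixel_x - PIXEL_X_MIN             # distance from the leftmost pixel
    fraction = offset / pixel_x_range          # 0.0 at the left edge, 1.0 at the right edge
    game_offset = fraction * game_x_range      # same distance, now in game units
    return game_offset + GAME_X_MIN            # shift so the left edge lands on -80
```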
The calculation for gameY is similar:
gameY = ((pixelY - pixelYMin) / pixelYRange) * gameYRange + gameYMin
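Because the x and y formulas have the same shape, a sketch can simply apply one formula per axis. The default bounds below are this example's 160×90 image and 160×90-unit game world; `pixel_to_game` is a hypothetical name, not part of any library.

```python
def pixel_to_game(pixel_x, pixel_y,
                  pixel_min=(0, 0), pixel_max=(159, 89),
                  game_min=(-80, -45), game_max=(80, 45)):
    """Convert bottom-left-origin pixel coordinates to game-world coordinates."""
    results = []
    for axis, value in enumerate((pixel_x, pixel_y)):  # axis 0 = x, axis 1 = y
        pixel_range = pixel_max[axis] - pixel_min[axis]
        game_range = game_max[axis] - game_min[axis]
        fraction = (value - pixel_min[axis]) / pixel_range
        results.append(fraction * game_range + game_min[axis])
    return tuple(results)
```

Passing the bounds as parameters means the same function works if the camera resolution or world size changes.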
Let's try it out with the easy points first. For the lower-left pixel (0, 0):
((0 - 0) / 159) * 160 + (-80) = -80 for gameX
((0 - 0) / 89) * 90 + (-45) = -45 for gameY
So the lower-left corner converts correctly from (0, 0) in pixel coordinates to (-80, -45) in game coordinates.
What about the upper-right corner at pixel coordinates (159, 89)? This time I'll show the intermediate steps:
gameX = ((159 - 0) / 159) * 160 + (-80)
= 1 * 160 + (-80)
= 80
gameY = ((89 - 0) / 89) * 90 + (-45)
= 1 * 90 + (-45)
= 45
So the upper-right corner also converts correctly, from (159, 89) to (80, 45).
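To double-check the corner arithmetic without any floating-point noise, we can use Python's standard `fractions.Fraction`, which keeps every intermediate value exact. `pixel_to_game_exact` is a hypothetical name, and it assumes integer pixel coordinates, since `Fraction(numerator, denominator)` only accepts rational arguments.

```python
from fractions import Fraction


def pixel_to_game_exact(pixel_x, pixel_y):
    # Fraction keeps every intermediate value exact; inputs must be integers here.
    game_x = Fraction(pixel_x - 0, 159) * 160 + (-80)
    game_y = Fraction(pixel_y - 0, 89) * 90 + (-45)
    return game_x, game_y


for corner in [(0, 0), (159, 89)]:
    gx, gy = pixel_to_game_exact(*corner)
    print(f"{corner} -> ({gx}, {gy})")
# (0, 0) -> (-80, -45)
# (159, 89) -> (80, 45)
```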
But we wrote the conversion equations using these corner correspondences, so it's not surprising that they work. It's better to also test a point in the middle, so let's convert the center of the image. The center lies halfway between 0 and 159 horizontally and halfway between 0 and 89 vertically, so it is (79.5, 44.5) in the pixel coordinate system, a point that falls between pixels rather than on one. To find where that is in the game world's coordinates, we put it through our equations:
gameX = ((79.5 - 0) / 159) * 160 + (-80)
= (79.5 / 159) * 160 + (-80)
= 0.5 * 160 + (-80)
= 0
gameY = ((44.5 - 0) / 89) * 90 + (-45)
= (44.5 / 89) * 90 + (-45)
= 0.5 * 90 + (-45)
= 0
Thus, the middle of the pixel coordinate system, (79.5, 44.5), converts precisely to (0, 0) in the game coordinate system, its exact center.
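The same center check in code: this sketch computes the image center as the midpoint of the minimum and maximum pixel coordinates instead of hard-coding it, then runs it through the conversion with this example's bounds plugged in. `pixel_to_game` is again a hypothetical helper name.

```python
def pixel_to_game(pixel_x, pixel_y):
    # This example's bounds: pixels 0..159 x 0..89, game world -80..80 x -45..45.
    game_x = ((pixel_x - 0) / 159) * 160 + (-80)
    game_y = ((pixel_y - 0) / 89) * 90 + (-45)
    return game_x, game_y


# The center is halfway between the min and max pixel coordinates,
# which lands between pixels rather than on one.
center = ((0 + 159) / 2, (0 + 89) / 2)
print(center)                  # (79.5, 44.5)
print(pixel_to_game(*center))  # (0.0, 0.0)
```

Here the result is exactly (0.0, 0.0) because 79.5 / 159 and 44.5 / 89 are both exactly 0.5 in floating point; for most other inputs, expect small rounding differences in the last digits.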
In this way, we can convert the location of any pixel in our image into a location in the 2D space of our game. We can then make our motion detection interactive by saying something like, “There is motion detected in pixel (30, 50), so we should deal damage to the helicopter enemy at the corresponding location (-49.811, 5.562) in the game world.”
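Putting the pieces together, a motion handler might loop over the pixels the detector flagged and convert each one to a game position. The `motion_pixels` list is a hypothetical stand-in for whatever the image-differencing step produces.

```python
def pixel_to_game(pixel_x, pixel_y):
    # Same formulas as above, with this example's bounds plugged in.
    game_x = ((pixel_x - 0) / 159) * 160 + (-80)
    game_y = ((pixel_y - 0) / 89) * 90 + (-45)
    return game_x, game_y


# Hypothetical output of the motion detector: pixels where motion was found.
motion_pixels = [(30, 50)]

for px, py in motion_pixels:
    gx, gy = pixel_to_game(px, py)
    print(f"motion at pixel ({px}, {py}) -> game ({gx:.3f}, {gy:.3f})")
# motion at pixel (30, 50) -> game (-49.811, 5.562)
```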
You may notice that some work still needs to be done with rounding. The pixel indices are all discrete integer values, but the game coordinates are continuous. Because of the gaps between pixel positions, we can't simply check whether there are any interactable objects exactly at (-49.811, 5.562). What if the helicopter is at the game world coordinates (-50, 6) instead? We need some tolerance around the detected location so that nearby activity still triggers an interaction. Fortunately, that is part of collision checking and hit detection in general, which I will get into another time.
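As a taste of what that tolerance could look like, here is a minimal sketch, not a full hit-detection scheme: it accepts a target if it lies within half of one pixel step (measured in game units) of the detected point on each axis, so the target is closer to this pixel than to its neighbors. The function name, the half-step tolerance, and the helicopter position are all assumptions for illustration; a real game would also account for the object's own size.

```python
PIXEL_X_RANGE, PIXEL_Y_RANGE = 159, 89
GAME_X_RANGE, GAME_Y_RANGE = 160, 90

# Game-world distance between two neighboring pixels on each axis.
STEP_X = GAME_X_RANGE / PIXEL_X_RANGE  # about 1.006 units
STEP_Y = GAME_Y_RANGE / PIXEL_Y_RANGE  # about 1.011 units


def near_detected_point(detected, target, tol_x=STEP_X / 2, tol_y=STEP_Y / 2):
    """True if target lies within half a pixel step of detected on both axes."""
    dx = abs(target[0] - detected[0])
    dy = abs(target[1] - detected[1])
    return dx <= tol_x and dy <= tol_y


detected = (-49.811, 5.562)   # game position of motion pixel (30, 50)
helicopter = (-50.0, 6.0)     # hypothetical enemy position
print(near_detected_point(detected, helicopter))  # True
print(near_detected_point(detected, (0.0, 0.0)))  # False
```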