Introduction to Networking and Client-Side Prediction in Unity
Link to Slides
It can be quite difficult to get used to how Unity handles networking, especially if you're new to writing network code. This blog post is the companion to a talk I'm giving tonight (May 22nd, 2014) at Uken Games in Toronto.
Much has been written on the subject of introductory networking. This talk is not meant to be comprehensive. Instead, it aims to provide context specifically for first steps in network programming for fast-paced games in Unity, through 2 case studies:
Using iPhones to wirelessly control an arcade game, SpaceRace
Implementing a 16-player, 4-computer art installation at OCAD called POINTS
The second goal is to give a primer on the backbone of fast-paced multiplayer games, client-side prediction, in the historical context of the development of QuakeWorld and Half-Life in the mid-to-late ’90s.
Relevant Unity Documentation
Network Reference Guide
Network View
Network Manager
Network Class
Relevant Basic Networking Concepts
Data is sent over networks in the form of packets - chunks of data, with a header added to the front to explain where the data came from, where it's going, etc. Packets can either be sent reliably using a protocol named Transmission Control Protocol (TCP), or unreliably using a protocol named User Datagram Protocol (UDP). UDP packets are sometimes referred to as datagrams.
Some important characteristics of reliability (and thus of TCP) are:
Ensuring that the packet arrives at all (no lost packets)
Ensuring that the packets arrive in the order that they were sent
Ensuring that packets arrive error-free (corrupted packets are resent)
UDP makes no such guarantees. These characteristics seem pretty crucial, and for most applications, they are. You might wonder at first why you would ever use anything other than TCP, but TCP has serious performance costs for fast-paced applications. Because TCP delivers data strictly in order, a single lost packet holds up every packet sent after it until the lost one has been retransmitted. The whole stream is only as fast as its slowest packet, as we will see throughout this talk.
Networks are built up in several layers, from the physical wires all the way up to the application using the network to send information. UDP and TCP sit in the middle (the transport layer), and you can't modify how they work from your code. Because TCP is so unsuited to fast-paced games, what you end up needing to do is use UDP and implement reliability in your application instead of at the transport layer. This is how Unity works: under the hood, it uses RakNet, an open source C++ network engine based entirely on UDP.
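To make "implement reliability in your application" concrete, here is a minimal sketch (not RakNet's actual design) of a reliability layer that could sit on top of UDP. The sender numbers each message and keeps it until it is acknowledged, resending anything unacknowledged after a timeout. The receiver acks everything, discards duplicates, and hands messages to the game strictly in order. The class names and the `resend_timeout` parameter are hypothetical, and socket I/O is left out so the logic stands on its own.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int       # sequence number assigned by the sender
    payload: str


class ReliableSender:
    """Keeps every packet until it is acked; resends it after a timeout."""

    def __init__(self, resend_timeout: float):
        self.resend_timeout = resend_timeout
        self.next_seq = 0
        self.unacked: dict[int, tuple[Packet, float]] = {}  # seq -> (packet, last send time)

    def send(self, payload: str, now: float) -> Packet:
        pkt = Packet(self.next_seq, payload)
        self.next_seq += 1
        self.unacked[pkt.seq] = (pkt, now)
        return pkt  # a real implementation would write this to a UDP socket

    def on_ack(self, seq: int) -> None:
        self.unacked.pop(seq, None)

    def due_for_resend(self, now: float) -> list[Packet]:
        due = []
        for seq, (pkt, sent_at) in list(self.unacked.items()):
            if now - sent_at >= self.resend_timeout:
                due.append(pkt)
                self.unacked[seq] = (pkt, now)  # restart this packet's timer
        return due


class ReliableReceiver:
    """Acks every packet, ignores duplicates, delivers payloads in order."""

    def __init__(self):
        self.expected = 0                  # next sequence number the game needs
        self.buffer: dict[int, str] = {}   # early arrivals waiting for a gap to fill

    def receive(self, pkt: Packet) -> tuple[int, list[str]]:
        if pkt.seq >= self.expected:
            self.buffer[pkt.seq] = pkt.payload
        delivered = []
        while self.expected in self.buffer:
            delivered.append(self.buffer.pop(self.expected))
            self.expected += 1
        return pkt.seq, delivered          # (seq to ack, payloads ready for the game)
```

The `while` loop in `receive` is where in-order delivery happens, and it is also exactly where one lost packet holds up everything queued behind it.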
Note on Terminology
Since we're talking about networked programs, there will be at least 2 instances of the program running, presumably (although not necessarily) on different machines. In our context, also assume there is a server and a client. A lot of the time I will want to refer to any instance of the program on the network, regardless of whether it is the server or a client. I will call this a machine.
Unity Project Setup
When I first tried to make a networked Unity game, I assumed it would be best to separate the server and the client into different Unity projects. As it turns out, this doesn't work, because for tasks like instantiating prefabs over the network, Unity needs a local copy of the prefab on every machine. That way, Unity doesn't need to send all of the information defining the prefab, just an indication of which prefab to instantiate.
Instead, what you should have is one project, but different scenes for server and client.
Network Views
Network Views are core components in Unity’s networking system: in order to send data across the network, a GameObject is required to have a Network View attached. Each Network View is uniquely identified by a 128-bit ID. IDs can't be specified completely manually; they need to be doled out by Unity's Network class. There are two ways to do this: using Network.Instantiate on a prefab that has a Network View component attached, or using Network.AllocateViewID.
The machine that calls Network.Instantiate on a prefab is the owner of the Network Views attached to the instantiated GameObjects. In order for the prefab to be instantiated across the different machines, they must all have a copy of the prefab available locally. The concept of ownership will come up again later in the talk when we examine state synchronization.
Case Study 1 - iPhones as controllers for SpaceRace
SpaceRace is a fast-paced arcade game, where players jump, smash and dodge their way along a racetrack in space. I originally designed it for the HALIFAXMAXMACHINE 5005, a Winnitron that I released in Halifax in February. In order to fit 4 players on the 2 player arcade machine, I made sure SpaceRace could be controlled with 2 buttons. This resulted in the nice property that it can also be played on an iPhone.
When planning for GDC this year, I decided to show off SpaceRace in the format of a briefcase containing a projector, speakers, a laptop running the game, and 4 old iPhones as controllers. I tried to sort networking out at the last minute, in a couple of days before the conference, and made some crucial missteps that resulted in the controllers not working out quite the way I would have liked.
Original Setup: The phones all receive their touch input through Unity, parse that input into a bit string describing the current action (move left, move right, jump, smash left, smash right, or some combination), and then send an RPC to the server with that message (encoded as an int).
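A minimal sketch of that encoding, assuming one bit per action. The exact bit layout SpaceRace used isn't given here, so the names and bit positions below are hypothetical:

```python
from enum import IntFlag


class Action(IntFlag):
    # Hypothetical bit layout: one bit per action.
    NONE = 0
    MOVE_LEFT = 1 << 0
    MOVE_RIGHT = 1 << 1
    JUMP = 1 << 2
    SMASH_LEFT = 1 << 3
    SMASH_RIGHT = 1 << 4


def encode(actions: Action) -> int:
    """Pack the current frame's actions into a plain int (the RPC argument)."""
    return int(actions)


def decode(message: int) -> Action:
    """Recover the set of actions on the server."""
    return Action(message)
```

Combinations such as jumping while moving left are just bitwise ORs (`Action.MOVE_LEFT | Action.JUMP` encodes to `5`), so a frame's whole input state fits in a single int, which is one of the argument types an RPC accepts.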
RPCs - One of Two Ways to Send Data Over the Network in Unity
Remote Procedure Calls are used to call methods across the network. They are best-suited for infrequent events, as they are sent reliably. There is no way to use RPCs to send data unreliably.
RPCs are simply methods that return void, accept arguments from a limited set of argument types (int, float, string, NetworkPlayer, NetworkViewID, Vector3, Quaternion), and are marked using an attribute as an RPC.
[RPC]
void ExampleRPC(string text) {
    Debug.Log(text);
}
I originally found it unclear exactly how you were supposed to send RPCs. I knew you had to add an attribute to a method to mark it as an RPC, and then call networkView.RPC. I figured that, since I was only sending the RPC to the server (using RPCMode.Server in the RPC call), the method only had to exist on the server. Not so: the method being called also has to exist on a script attached to the Network View that is sending the RPC. I ended up hacking the Server script, which had the RPC method I wanted to call, onto the client, and doing some runtime checks to determine whether it should act as the server or just be present to allow me to send the RPC. Obviously this is a bad approach. A better way is to put the RPC method into a script that is separate from client- or server-side logic, and attach it on both ends. It won't actually get called on the client if you specify RPCMode.Server, so the implementation can remain simple.
Problems With RPCs in a Fast-Paced Context
When I first tested this setup out, it seemed to work great, although there was a perceptible lag that didn't make it feel quite as nice as using a controller. On top of that, there was an even bigger problem.
Sometimes the players would stop responding for a few seconds. This completely disrupted the flow of the game, ultimately making it unplayable. What was going on? It had to do with the fact that I was sending data reliably, using RPCs. In order to determine that a packet was dropped and needs to be resent, the sender has to wait a while for an acknowledgement that never arrives. Because RPCs are also delivered in order, every message sent after the lost one is held back until the resent packet gets through. In my case that added up to a couple of seconds, which is a very long time when you're trying to send upwards of 15 packets a second. So every time a packet got dropped, the player would stop responding. Clearly not the best approach.
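The stall can be reproduced with a back-of-the-envelope simulation. This sketch assumes 15 messages per second, a fixed one-way latency, and a retransmission that arrives one `resend_timeout` after the original send (all hypothetical numbers, and the function is illustrative, not Unity code). It compares when each message reaches the game under reliable, in-order delivery versus unreliable delivery:

```python
def delivery_times(send_rate_hz, count, latency, lost, resend_timeout, reliable_ordered):
    """Time at which each message reaches game code (None = never arrives).

    Assumes a fixed one-way latency and that a retransmission is never lost itself.
    """
    interval = 1.0 / send_rate_hz
    arrivals = []
    for i in range(count):
        sent = i * interval
        if i not in lost:
            arrivals.append(sent + latency)
        elif reliable_ordered:
            arrivals.append(sent + resend_timeout + latency)  # resent after the timeout
        else:
            arrivals.append(None)  # unreliable: the message is simply gone
    if not reliable_ordered:
        return arrivals
    # In-order delivery: nothing is handed over before every earlier message.
    delivered, latest = [], 0.0
    for t in arrivals:
        latest = max(latest, t)
        delivered.append(latest)
    return delivered
```

With `delivery_times(15, 10, 0.05, {2}, 1.0, True)`, message 2 and every message after it reach the game at about 1.18 s, even though messages 3 to 9 physically arrived much earlier. With `reliable_ordered=False`, only message 2 is missing and everything else arrives on time, which for a stream of "current input" messages is the better trade.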
I should have followed the KISS (Keep It Simple, Stupid) principle - trying to cram iPhones as controllers in at the last minute was over-ambitious.
Case Study 2 - POINTS
Immediately following GDC, I had the opportunity to make a networked game that would run on 4 floors of OCAD. I had a weekend to jam the game out, so it needed to be as simple as possible. The idea I came up with was this: POINTS is a 16-player button mashing game, 4 buttons per console, one console per team (and each team was on a different floor of OCAD) - every time your team presses a button, a pixel is drawn onto a canvas in your team’s color, and your team gets a point. When the canvas is filled, the round ends, and a warthog named POINZE announces the score for each team for this round, and then the all-time scores. By the end of the installation, the teams had scored a collective 682,853 POINTS, with the top team scoring over 295,000 POINTS.
Key POINT - Responsiveness is King
I knew that the key to the game was making it feel responsive: if your team didn’t get a point the exact second you hit the button, it would never be satisfying to play. This meant that the client should never have to wait for the server to update the game state. I also didn’t want the game to lag while it waited to resend lost packets, so I needed to use unreliable state synchronization.
State Synchronization - The Other Way to Send Data Over the Network in Unity
State Synchronization observes a component and sends information about it across the network at a specified rate. State Synchronization can be set to 'Off' (you can still use the Network View to send RPCs), 'Unreliable' (just like our discussion of UDP), or 'Reliable Delta Compressed' (data is sent reliably, as in our discussion of TCP; furthermore, only the difference from the last sent state is transmitted, and nothing is sent if nothing has changed). Unity tries to send data as many times per second as you specify in the sendrate option of the Network Manager. The default sendrate is 15, i.e. Unity will attempt to synchronize state 15 times per second.
The Observed property is the component you want to send information about over the network. If the component is a Transform, Animation, or Rigidbody, Unity handles sending the information and updating it on the other end automatically. If the observed component is a script, then you have to implement the OnSerializeNetworkView MonoBehaviour callback to tell Unity what to do. This gets called according to the sendrate, as discussed earlier.
void OnSerializeNetworkView(BitStream stream, NetworkMessageInfo info);
Understanding how to implement this callback is crucial to networking in Unity, but unfortunately it is poorly described in the documentation. A Network View can either read or write, based on whether or not the machine in question owns the view. The BitStream object passed in lets you determine whether you are reading or writing, through its isReading and isWriting properties. It also includes a method, Serialize, whose behaviour depends on that context. If you're writing, stream.Serialize(ref x) sends x's value across the network to the corresponding Network Views on the other machines. If you're reading, stream.Serialize(ref x) overwrites x with the received value. As a result, you don't necessarily need to check whether you're reading or writing, as this example adapted from the Unity docs illustrates.
void OnSerializeNetworkView(BitStream stream, NetworkMessageInfo info) {
    // Writer: sends the local input. Reader: the local value is overwritten.
    float horizontalInput = Input.GetAxis("Horizontal");
    stream.Serialize(ref horizontalInput);
}
On the writing end, we get the value for horizontalInput locally from Input.GetAxis, and send that across the network. On the reading end, we also get that value locally, but it is then overwritten by the Serialized value. I found that I needed to think about this a couple of times before it made sense. Using the booleans to separate the two possible cases is straightforward.
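void OnSerializeNetworkView(BitStream stream, NetworkMessageInfo info) {
    // Declared outside the branches so both can use it
    float horizontalInput = 0f;
    if (stream.isWriting) {
        // Owner: read the local input and send it
        horizontalInput = Input.GetAxis("Horizontal");
        stream.Serialize(ref horizontalInput);
        // potentially do other write-specific tasks
    }
    else {
        // Non-owner: receive the value sent by the owner
        stream.Serialize(ref horizontalInput);
        // potentially do other read-specific tasks
    }
}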
This code is functionally equivalent to the code without the booleans, but makes the context explicit.
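The same-code-on-both-ends idea isn't specific to Unity. Here is a toy stand-in for BitStream written outside the engine, to show why one call can serve both directions. Python has no `ref` parameters, so `serialize_float` returns the value instead of overwriting it in place. The class and function names are hypothetical, not Unity API.

```python
from __future__ import annotations
import struct


class ToyBitStream:
    """Toy stand-in for Unity's BitStream: the same serialize call writes
    when sending and reads when receiving."""

    def __init__(self, data: bytes | None = None):
        self.is_writing = data is None
        self.buffer = bytearray(data or b"")
        self.offset = 0  # read position

    def serialize_float(self, value: float) -> float:
        if self.is_writing:
            self.buffer += struct.pack("<f", value)  # 4-byte little-endian float
            return value
        (value,) = struct.unpack_from("<f", self.buffer, self.offset)
        self.offset += 4
        return value


def on_serialize(stream: ToyBitStream, local_input: float) -> float:
    # Identical on both ends: the writer sends its input, while the reader's
    # local value is replaced by whatever was sent.
    return stream.serialize_float(local_input)
```

As long as both ends serialize the same fields in the same order, the reader consumes exactly what the writer produced, which is why the single-path version of the callback works.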
How POINTS are Transferred
In order for the game to always feel responsive, POINTS is set up so that as soon as a client presses a button, the point counter (which is on the right side of the screen) is incremented and a sound is played. The client then sends its current point total to the server, unreliably. If this number is greater than the total the server last received from that client, the server allocates one pixel on the canvas for each new point and sends the new canvas back to the client. It’s a little bit of a magic trick: it feels responsive because the score changes immediately, even though the canvas isn’t updated until the server gets back to the client. It also makes it very easy for clients to cheat, but I was not concerned about security in such a simple (and hilarious) project. It also means that teams can score more points than there are pixels on the canvas, though only a few more as long as the network is running at a reasonable rate. I decided this wasn’t important. Easy, right?
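A minimal sketch of the server side of this scheme. Pixels are claimed in order here for simplicity, and the class and method names are hypothetical rather than the actual POINTS code. Because each packet carries the client's full running total, the server only ever looks at the increase, so dropped, duplicated, or out-of-order packets cost nothing.

```python
from __future__ import annotations


class PointsServer:
    """Turns each team's cumulative score into newly claimed canvas pixels."""

    def __init__(self, canvas_size: int):
        self.canvas: list[str | None] = [None] * canvas_size  # owner of each pixel
        self.next_free = 0          # simplification: pixels are claimed in order
        self.last_score: dict[str, int] = {}

    def on_score(self, team: str, score: int) -> None:
        # Only the increase over the last total matters; a lost packet is
        # covered by the next one, which carries the full total.
        new_points = score - self.last_score.get(team, 0)
        if new_points <= 0:
            return  # stale or duplicate packet
        self.last_score[team] = score
        for _ in range(new_points):
            if self.next_free == len(self.canvas):
                return  # canvas full: the round is over
            self.canvas[self.next_free] = team
            self.next_free += 1
```

For example, if red sends totals of 3, then a stale 2, then (after a dropped packet carrying 5) a total of 7, red still ends up owning exactly 7 pixels. Any points that arrive after the canvas is full simply don't get pixels.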
POINTS was easy to manage because of the way I set it up. I wasn’t trying to send an ordered sequence of events that might get lost in transmission if a packet was dropped. Instead, I kept track of the score on both ends, spammed the score at the server from the client, did a calculation on the server to see what had changed, and spammed the canvas back at the client, which never stored canvas state. Because every packet carried the complete current value, a lost packet was simply superseded by the next one, so I could use unreliable transmission without any fear of losing information. If I had used reliable transmission instead, every time a packet was lost (a relatively frequent occurrence on the OCAD network), the canvas would have taken a few seconds to update. If I had had a different setup that required all the packets to arrive in order, and I wanted to use unreliable synchronization, I would have needed a lot more code to ensure those properties. Let’s get into what a more complicated system might look like:
Client-Side Prediction
Client-Side Prediction is where the client in a networked game responds to user input immediately based on the current game state, ‘predicting’ the current world state before receiving a response from the server. Both the client and the server run the simulation to determine what should happen. If there is little lag on the network, then the client and server’s simulations should give reasonably close results. If there is significant enough lag to cause a difference between the client and server state, the server’s version of the story is seen as authoritative.
This is the backbone of all modern fast-paced multiplayer games, and was pioneered by John Carmack way back in the days of Quake. Originally, Quake’s multiplayer used TCP and thin clients (the clients send their input and where they’re looking, the server computes the game state and sends back a set of objects to render), but it suffered from the issues we discussed earlier. So, as outlined by John Carmack in a series of company memos in August 1996, he set out on a new path with something he called QuakeWorld, his “exclusively for internet play” fork of the Quake codebase. His plans, like logging all of the frags on the internet and ranking players, now commonplace, were radical at the time. The following quote illustrates the context perfectly:
“If it looks feasable, I would like to see internet focused gaming become a justifiable biz direction for us. Its definately cool, but it is uncertain if people can actually make money at it.“
He mentions that his original design targeted <200ms latencies, an easy mistake to make due to the T1 connection he had to his house (in 1996!).
“The new code has the unreliable packet as its basic primitive, and all the complexities that entails is now visible to the main code instead of hidden under the net api. This is A Good Thing.”
In other words, he switched from using TCP to using UDP and implementing reliability in the application layer. He goes on to say that the biggest change is “the addition of client side movement simulation”. In this basic form of client-side prediction, only the player’s movement is simulated client-side, and nothing else. This results in anomalies like projectiles from your weapons being spawned from where you were instead of where you are. He mentions in his notes that it’s potentially desirable to predict more of the game, but that it is a slippery slope (i.e. how many things do you end up having to predict?).
How The Simulation Works
Yahn Bernier released a paper in 2001 documenting the approach used for client-side prediction in Half-Life. Included in this paper is a good description of how the basic simulation works on the client, which he says is similar to how QuakeWorld functions.
Client samples input, populates a data structure that represents relevant state (where the player is looking, player velocity, which buttons were pressed, etc).
The data is stored locally with a timestamp, as well as sent off to the server.
Typically a client will run through many frames between responses from the server. The simulation is run every frame, starting from the last state sent by the server: the stored commands from all the frames since then are re-run, up to and including the current frame’s command, producing the current predicted state.
The server, which has also been carrying out simulations, eventually sends a new response, giving the state it currently considers the game world to be in, and the client uses that as the new starting point for simulation.
He mentions that you have to be careful when running the commands over and over again, as you don’t want associated sounds playing a bunch of times. The solution to this is simple: only play the sound the first time a command is run.
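The loop described above can be sketched as follows, with a 1-D position standing in for the real player state. This is an illustration under stated assumptions, not code from Half-Life or QuakeWorld: it assumes the server echoes the sequence number of the last command it processed, so the client knows which stored commands it can discard. `Command`, `State`, `simulate`, and `PredictingClient` are hypothetical names.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Command:
    seq: int
    move: float                 # hypothetical input: distance to move this frame
    sound_played: bool = False


@dataclass
class State:
    x: float = 0.0


def simulate(state: State, cmd: Command) -> State:
    """Deterministic step shared by client and server."""
    return State(state.x + cmd.move)


class PredictingClient:
    def __init__(self):
        self.server_state = State()        # last authoritative state
        self.pending: list[Command] = []   # commands the server hasn't processed yet
        self.sounds_played = 0

    def on_local_input(self, cmd: Command) -> State:
        self.pending.append(cmd)           # a real client would also send cmd to the server
        return self.predict()

    def on_server_update(self, state: State, last_processed_seq: int) -> State:
        self.server_state = state          # the server's version is authoritative
        self.pending = [c for c in self.pending if c.seq > last_processed_seq]
        return self.predict()

    def predict(self) -> State:
        state = self.server_state
        for cmd in self.pending:           # replay every unacknowledged command
            state = simulate(state, cmd)
            if not cmd.sound_played:       # side effects only on the first run
                cmd.sound_played = True
                self.sounds_played += 1
        return state
```

After three commands that each move 1.0, the client predicts `x == 3.0`. If the server then reports `x == 0.5` after processing only command 0 (say, because of a collision the client didn't know about), the client replays commands 1 and 2 on top of it and lands on `2.5`. The sound counter never changes during the replay.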
Beyond Movement: Client-Side Prediction in Half-Life
Yahn goes on to explain that Half-Life also does weapon prediction, where firing, ammo, animation, etc. are simulated on both the client and the server. This improves the feeling of responsiveness that client-side prediction aims to provide. He then approaches a more complicated subject: client-side prediction of other players. Two techniques are discussed: interpolation and extrapolation.
Extrapolation is where you simply assume that the player continues to move in the same way as they were last moving, and update their position accordingly. This doesn’t work very well, because players tend to move around in jerky, non-continuous ways. This is made worse by the unrealistic physics that games tend to use, where the player is able to “apply unrealistic forces to create huge accelerations at arbitrary angles”. As a result, extrapolation is only useful over short time periods. QuakeWorld, which used this technique, limited extrapolation to 100ms.
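A minimal sketch of capped extrapolation (dead reckoning). The function name and parameters are hypothetical, and the 0.1 s default mirrors the 100ms QuakeWorld limit mentioned above:

```python
def extrapolate(last_pos, velocity, last_update_time, now, max_extrapolation=0.1):
    """Dead-reckon a remote player's position from its last known velocity.

    The elapsed time is capped because real players change direction abruptly,
    so guesses further ahead than this are more likely wrong than right.
    """
    dt = min(max(now - last_update_time, 0.0), max_extrapolation)
    return tuple(p + v * dt for p, v in zip(last_pos, velocity))
```

A player last seen at the origin moving at 10 units per second is drawn 0.5 units along 50ms after the update. If no update arrives for two seconds, they stop at 1.0 unit (the 100ms cap) instead of sailing off through the level.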
Interpolation is an approach where a slight artificial delay is introduced into the game, so that it can be rendered as an interpolation (in the math sense of the word) between the two most recent updates received from the server while waiting for the next one. The idea is that just as we get to the end of the interpolation, the next update from the server comes in, and we can start interpolating towards it. If this runs smoothly, we aren’t forced to make up information about the game state, as we are with extrapolation. If there is lag and we miss a beat, we can fall back on extrapolation to keep the game moving, or simply keep everything still (causing a “stuttering” effect). Interpolation is the approach that Half-Life uses.
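A minimal sketch of an interpolation buffer for one remote entity. The 100ms delay is an illustrative choice, the class name is hypothetical, and this version holds still (rather than extrapolating) when it runs out of updates:

```python
from __future__ import annotations
import bisect


class InterpolationBuffer:
    """Renders a remote entity `delay` seconds in the past, blending between
    the two server snapshots that bracket that moment."""

    def __init__(self, delay: float = 0.1):
        self.delay = delay
        self.times: list[float] = []
        self.positions: list[float] = []

    def add_snapshot(self, t: float, pos: float) -> None:
        # Assumes snapshots arrive in time order.
        self.times.append(t)
        self.positions.append(pos)

    def sample(self, now: float) -> float | None:
        if not self.times:
            return None
        render_time = now - self.delay
        i = bisect.bisect_right(self.times, render_time)
        if i == 0:
            return self.positions[0]   # too early: hold the first snapshot
        if i == len(self.times):
            return self.positions[-1]  # missed an update: hold still (or extrapolate)
        t0, t1 = self.times[i - 1], self.times[i]
        p0, p1 = self.positions[i - 1], self.positions[i]
        alpha = (render_time - t0) / (t1 - t0)
        return p0 + (p1 - p0) * alpha
```

With snapshots at (0.0 s, 0) and (0.1 s, 10), sampling at 0.15 s renders the moment 0.05 s, halfway between them, so the entity is drawn at 5. The delay is the price paid for never having to invent positions.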
Lag Compensation is a server-side technique that Valve introduced to solve a problem with this setup: by the time a shot reaches the server, the target has moved on from where it appeared on the shooter’s screen, so without compensation players can’t shoot directly at where other players are drawn, and instead have to aim where they think the other player will end up. Lag compensation allows a player to aim directly at another player on their screen and shoot them as they see them, even if the server’s simulation is further along than the player’s version of it. When the server receives a command, it goes back in time to what the world state would have been when that command was issued, and bases its results on that instead of on the most up-to-date state.
While this removes some paradoxes, it introduces others. The example Yahn gives is as follows:
“For instance, if a highly lagged player shoots at a less lagged player and scores a hit, it can appear to the less lagged player that the lagged player has somehow "shot around a corner". In this case, the lower lag player may have darted around a corner. But the lagged player is seeing everything in the past. To the lagged player, s/he has a direct line of sight to the other player. The player lines up the crosshairs and presses the fire button. In the meantime, the low lag player has run around a corner and maybe even crouched behind a crate. If the high lag player is sufficiently lagged, say 500 milliseconds or so, this scenario is quite possible. Then, when the lagged player's user command arrives at the server, the hiding player is transported backward in time and is hit. This is the extreme case, and in this case, the low ping player says that s/he was shot from around the corner. However, from the lagged player's point of view, they lined up their crosshairs on the other player and fired a direct hit. From a game design point of view, the decision for us was easy: let each individual player have completely responsive interaction with the world and his or her weapons.”
Ultimately, Valve concluded that the benefits outweighed the drawbacks, and the Half-Life engine used interpolation with lag compensation in production.
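The rewind step at the heart of lag compensation can be sketched as follows. This is an illustration, not Valve's code: positions are 1-D, the history snaps to the latest recorded tick instead of interpolating between ticks, and all names are hypothetical. The time the shooter "saw" is estimated as the server's current time minus the shooter's latency and interpolation delay.

```python
from __future__ import annotations
import bisect


class PositionHistory:
    """Server-side log of where every player was at each tick."""

    def __init__(self):
        self.ticks: list[float] = []
        self.snapshots: list[dict[str, float]] = []  # player -> x (1-D for brevity)

    def record(self, t: float, positions: dict[str, float]) -> None:
        self.ticks.append(t)
        self.snapshots.append(dict(positions))

    def rewind(self, t: float) -> dict[str, float]:
        """World as of the latest recorded tick at or before time t."""
        i = bisect.bisect_right(self.ticks, t) - 1
        return self.snapshots[max(i, 0)]


def lag_compensated_hit(history, server_now, shooter_latency, interp_delay,
                        target, aim_x, radius=0.5):
    # Estimate when the shooter saw the world, then test the shot there.
    command_time = server_now - shooter_latency - interp_delay
    past = history.rewind(command_time)
    return abs(past[target] - aim_x) <= radius
```

If the target was at x = 0 at time 0.0 and has since run to x = 4, a shooter with 150ms latency and 100ms interpolation who fires at x = 0 at server time 0.25 still scores a hit, because the server checks the shot against the world as it was at 0.0. That is also exactly how the "shot around a corner" situation in the quote arises.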
Conclusion + Next Steps
This talk aimed to build basic competence with the core pieces of Unity’s networking system, and then to give a high-level overview of the kind of work you’ll be facing when creating a fast-paced network game. In order to implement client-side prediction, you will have to carefully consider the problem space of your project; it might not be as complicated as an FPS like Half-Life or Quake. SpaceRace, for example, has a fixed camera angle, so I never have to worry about a changing set of things that could be in view and thus need rendering. I would recommend starting with something similarly simple and building outwards from there. Yahn’s paper goes into more implementation detail, and would be a good starting point for really digging into code. Good luck!