Eventual Consistency in Real-time Web Apps
When building a real-time web app, you have to figure out how to manage data that comes from two sources: the responses to HTTP requests and the data sent over a web socket. In this environment, you need to deal with a few different situations:
Merging data that comes over the web socket into what's already in memory.
Handling the case where data comes over the web socket while a request for the same piece of data was done over HTTP. For example, a PUT request to update a resource.
Getting the most up-to-date state of your resource after you have received an HTTP response but before you have subscribed to data received over the web socket.
These are all examples that I've dealt with myself in my current project, and they all revolve around the same general problem:
How do you ensure that your local model is in sync with what's stored on the backend?
Before I get into how we deal with this problem, I need to describe how we interact with our backend:
When working with web socket data, we refer to the received payloads as notifications.
In our client, we decide which part of the app should respond to a notification by subscribing a function to a topic key name. Each notification includes this key, and we use it to execute every function that has been registered for that key. (1)
In our application, data sent via notifications/over a web socket takes precedence over data received from an HTTP response.
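To make the topic-key idea concrete, here is a minimal sketch of such a registry. The names `subscribe` and `dispatch` and the `{ topic, data }` payload shape are assumptions for illustration, not our actual client API.

```javascript
// Hypothetical topic registry: topic key -> Set of handler functions.
const subscribers = new Map();

// Register a handler for a topic key such as "users.42".
function subscribe(topic, handler) {
  if (!subscribers.has(topic)) {
    subscribers.set(topic, new Set());
  }
  subscribers.get(topic).add(handler);
  // Return an unsubscribe function so callers can clean up.
  return () => subscribers.get(topic).delete(handler);
}

// Call this with each parsed web socket payload, assumed to look like
// { topic: "users.42", data: { ... } }. Returns how many handlers ran.
function dispatch(notification) {
  const handlers = subscribers.get(notification.topic);
  if (!handlers) return 0;
  handlers.forEach((handler) => handler(notification.data));
  return handlers.size;
}
```

A web socket `onmessage` handler would parse the incoming message and pass the result to `dispatch`.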
The key to solving this problem comes down to utilizing two algorithms on your base model object: merge and fill.
These two techniques effectively do the same exact thing - given an existing model object and a new piece of data, combine the new piece of data into the existing object. However, there is a key difference here in what the two methods do during this process:
For a merge, new data should be added to the existing object. When encountering collisions between the existing object and new data, the new data's value should override the existing data.
For a fill, new data should be added to the existing object. When encountering collisions between the existing object and new data, the existing data's value will NOT be modified.
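The two rules above can be sketched as a pair of shallow functions. This is only an illustration under a few assumptions: models are plain objects, a "collision" means the key already exists on the model, and nested objects are not handled.

```javascript
// merge: copy every key from incoming; incoming wins on collisions.
function merge(existing, incoming) {
  for (const key of Object.keys(incoming)) {
    existing[key] = incoming[key];
  }
  return existing;
}

// fill: copy only the keys that existing does not have yet;
// existing wins on collisions.
function fill(existing, incoming) {
  for (const key of Object.keys(incoming)) {
    if (!Object.prototype.hasOwnProperty.call(existing, key)) {
      existing[key] = incoming[key];
    }
  }
  return existing;
}

// Hypothetical data, applied to two identical starting objects.
const update = { name: "Ada L.", email: "ada@example.com" };
const merged = merge({ id: 1, name: "Ada" }, update);
const filled = fill({ id: 1, name: "Ada" }, update);

console.log(merged.name); // "Ada L." (the new value replaced the old one)
console.log(filled.name); // "Ada" (the old value was kept)
```

Both calls add the missing `email` attribute; they differ only in what happens to `name`.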
Looking at an example, let's take two objects:
https://gist.github.com/strife25/b99601536ffa85a7cdf1
When performing a merge function on myModel, the following would occur:
https://gist.github.com/strife25/6b7be0fc7db361fc998d
When performing a fill function on myModel, the following would occur:
https://gist.github.com/strife25/4d2a2538503ff0573ae2
So how do we utilize these methods in our UI to account for HTTP requests and Web Socket notifications? Well, we usually utilize these methods on page load when the javascript objects in our app are created. During this instantiation, we need to GET the data we want to display from the backend, subscribe for notifications, and respond to any web socket notifications. Here is the usual code we end up writing to do all this:
https://gist.github.com/strife25/ae41e2742231579ad4e7
In this example, a lot is going on. The way I usually think about it is that there are two parallel tasks running during page load:
Load the initial data from the backend once the page is loaded.
Subscribe to web socket notifications.
Looking at task one (loading the data from the backend), everything should look familiar. We do a GET HTTP request and create a new UserModel object from the response. However, for some reason, we need to account for when the user variable is already instantiated and perform a fill. This is where task two comes in.
In task two, we are performing a few steps:
First, we subscribe a function (onNotification) to a topic key ("users.{id}").
Once we are successfully subscribed, perform another GET request via loadUser() to account for the situation where notifications were sent during the subscription process.
When a notification is received, execute onNotification() to merge the data from the web socket into the current UserModel, which may have already been created.
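The two tasks and the three steps above can be sketched roughly as follows. This is a reconstruction for illustration, not the code from the gist: `watchUser`, the callback-style `httpGet(url, done)`, and `subscribe(topic, handler, onSubscribed)` are hypothetical stand-ins, and `UserModel` here is a bare-bones version with shallow merge and fill.

```javascript
// Assumed model shape: attributes live on `attrs`.
class UserModel {
  constructor(data) { this.attrs = { ...data }; }
  merge(data) { Object.assign(this.attrs, data); }         // new data wins
  fill(data) { this.attrs = { ...data, ...this.attrs }; }  // existing data wins
}

// httpGet(url, done) calls done(data) when the response arrives.
// subscribe(topic, handler, onSubscribed) calls onSubscribed() once the
// subscription is active. Both are injected so the flow can be tested.
function watchUser(id, httpGet, subscribe) {
  let user; // undefined until the first response or notification arrives

  function loadUser() {
    httpGet(`/users/${id}`, (data) => {
      if (user) {
        user.fill(data); // HTTP data must not override notification data
      } else {
        user = new UserModel(data);
      }
    });
  }

  function onNotification(data) {
    if (user) {
      user.merge(data); // notifications take precedence
    } else {
      user = new UserModel(data);
    }
  }

  loadUser(); // task one: initial load
  // Task two: step 1 subscribes, step 2 re-fetches once subscribed,
  // step 3 is onNotification running for each notification.
  subscribe(`users.${id}`, onNotification, loadUser);

  return () => user; // accessor for the current model
}
```

One thing to keep in mind with this sketch: because loadUser() always fills when a model already exists, a later HTTP response only adds attributes that are still missing. Whether the real code treats the second response differently depends on details in the gist.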
The important step in task two is step 1 - subscribe to notifications. The moment that this step completes, the page is now allowed to respond to web socket notifications. This is a critical state because it may result in a situation where a notification is received over the web socket before the first GET request from loadUser() (task 1) has completed. In this situation, the notification's data doesn't have an issue - onNotification() executes, it sees that the user variable is undefined, so it creates a new instance of UserModel(). However, what should we do once the response from loadUser()'s GET request is received?
Looking at the done() function in loadUser() - the code will see that user already has a value, so it performs a fill(). The reason for using fill here is to ensure that we get eventual consistency in the app because it will not override the notification that was received. We do not perform a merge() function here because our app declares that notifications take precedence over HTTP responses for the same resource. This is because the data from the HTTP request may be stale by the time it is received after the notification. If we performed a merge here, that stale data would have overwritten the data received from the notification and our UI would become out of sync with the backend data. Performing a fill() will instead just add in any missing data attributes to the user value and not overwrite any existing data.
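To see the difference with concrete values, here is a small sketch of that exact ordering: the notification arrives first, then the slower and now stale GET response. The attribute names and values are made up, and the one-line `merge` and `fill` helpers are simplified shallow versions that return new objects.

```javascript
// Simplified shallow helpers (illustrative only).
const merge = (existing, incoming) => ({ ...existing, ...incoming }); // incoming wins
const fill = (existing, incoming) => ({ ...incoming, ...existing });  // existing wins

// 1. A notification arrives first and becomes the local model.
const fromNotification = { id: 7, status: "active" };

// 2. The GET response arrives later, but it was generated before the update.
const staleResponse = { id: 7, status: "pending", email: "ada@example.com" };

const withFill = fill(fromNotification, staleResponse);
const withMerge = merge(fromNotification, staleResponse);

console.log(withFill.status);  // "active": the notification's value survives
console.log(withMerge.status); // "pending": the stale value overwrote it
console.log(withFill.email);   // "ada@example.com": the missing attribute was still added
```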
The last piece of the subscription logic is step two of task two:
2) Once we are successfully subscribed, perform another GET request via loadUser() to account for the situation where notifications were sent during the subscription process.
This second request handles the situation where the first call to loadUser() has completed and our backend sent a notification before we finished subscribing onNotification() to respond to it. In other words, the backend data was updated after the first GET request completed, but the update message was lost. To account for this, we perform a second call to loadUser() to ensure that we have the most up-to-date state of the user model from the backend. Fortunately, if any notifications are sent during this second request, onNotification() will respond accordingly and those new values will not get overwritten when the second HTTP response is received.
After all is said and done, once the data is loaded and the page has subscribed to user notifications, our user variable will eventually become consistent with the state stored on the backend of our web app.
In summary, when dealing with eventual consistency in your web apps, here are the techniques I have seen success with:
Utilize/implement merge and fill algorithms on your model objects to handle the updating of existing model data in your app.
Treat data received from multiple data sources (web sockets and HTTP requests in this case) as input methods to your page that run in parallel.
Declare that one of these data inputs takes precedence over the others. When data is received from the higher-priority source, perform a merge to combine it with the existing data. All other data input sources should instead perform a fill for new data.
(1) Underneath the covers, we are essentially using Publish-Subscribe.
Many thanks go to Matt Cheely for teaching me these techniques.