Dealing with scheduled jobs, the DevOps way
Finally, we have made the move.
As of Monday, the company has transitioned to HipChat, and so far it's fantastic. Of course, we get all the benefits it provides and its great integrations with many of the tools and services we already use (Librato, Leftronic, Jenkins, etc.), but some of our engineers have already found a way to make our lives a tad more interesting.
In many of our games, we have what is called a league system: we essentially build a tournament schedule that is played at fixed times during the week (in many cases automatically, based on what players have configured for a given match).
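Because league matches run at fixed times each week, whatever registers them has to work out when the next slot falls. Here is a minimal sketch. It assumes a slot is described by a UTC weekday, hour and minute. The `WeeklySlot` shape and the `nextOccurrence` function are hypothetical names for illustration, not taken from our code:

```typescript
// Hypothetical description of a weekly league slot (all times in UTC).
interface WeeklySlot {
  dayOfWeek: number; // 0 = Sunday ... 6 = Saturday, as returned by Date.getUTCDay()
  hour: number;      // 0-23
  minute: number;    // 0-59
}

// Returns the first instant strictly after `from` that matches the slot.
function nextOccurrence(slot: WeeklySlot, from: Date): Date {
  // Start from the slot's time of day on the same UTC calendar day as `from`.
  const candidate = new Date(Date.UTC(
    from.getUTCFullYear(),
    from.getUTCMonth(),
    from.getUTCDate(),
    slot.hour,
    slot.minute,
  ));
  // Move forward 0..6 days to reach the slot's weekday.
  const daysAhead = (slot.dayOfWeek - candidate.getUTCDay() + 7) % 7;
  candidate.setUTCDate(candidate.getUTCDate() + daysAhead);
  // If that instant is not after `from`, this week's slot has passed: use next week's.
  if (candidate.getTime() <= from.getTime()) {
    candidate.setUTCDate(candidate.getUTCDate() + 7);
  }
  return candidate;
}
```

Working in UTC keeps the arithmetic free of daylight-saving jumps. Converting to players' local time would be a presentation concern.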
We also have a whole bunch of scheduled subroutines, and they are critical to the game experience because they directly affect gameplay. The problem is: what happens when they do not run? And how can we keep track of how fast the match-processing routines actually run?
Of course, the first thing we did was set up alerting through PagerDuty, but we felt we could do more to follow up on the scheduled commands and their execution.
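One way to catch a job that never runs is a dead man's switch. If a job has not reported a start within some grace period after its scheduled time, raise an incident. Below is a minimal sketch under stated assumptions. The `JobStatus` record and the `isMissed` and `missedJobEvent` functions are hypothetical. The event object follows the shape of a PagerDuty Events API v2 trigger, which would be POSTed to `https://events.pagerduty.com/v2/enqueue`. Actually sending it is left out:

```typescript
// Hypothetical record of one scheduled occurrence of a job.
interface JobStatus {
  name: string;
  expectedAt: Date; // when the job was scheduled to start
  startedAt?: Date; // set once the job reports that it started
}

// A job is "missed" if it has not started within `graceMs` of its expected time.
function isMissed(job: JobStatus, now: Date, graceMs: number): boolean {
  if (job.startedAt !== undefined) {
    return false;
  }
  return now.getTime() > job.expectedAt.getTime() + graceMs;
}

// Shape of a PagerDuty Events API v2 "trigger" event.
interface PagerDutyTriggerEvent {
  routing_key: string;
  event_action: "trigger";
  dedup_key: string;
  payload: {
    summary: string;
    source: string;
    severity: "critical" | "error" | "warning" | "info";
  };
}

function missedJobEvent(job: JobStatus, routingKey: string, source: string): PagerDutyTriggerEvent {
  const expected = job.expectedAt.toISOString();
  return {
    routing_key: routingKey,
    event_action: "trigger",
    // One dedup key per job occurrence, so repeated checks update the same
    // incident instead of opening new ones.
    dedup_key: `${job.name}@${expected}`,
    payload: {
      summary: `Scheduled job "${job.name}" did not start (expected at ${expected})`,
      source,
      severity: "critical",
    },
  };
}
```

The grace period is a trade-off. Too short, and a slightly slow start pages someone. Too long, and players notice the missing matches before we do.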
So our next step was to integrate the scheduled routines themselves with Leftronic.
Whenever the "match" routine (the one that plays the matches scheduled for a given time) starts, the "Match Progress" bar drops to zero and then increments gradually as matches get played. Whenever a scheduled routine completes, we add an entry to "Past League Jobs", which shows which server took control of the operation.
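The progress behaviour described above can be sketched as a small reporter object. This is a hedged illustration, not our integration code. The `Push` function stands in for whatever the dashboard's push API looks like (Leftronic's actual client is not shown). The stream names `match-progress` and `past-league-jobs` and the `MatchProgressReporter` class are hypothetical:

```typescript
// Hypothetical dashboard sink; in practice this would wrap the dashboard's push API.
type Push = (stream: string, value: number | string) => void;

class MatchProgressReporter {
  private readonly total: number;
  private readonly push: Push;
  private played = 0;
  private lastPercent = -1;

  constructor(total: number, push: Push) {
    this.total = total; // number of matches scheduled for this slot
    this.push = push;
  }

  // The "match" routine starts: reset the bar to zero.
  start(): void {
    this.played = 0;
    this.lastPercent = 0;
    this.push("match-progress", 0);
  }

  // One more match played: push the new whole-number percentage, but only
  // when it changes, so large batches do not flood the dashboard.
  matchPlayed(): void {
    this.played = Math.min(this.played + 1, this.total);
    const percent = this.total === 0 ? 100 : Math.floor((this.played * 100) / this.total);
    if (percent !== this.lastPercent) {
      this.lastPercent = percent;
      this.push("match-progress", percent);
    }
  }

  // Routine completed: add a "past league jobs" entry naming the server that ran it.
  finish(server: string, finishedAt: Date): void {
    this.push(
      "past-league-jobs",
      `${finishedAt.toISOString()} ${this.played}/${this.total} matches on ${server}`,
    );
  }
}
```

To wire it up, create one reporter per run with the number of matches due, call `start()`, then call `matchPlayed()` from the loop that plays them.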
The first time we saw that green bar go to zero and start to show progress, we all got excited about it.
Some of the people on one of the game teams with a live league system then started testing HipChat to see whether it would be a worthy replacement for Skype (shrug). Of course, once they started seeing Jenkins and PagerDuty notifications in there, light bulbs lit up: why not get job start and completion notifications in our chat stream?
A few hours later, the deed was done, thanks to HipChat's super simple API.
So, in a nutshell:
Our games register league jobs in Shokoti, a Node.js scheduling service we developed and tailored to our needs;
When the time comes, Shokoti calls back the game to run the job;
For matches, we reset the match progress to zero and update it in real time on our Leftronic operations dashboard;
For all jobs, we notify everyone through HipChat when the job starts and finishes, and keep a list of recently executed jobs on our dashboard.
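The start/finish notifications in the steps above can be sketched as a wrapper the game runs whenever the scheduler calls it back. This is a minimal sketch, not our actual code. `runScheduledJob`, `hipChatRequest` and the injected `send` function are hypothetical names, and the HTTP transport is left to the caller. The request targets HipChat's v2 "send room notification" endpoint (`POST /v2/room/{room}/notification`) with a bearer token. Timing the job also gives a rough answer to "how fast did it run?":

```typescript
// Request body for HipChat API v2 "send room notification".
interface HipChatNotification {
  message: string;
  color: "yellow" | "green" | "red" | "purple" | "gray";
  notify: boolean; // whether the message should alert room members
  message_format: "text" | "html";
}

interface HttpRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function hipChatRequest(room: string, token: string, n: HipChatNotification): HttpRequest {
  return {
    url: `https://api.hipchat.com/v2/room/${encodeURIComponent(room)}/notification`,
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${token}` },
    body: JSON.stringify(n),
  };
}

// Hypothetical transport, e.g. a thin wrapper around an HTTP client.
type Send = (req: HttpRequest) => Promise<void>;

// Hypothetical wrapper: announce the start, run the job, then announce success
// (with duration) or failure, rethrowing so the scheduler still sees the error.
async function runScheduledJob(
  name: string,
  job: () => Promise<void>,
  send: Send,
  room: string,
  token: string,
  now: () => number = Date.now,
): Promise<void> {
  const post = (message: string, color: HipChatNotification["color"], notify: boolean) =>
    send(hipChatRequest(room, token, { message, color, notify, message_format: "text" }));

  const started = now();
  await post(`Job "${name}" started`, "yellow", false);
  try {
    await job();
    const seconds = ((now() - started) / 1000).toFixed(1);
    await post(`Job "${name}" finished in ${seconds}s`, "green", false);
  } catch (err) {
    await post(`Job "${name}" failed: ${String(err)}`, "red", true);
    throw err;
  }
}
```

As written, a failed chat notification aborts the job. In production you would probably make notifications best-effort so a chat outage cannot block match processing.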
Handling our scheduled jobs this way has made us much more efficient, and above all much more aware of how things are going, when it comes to running batched operations on a schedule.