Eagle eyed depressionbot users (of which there are many) would have noticed the glaring errors issued by express when trying to access userlists views on the web interface earlier this week.
Following the solve, I thought it might be interesting enough for a blog post and provide a nice follow up from my introduction to the stats module. If you haven't read it already, you might want to consider reading that post for a brief overview of how the stats module works first, for some background.
TypeError: Object #<Object> has no method 'get' at api.getChanStats.reply.request.server at pages./user/:connection/:channel
So back to the story at hand; as we have enough context to now deduce that the error definitely seemed to be caused by one of these fields. A quick inspection of the error and stack trace provided by express when trying to render the channel stats view would seem to have indicated that these fields is missing its "get" function. Not quite realizing the full extent of the bug, @reality switched the bot to debug mode (via a nifty command that I'm sure he'll tell you about some time) while I grappled to run the most basic stats command to check the module was still alive and sure enough it spewed the same error in to the channel.
> - Message: TypeError: Object #<Object> has no method 'get'
Surely they can't be missing? Why would this suddenly appear now, I hadn't pushed anything new to stats since a language update at the end of the last month, so nothing has changed with regards to the way the module's structures are laid out or loaded, and I hadn't touched the user and channel templates since I added a seven day rolling frequency field for channels almost a month prior. ~ajs however, revealed that they really were missing and I was now compelled to find out why.
~ajs Object.keys(dbot.db.userStats.aberwiki.sam["#oaosidl"].lines)
> name,data,time,format,getRaw,get,__add,add,merge,__merge,toString
~ajs Object.keys(dbot.db.chanStats.aberwiki["#42"].lines)
So evidently the onLoad was correctly attaching the additional functions to each field in the user objects, but for some reason it seemed almost as though it were skipping over the loop for the chanStats store. Obviously leaving these fields without their enhancing functionality, they were left dumb and unable to respond to the likes of the get and add functions expected of them.
Loading up my local version of depressionbot and connecting to the sever, I couldn't seem to reproduce the problem. stats commands worked and the web interface appeared in all of its d3-and-datatables glory. Perhaps my repositories were out of date? They were, which is unsurprising given the pace of depressionbot development lately. Yet I still couldn't reproduce the errors, did the current instance of depressionbot @reality was running differ in some way from the repo? We'd already established that another instance of the software was not suffering problems either. Confirming that depressionbot functioned fine with a clean database, I requested a copy of the prestigious dbot-prime database and was promptly mailed a half megabyte json file and instantly the error was reproducible. The issue was the database?!
We waded through the length channel stats objects that had been serialized in to the json file but could not spot anything that seemed out of place at all. Obviously the file couldn't be corrupted or invalid (damn commas) as the bot would fail to load at all, so the issue must be far more subtle. A quick and dirty naive debugging attempt placed a console.log line immediately after the completion of the userStats loop in the module's onLoad.
The issue was not with the channel objects at all, the onLoad was aborting silently whilst processing the user objects all this time. But how was that possible? All the users appeared to be loading fine and functions that relied purely on userStats objects completed successfully. If it was aborting, it didn't seem to have any impact on the functionality of user statistics. Continuing my leisurely printing style debugging, I moved the console.log back a few lines, in to the heart of the travesty that is the userStats onLoad megaloop, this time printing the components of the "address" to the current user object field we were trying to apply the model template to; the server, user, channel and field name itself.
As various combinations of rooms and names flicked past I was about to deem the venture as a useless endeavour and at this point was happily considering "just nuking" the database so we could all get on with our evening.
Until I noticed one line of output.
aberwiki #oaosidl hootle_testing data
Data? But the user template definitely doesn't specify any field called data? Immediately I realized what had happened and what I had done.
Days before, I was poking around the database with ~ajs to try and find a solution for another bug (looking at the length of this post already, definitely a story for another day) but the rough gist is when a user has two nicks that the bot sees as separate (either because the bot was not in channel at the time they changed to a new nick, or they connected to IRC with a new nick from usual), the user will also have two separate stats profiles (ie. the two different keys under different names in userStats, not to be confused with the profiles module, although they'll have two of those as well!). It's expected that if the user were to then change from one of these "primary" nicks to the other, the users module merges all the aliases for both of those users together; the bot now sees those two names as the same person, but under some specific circumstances, the stats module will fail to merge those two stats profiles together, leaving both "primary" nicks in the database.
I had successfully reproduced this with a user called hootle_testing; joining the server under this new name, saying hello (to initiate a new stats profile), leaving and then changing my main nick to hootle_testing and back to Sam. Whilst as far as users was concerned, both nicks were the same person, stats thought differently. I tried to force a merge by calling the stats API function directly;
~ajs dbot.api.stats.mergeStats("aberwiki", "hootle_testing")
Message: TypeError: Cannot convert null to object
But the error message was rather useless, what was null? This could be anything! This route was clearly futile, the API was doing its best to ensure data integrity by not letting whatever error was occurring cause harm to the stored data or the calling function, it was probably returning a null object to represent failure. To provoke an error we'd have to go deeper (statsception?).
I circumvented my own API and tried to invoke the merge functions on the fields myself;
~ajs dbot.db.userStats.aberwiki.sam["#oaosidl"].words.merge(dbot.db.userStats.aberwiki.hootle_testing["#oaosidl"].words)
To no effect. No errors appeared at all. The merge for that particular field had worked, the ~words command (which was still working as it relies on only user object data) was showing my word count had increased by the count stored for hootle_testing. I had a feeling the field at fault was out_mentions whose data property is an empty object initially (as you've mentioned no users). I probed at the data property for the hootle_testing's out_mentions field and the bot issued no reply. It seemed to disagree that there was an object (albeit an empty one) there.
~ajs Object.keys(dbot.db.userStats.aberwiki.hootle_testing["#oaosidl"].out_mentions.data)
In hindsight, I realize that of course there would be no keys inside an empty object and that I should have not been checking with Object.keys. However the worse mistakes was yet to come, when I tried to initialize the data object that I didn't think was there.
~ajs dbot.db.userStats.aberwiki.hootle_testing["#oaosidl"].data = {}
I had forgotten the out_mentions field key completely and instead created a new, empty field called data. My debugging on this occasion was of course a failure, I was unable to find the issue as I'd missed my own mistake. Not realizing I'd added an entirely new field to what was otherwise a perfectly structured although unmergeable user object, I figured I'd come back to the bug in future when more evidence could present itself as to where to look.
This extraneous field easily explains the result from the loop. Of course it failed, the model user object structure had no idea what the "data" field was, thus attempting to access properties such as .toString() that are expected from the model crashes and burns, throwing us out of the loop and out of the onLoad. As it turns out the bot's reloading mechanism elects to quietly add errors to the module's status variable rather than report them on the console.
~ajs delete(dbot.db.userStats.aberwiki.hootle_testing)
All it took was removing the user in question (in fact just the data field would have been sufficient, but there was no need to keep this test user in the database - I don't like inconsistent data!)
But why did it take so long to notice and why was the solution not so obvious even when we knew it was the database at fault?
As explained, the user functionality worked fine, for every single user on the server! Surely if there was incorrect data or corruption, it would at least prevent stats (or even depressionbot itself) from working correctly, and yet all statistics for users were still being incremented, commands that did not require input from channel data (for example ~lines [user] will calculate [user] lines / total channel lines, and thus failed) worked and returned correct results. So initial theories led us to believe there must have been something in the chanStats store preventing those channels from loading and the lack of an onLoad error on the console misled me to believe that the chanStats loading loop must have been executed but was somehow failing to attach the additional functionality to the data fields, rather than actually aborting with an error.
Finally, how did it not affect other users?
As it turns out, hootle_testing was serialized as the last user in the userStats.aberwiki object, not only that but the mistaken "data" data field I had added was the last field to be serialized in the hootle_testing["#oaosidl"] user object, allowing the onLoad loop to successfully load all the additional structure data for all of the user objects on the server including those for hootle_testing before that spurious data field caused a silent assassination of the onLoad before the chanStats loop. Sneaky!