The Practicality of Sensu
If Jan Brady were a Linux SysOp, she would've said "Monitoring, Monitoring, Monitoring".
Everyone is about monitoring. EVERYONE. There are a ton of great articles on the topic. My favorite is Ian Unruh's Monitor Everything series, which gives a good idea of how to structure your data, centralize your logging, and level the intensity of the logging across your infrastructure.
In my time as a DevOps engineer, I have used Sensu, Graphite (with Grafana), PagerDuty, DataDog, SumoLogic, and some other collectors like CollectD and StatsD. This experience covers logging, monitoring, alerting, and the actions taken in response to alerts.
Today, though, I am going to talk about Sensu, but in a very practical way. A lot of this is going to be a bit of a how-to: not so much how to install it, but how Sensu functions and how to work with it. Their documentation is great, but I often had a very specific question about their syntax that wasn't documented. Now, Sensu is great. I hang out in their IRC channel and the people in there are very, very helpful. Sean Porter (@portertech), who created Sensu, has answered questions directly. They are all great. But that does not mean nothing goes unsaid. So, that's what this post is about.
For those of you who do not know, I spent 7 years in IT and I am somewhat new to the Linux SysAdmin/DevOps world. I love it. If you are in IT, there are a lot of Linux solutions that can translate over, and I think Sensu is one of them. But before we get started, here are some things to note:
I am not going to go into how to install Sensu. Not only did I already cover this for CentOS 7 but, realistically, Sensu's documentation on installation is amazing. The team keeps their Chef cookbooks and Puppet manifests up to date.
These talking points are just that, talking points. I will give you my best ideas and opinions. I am new to a lot of this, so my ideas may be good but might be inaccurate. My point is for you to learn from my ignorance. ;-)
I am assuming that you've at least perused the documentation. Listen, I've only read it because I've skimmed it enough times. ;-) I also work for a team that believes in RTFM. So, there's that.
Structured Data from the Beginning
From the start, when you are deploying your Sensu server and clients, you need to come up with a logical schema for which checks run on which machines. Think of subscriptions as groups: you want a "common" group and an $application group. For example, you will want a bunch of basic checks that run on every machine: CPU usage, memory, disk space, whether your configuration management client (Chef, Puppet, etc.) is running, and so on. But if you have web-application servers, you will not want every node checking whether HTTPD is running. Why? Outside of the obvious answer, the cleanup for removing those checks is a pain. I was once testing a check in the "common" subscription group and it got pushed to every node with the "common" subscription. Dumb.
Everyone is going to do it differently, but the naming convention I chose for subscriptions follows the higher-level Puppet roles/profiles that set up our systems. In Chef, it would be based on the groups. So, every node gets a "common" subscription plus a "role" subscription. If I have a bunch of web nodes, the servers in that role get a "webnode" subscription. This makes it easier to configure the checks and deploy them across all the nodes. Now, I work in a masterless Puppet system, so every node gets the entire manifest. Pushing all the checks to all the nodes does no real damage because a server will only use the checks associated with its subscriptions.
Now, you can change the naming convention (and it will be easier to change using a CM) but getting most of the things separated at the beginning is easier than doing it later. Trust me.
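To make the "common" plus role idea concrete, here is a minimal Python sketch of how checks get matched to nodes by subscription. The check names, the plugin paths in "command", and the host names are all made up for illustration. The "checks", "command", "subscribers", and "interval" keys are the ones I use in my own check JSON, so treat the exact shape as an assumption and compare it against the Sensu documentation for your version.

```python
import json

# Hypothetical check definitions. The plugin paths are placeholders,
# not real plugin names.
CHECKS = {
    "check_cpu": {
        "command": "/etc/sensu/plugins/check-cpu-placeholder",
        "subscribers": ["common"],
        "interval": 60,
    },
    "check_disk": {
        "command": "/etc/sensu/plugins/check-disk-placeholder",
        "subscribers": ["common"],
        "interval": 300,
    },
    "check_httpd": {
        "command": "/etc/sensu/plugins/check-httpd-placeholder",
        "subscribers": ["webnode"],
        "interval": 60,
    },
}

# Hypothetical clients and the subscriptions each one carries.
CLIENTS = {
    "web01.mycompany.com": ["common", "webnode"],
    "db01.mycompany.com": ["common"],
}


def checks_for(subscriptions, checks=CHECKS):
    """Return the sorted names of checks that target any of the given subscriptions."""
    wanted = set(subscriptions)
    return sorted(
        name for name, spec in checks.items() if wanted & set(spec["subscribers"])
    )


for client, subscriptions in CLIENTS.items():
    print(client, "->", checks_for(subscriptions))

# On disk, the definitions are wrapped in a top-level "checks" key.
print(json.dumps({"checks": CHECKS}, indent=2))
```

The point of the sketch is the last sentence of the paragraph above: every node can carry every check definition, but only the checks whose subscribers overlap the node's subscriptions apply to it.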
The other way I keep my data structured is by creating subdirectories in /etc/sensu/conf.d/ for each part of the application, whether it be checks, handlers, or other tasks. By default, /etc/sensu has the subfolders handlers, which holds the action-based scripts, and plugins, which holds all the Ruby/Perl/Bash scripts used for the checks themselves. In my configuration, /etc/sensu/conf.d/ has a checks folder with all of the defined checks for the system, a handler folder with the JSON files that say what to do with the upper-level handlers, and some other subfolders. For example, I have an "alert" folder, which has all of the JSON files for alerting-based mechanisms. That is where I have my HipChat JSON defined, my email, my SMS, etc.
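If it helps to see that layout as a tree, here is a small Python sketch that builds it under a temporary directory so nothing on a real machine gets touched. The folder names come from the description above. The function name and the temporary root are mine, and in practice your configuration management tool would own these directories.

```python
import tempfile
from pathlib import Path

# Default subfolders of /etc/sensu, as described above.
TOP_LEVEL_SUBDIRS = ["handlers", "plugins"]
# My own subfolders under conf.d; adjust the names to taste.
CONF_D_SUBDIRS = ["checks", "handler", "alert"]


def build_layout(root):
    """Create the directory layout under root and return the created paths."""
    root = Path(root)
    paths = [root / name for name in TOP_LEVEL_SUBDIRS]
    paths += [root / "conf.d" / name for name in CONF_D_SUBDIRS]
    for path in paths:
        path.mkdir(parents=True, exist_ok=True)
    return paths


# Build the tree in a throwaway directory and print it relative to that root.
with tempfile.TemporaryDirectory() as tmp:
    for path in build_layout(Path(tmp) / "etc" / "sensu"):
        print(path.relative_to(tmp))
```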
Personally, I don't like a subfolder with a mishmash of config files. I want to keep my infrastructure clean so that anyone can come on board, logically peruse the file tree, and figure out where things are if they need to make a temporary manual change to a check.
Understanding and Testing Your Checks
Your check_yourchecking.json will need to go in two places for the check to work: the local machine you are checking, and the server, so that the check registers with the server. I know, I know, it seems pretty sensible when you read it, but it wasn't apparent when I was going through this.
When I'm testing the JSON for my checks, I always use an online JSON checker. It catches a missing {} or comma and saves a ton of time. Again, it seems logical but, again, it is something that helped me out when I was doing this.
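If you would rather not paste your configs into a website, the same sanity check can be done locally. This is a minimal sketch using Python's standard json module. The function name and the two sample strings are made up for illustration, and it only checks that the text is valid JSON, not that Sensu will accept the check.

```python
import json


def validate_check_json(text):
    """Return (True, "ok") if text parses as a JSON object, else (False, reason)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno point at where the parser gave up.
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(data, dict):
        return False, "top level must be a JSON object"
    return True, "ok"


# A well-formed sample, and one with a comma missing between two keys.
GOOD = '{"checks": {"check_cpu": {"command": "placeholder", "interval": 60}}}'
BAD = '{"checks": {"check_cpu": {"command": "placeholder" "interval": 60}}}'

print(validate_check_json(GOOD))
print(validate_check_json(BAD))
```

For a quick one-off, running python -m json.tool against the file does much the same job from the command line.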
But, when you have your check ready, always test it. DUHHHHHH.
I always write the check to a dummy machine (test.mycompany.com) and I run the test in the subscription group "TEST" or something dumb.
It is easier to sort out problems there, before you use your CM to deploy the check.
Find your starting point. For me? Well, I started with system and process checks. Something basic and something you can easily test - well, mostly whether a process is or is not running.
Those checks generate quick results, so when you're building your system out it's easy to test.
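A process check is about as simple as a check gets, so here is a rough Python sketch of one. It is not one of the community plugins, just an illustration. It assumes the usual exit status convention for checks (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and that pgrep is available on the machine. The function names and the "httpd" example are mine.

```python
import subprocess

# Exit status convention used by check scripts.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def is_running(name):
    """Return True if pgrep finds a process whose name matches exactly."""
    result = subprocess.run(
        ["pgrep", "-x", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def evaluate(name, running):
    """Map the running state to a (status, message) pair."""
    if running:
        return OK, f"CheckProcess OK: {name} is running"
    return CRITICAL, f"CheckProcess CRITICAL: {name} is not running"


def check_process(name):
    """Run the check, reporting UNKNOWN if pgrep itself is missing."""
    try:
        return evaluate(name, is_running(name))
    except FileNotFoundError:
        return UNKNOWN, "CheckProcess UNKNOWN: pgrep is not installed"


# Demonstrate the mapping without depending on what is running locally.
print(evaluate("httpd", running=True))
print(evaluate("httpd", running=False))
```

To use this as an actual check script, you would print the message from check_process and call sys.exit with the status. Stopping and starting the service on your test machine then gives you the quick OK/CRITICAL flip to watch for.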
Well, I have been writing this out for so long BUT I hope this helps all you beginners out there.
I am always asking questions and I am always learning. Trust me, I've asked some dumb questions and still do.