Monday, 5 February 2018

Have your users monitor their own kubernetes services in a multi-tenant setup

If you are running a kubernetes cluster (I'll call it k8s from now on), or if you are a user of a k8s environment, you're probably doing that for a couple of reasons. If one of those reasons is because you run some webservice that your company or livelihood depends on, then you want to be damn sure that this service is up and running. You can't trust k8s to fix all problems for you, because when your DC goes down, what can k8s do for you? Despite the magic that k8s is, you still need monitoring. And with that, alerting.
Black-box alerting won't change much when you switch from whatever you were using before you used k8s. You'll still want to do synthetic tests from multiple locations, checking for response codes, page content and response times. Perhaps you've even got some simulated user stories that get executed by your external monitoring solution. If you don't, it may be a good idea to add some. Because if the login page loads, that doesn't mean the login is actually working ;)
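To give an idea of the kind of synthetic check meant here: one tool that can run them is the blackbox_exporter, which plugs into the prometheus setup discussed below. It isn't required for anything in this post, and the module name and content check are made-up examples, but a probe that verifies both the response code and the page content could look roughly like this:

```yaml
# blackbox_exporter module (sketch): probe a page over HTTP and fail the
# check unless it returns 200 and the body contains the expected content.
modules:
  login_page:                        # hypothetical module name
    prober: http
    timeout: 5s
    http:
      valid_status_codes: [200]
      # recent blackbox_exporter versions; older ones call this
      # fail_if_not_matches_regexp
      fail_if_body_not_matches_regexp:
        - "Sign in"                  # example content check, adjust to your page
```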
But white-box monitoring, where you look at the components within k8s, is where you can save your ass. When you get a black-box alert, one that tells you something bad was visible externally, there were probably signals up front from the internals of the system that could have warned you. Wouldn't it be nice to have those signals available to you, so you could have fixed whatever problem there was before customers were affected?

To monitor k8s internals and the services running on top, there is one de facto open-source solution available, which is the combination of prometheus and grafana. The reason for this is that monitoring modern and dynamic systems such as k8s is not easy to do with old-world tools such as nagios, zabbix, etc. Those tools don't have the concept of moving targets ("where is that container running this time?") in their design, and retrofitting it is hard, complex or plain unfriendly to work with. The old-style monitoring tools usually don't support high-frequency monitoring either: checking once per minute is about as fast as they go.
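To make the moving-targets point concrete: prometheus discovers its scrape targets through the kubernetes API instead of a static host list. The snippet below is a minimal sketch (the namespace name and the prometheus.io/scrape annotation convention are assumptions, not something from this post), showing pods being discovered and kept only if they opt in:

```yaml
# prometheus scrape config using kubernetes service discovery:
# targets come from the API server, so pods can come, go and move
# between nodes without anyone editing a host list.
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod                    # ask the API server for pods...
        namespaces:
          names: [team-a]            # ...in this (example) namespace only
    relabel_configs:
      # keep only pods that opt in with the prometheus.io/scrape annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```

Because discovery is continuous, a pod that gets rescheduled to another node simply shows up again with its new address; nobody has to touch the configuration.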

If you run a k8s cluster, you may deploy prometheus and grafana. You'll get the insights you need (up to a point, this stuff is in rapid development and it's not a deploy-and-forget solution). But what about your users? If you run some cluster, chances are that you are not the only user. Perhaps you have external customers that you are running this cluster for and they are paying you to do so. Or perhaps you run this for one or more development teams within your company. Whatever the reason you are running k8s, your users are probably the ones deploying applications. And when something breaks in the namespaces where those applications are running, who should be the one waking up? You, because you are the guy who set up k8s? Or should it really be the person or the team that deployed the software/service into that namespace? Who knows most about that application, its needs, its design and its problems? Exactly.
So then, if the users should be the ones waking up, they should also be the ones to monitor their own namespaces and everything in them. This may be hard in the beginning. You're the k8s expert after all, they are 'just coders' and don't know anything about it. Or they may come up with some other excuse. Or maybe they'll grab the chance with both hands to do these things themselves and take ownership. However it goes, they need their own monitoring implemented. Their own prometheus and grafana.

Yet one thing that is not written about much, and indeed is not yet that easy to set up, is a multi-tenant prometheus/grafana setup on top of k8s. Almost all documentation assumes a single prometheus and a single grafana monitoring the entire cluster. There are some important reasons to split this up:
- self-reliance: You should not be involved when applications are deployed, so you should also not be involved when monitoring and alerting for that application is implemented. You could be an advisor, but you should not be the bottleneck.
- differences in needs: Each user or group of users has different needs. Some of those may conflict with each other. If the monitoring is separated, these needs can co-exist.
- performance: When you have 1 prometheus, gathering and storing all that data can become quite resource intensive. By splitting all of this up into smaller pieces, the whole scales better.

When using helm to deploy prometheus, it's possible to specify a destination namespace into which prometheus should be deployed. Development teams could install helm (and tiller) themselves, and then deploy prometheus and grafana using that helm installation. Done this way, helm, prometheus and grafana only have the permissions that you granted them within that namespace, nothing more. So if you've set up RBAC correctly (and did not give the users cluster-admin access), you'll be fine.
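As a sketch of what that could look like: a team could run something along the lines of `helm install stable/prometheus --namespace team-a -f values.yaml` against their own tiller (chart name, namespace and values file are just examples here, and chart defaults change over time). The part worth spelling out is the RBAC side: give the team's prometheus a namespaced Role instead of a ClusterRole, so it can only discover and scrape what lives in its own namespace. Roughly:

```yaml
# Example (assumed names): a Role and RoleBinding scoped to the team's
# namespace, granting the prometheus service account just enough access
# to discover and scrape targets there and nowhere else.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus
  namespace: team-a
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "endpoints"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus
  namespace: team-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: team-a
```

Combined with a scrape config that only lists the team's own namespace (as in the discovery snippet earlier), each team owns its monitoring without being able to peek outside its slice of the cluster.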