Dr Jekyll’s potion famously owes its effectiveness to an ‘unknown impurity’. This is why, at the end of Stevenson’s tale, the protagonist has to confess to himself and the world that he will never regain control of his destructive alter ego. Configuration errors sometimes present a similar problem. It can be very hard to figure out why an earlier, throwaway version of a service worked when our painstaking attempts to recreate it fail. As I hope to show, creating regular backups of our projects can help.
I’d like to distinguish between two kinds of backup here. On the one hand, there’s a spare vial in the fridge. Its contents match the original potion exactly. This is essentially a database snapshot. On the other hand, there’s a laboratory analysis of the original potion, which represents our only chance of identifying the ‘unknown impurity’.
In many cases, the vial in the fridge is what is needed. Its direct equivalent in the Kubernetes world is a database backup of the master’s etcd store. I want to concentrate instead on the laboratory analysis. It is less convenient when time is short, but it does offer a clear, human-readable glimpse of a particular moment in time when our service was working correctly.
While this approach will probably not allow you to restore the entire cluster to a working state, it enables you to look at an individual project, dissect its parts and hopefully identify the tiny, inadvertent configuration change that separates a failed deployment from a successful one.
There is no need to lock the database prior to taking the backup. We are exporting individual objects to pretty-printed JSON, not dumping blobs.
Why, considering our infrastructure is expressed in code, should we go to the trouble of requesting laboratory analyses? Surely the recipe will suffice as everything of consequence is persisted in Git? The reason is that too often the aspiration to achieve parity between code and infrastructure is never realised. Few of us can say that we never configure services manually (a port changed here, a health check adjusted there); even fewer can claim that we regularly tear down and rebuild our clusters from scratch. If we consider ourselves safe from Dr Jekyll’s error, we may well be deluding ourselves.
Project export
Our starting point is the script project_export.sh in the repository openshift/openshift-ansible-contrib. We will use a substantially modified version (see pull request, now merged).
One of the strengths of the Kubernetes object store is that its contents are serialisable and lend themselves to filtering using standard tools. We decide which objects we deem interesting and we also decide which fields can be skipped. For example, the housekeeping information stored in the .status property is usually a good candidate for deletion.
oc export has been deprecated, so we use oc get -o json (followed by jq pruning) to export object definitions. Take pods, for example. Most pod properties are worth backing up, but some are dispensable: they include not only a pod’s .status, but also its .metadata.uid, .metadata.selfLink, .metadata.resourceVersion, .metadata.creationTimestamp and .metadata.generation fields.
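By way of illustration, pruning along these lines removes the volatile fields before the export is written to disk (a sketch only, not necessarily the exact filter the script applies):
$ oc get pods -o json | jq 'del(.items[].status,
    .items[].metadata.uid,
    .items[].metadata.selfLink,
    .items[].metadata.resourceVersion,
    .items[].metadata.creationTimestamp,
    .items[].metadata.generation)' > pods.json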
Some caveats are in order. We store pod and replication controller definitions, yet we also store deployment configurations. Clearly the third is perfectly capable of creating the first two. Still, rather than second-guess a given deployment sequence, the backup comprises all three. It is, after all, possible that the pod definition (its readinessProbe property, for example) has been modified. The resulting history may be repetitive, but we cannot rule out the possibility of a significant yet unseen change.
Another important caveat is that this approach does not back up images or application data (whether stored ephemerally or persistently on disk). It complements full disk backups, but it cannot take their place.
Why not use the original export script? The pull request addresses three central issues: it continues (with a warning) when the cluster does not recognise a resource type, thus supporting older OpenShift versions. It also skips resource types when the system denies access to the user or service account running the export, thus adding support for non-admin users. (Usually the export will be run by a service account, and denying the service account access to secrets is a legitimate choice.) Finally, it always produces valid JSON. The stacked JSON output of the original is supported by jq and indeed oc, but expecting processors to accept invalid, stacked JSON is a risky choice for backup purposes. python -m json.tool, for instance, requires valid JSON input and rejects the output of the original script. Stacked JSON may be an excellent choice for chunked streaming (log messages come to mind), but here it seems out of place.
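If you want to confirm for yourself that an export really is strict JSON, the pickiest of the processors mentioned above will tell you (myproject/pods.json here stands in for any of the exported files):
$ python -m json.tool < myproject/pods.json > /dev/null && echo valid
valid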
Backup schedule
Now that the process of exporting the resources is settled, we can automate it. Let’s assume that we want the export to run nightly. We want to zip up the output, add a date stamp and write it to persistent storage. If that succeeds, we finish by rotating backup archives, that is, deleting all exports older than a week. The parameters (when and how often the export runs, the retention period, and so on) are passed to the template at creation time.
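In outline, the nightly run amounts to something like this (a sketch only; the directory names and the seven-day retention are stand-ins for the template’s actual parameters):
# archive today's export with a date stamp
$ zip -r "/openshift-backup/openshift-backup$(date +%Y%m%d).zip" /tmp/export
# rotate: remove archives older than a week
$ find /openshift-backup -name 'openshift-backup*.zip' -mtime +7 -delete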
Let’s say we are up and running. What is happening in our backup project?
A nightly CronJob object instantiates a pod that runs the script project_export.sh. Its sole dependencies are oc and jq. It’s tempting at first glance to equip this pod with the ability to restore the exported object definitions, but that would require sweeping write access to the cluster. As mentioned earlier, the pod writes its output to persistent storage. The storage mode is ReadWriteMany, so we can access our files whether an export is currently running or not. Use the spare pod deployed alongside the CronJob object to retrieve the backup archives:
$ oc project cluster-backup
$ POD=$(oc get po | grep Running | cut -d' ' -f1)
$ oc exec ${POD} -- ls -1 /openshift-backup
openshift-backup20180911.zip
openshift-backup20180912.zip
openshift-backup20180913.zip
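To pull the archives onto a local machine, oc rsync can copy the backup directory straight out of the running pod (the local target directory is of course up to you):
$ oc rsync ${POD}:/openshift-backup ./backups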
Policy
The permissions aspect is crucial here. The pod’s service account is granted cluster reader access and an additional, bespoke cluster role secret-reader. It is defined as follows:
kind: ClusterRole
apiVersion: v1
metadata:
  name: ${NAME}-secret-reader
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list"]
Perhaps the greatest benefit of custom cluster roles is that they remove the temptation to grant cluster-admin rights to a service account.
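Granting the two roles to the backup pod’s service account might look like this (the service account name backup is a placeholder, and backup-secret-reader assumes the template parameter NAME was set to backup; the project name matches the example above):
$ oc adm policy add-cluster-role-to-user cluster-reader \
    system:serviceaccount:cluster-backup:backup
$ oc adm policy add-cluster-role-to-user backup-secret-reader \
    system:serviceaccount:cluster-backup:backup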
The export should not fail just because we decide that a given resource type (e.g. secrets or routes) is out of bounds. Nor should it be necessary to comment out parts of the export script. To restrict access, simply modify the service account’s permissions. For each resource type, the script checks whether access is possible and exports only resources the service account can view.
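Conceptually, the check boils down to asking the API server whether the current account may list a given resource type; a reasonably recent oc client exposes this directly (the script’s own implementation may differ):
$ oc auth can-i list secrets
yes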
Administrator permissions are required only to create the project at the outset. The expectation is that this would be done by an authenticated user rather than a service account. As Fig. 2 illustrates, the pod that does the actual work is given security context constraint ‘restricted’ and security context ‘non-privileged’. In essence, the pod’s service account has read access to the etcd object store and write access to its persistent volume.
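A quick way to confirm which constraint the pod actually received is to read the openshift.io/scc annotation, using the same tools the export itself relies on:
$ oc get pod ${POD} -o json | jq -r '.metadata.annotations["openshift.io/scc"]'
restricted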
How to get started, and why
To set up your own backup service, enter:
$ git clone https://github.com/gerald1248/openshift-backup
$ make -C openshift-backup
If you’d rather not wait until tomorrow, set the permanent pod’s name in the variable POD as before and enter:
$ oc exec ${POD} -- openshift-backup
Exporting 'rc' resources to myproject/rcs.json
Exporting 'rolebindings' resources to myproject/rolebindings.json
Skipped: list empty
Exporting 'serviceaccounts' resources to myproject/serviceaccounts.json
...
Please check that the output has been written to /openshift-backup as intended. You can use the script project_import.sh (found next to project_export.sh in the openshift/openshift-ansible-contrib repository) to restore one project at a time. However, in most cases it will be preferable to use this backup as an analytical tool, and restore individual objects as required.
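Restoring a single object from an archive might look like this (the deployment configuration myapp and the file myproject/dcs.json are illustrative; the file names follow the pattern shown in the export output above):
$ unzip openshift-backup20180913.zip
$ jq '.items[] | select(.metadata.name == "myapp")' myproject/dcs.json | oc apply -f -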
It’s worth considering the sheer number of objects the object store holds for a typical project. Each of them could have been edited manually or patched programmatically. Any one of them could also lack certain properties that are present in the version stored in Git: Kubernetes is prone to dropping incorrectly indented properties at object creation time.
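This is where the export earns its keep as an analytical tool: with keys sorted, a known-good definition from the backup can be compared directly with the live object (again, myapp and the file path are placeholders; the live copy has the same volatile fields stripped so that only meaningful differences remain):
$ diff \
    <(jq -S '.items[] | select(.metadata.name == "myapp")' myproject/dcs.json) \
    <(oc get dc myapp -o json | jq -S 'del(.status, .metadata.uid, .metadata.selfLink,
        .metadata.resourceVersion, .metadata.creationTimestamp, .metadata.generation)')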
In short, there is ample scope for ‘unknown impurities’. Given how few computing resources are required, and how little space a week’s worth of project backups takes up, I would suggest that there is every reason to have a laboratory analysis to hand when the vials in the fridge run out.