PostgreSQL Operator

The Postgres operator manages PostgreSQL clusters on Kubernetes (K8s):

  1. The operator watches additions, updates, and deletions of PostgreSQL cluster manifests and changes the running clusters accordingly. For example, when a user submits a new manifest, the operator fetches that manifest and spawns a new Postgres cluster along with all necessary entities such as K8s StatefulSets and Postgres roles. See this Postgres cluster manifest for settings that a manifest may contain.
  2. The operator also watches updates to its own configuration and alters running Postgres clusters if necessary. For instance, if the Docker image in a pod is changed, the operator carries out the rolling update, which means it re-spawns pods of each managed StatefulSet one-by-one with the new Docker image.
  3. Finally, the operator periodically synchronizes the actual state of each Postgres cluster with the desired state defined in the cluster's manifest.
  4. The operator aims to be hands free as configuration works only via manifests. This enables easy integration in automated deploy pipelines with no access to K8s directly.

— postgres-operator.readthedocs.io

In the Shivering-Isles Infrastructure the Zalando Postgres Operator is used to manage highly available database clusters for all applciations. Takes away a lot of the common pain points for applications, such as postgresql updates, and standardises backups using point-in-time recovery as well as monitoring.

Tools

The tooling assumes there is only one postgresql clsuter object per namespace.

For postgres database management the SI-Infra repository contains multiple tools.

  • psql - is a simple wrapper for psql inside one of the postgresql cluster
  • patronictl - allows easy access to patronictl of a cluster and manage everything from failovers to simple member status
  • zpg - provides a simplied interface for common tasks with multiple steps, such as recreating a member of a postgresql cluster

Backups

Zalando Postgres Operator provides a two built-in mechanisms for backups:

In both cases the target is an S3 compatible endpoint, but they don't have to be the same.

While these can be different per cluster, for simplicity, in the SI-Infra the credentails are globally managed.

Restore

Automatic restore

If PostgreSQL-Pods are missing the content volumes at any point, they'll automatically grab the latest available backup, restore it and then stream the remaining delta from currently active cluster members if possible.

This mechanism is quite battle-tested in SI-Infra and the default way of solving single-member issues in postgresql clusters.

Manual restore base backup

If a cluster object was deleted a restore can be done using the old cluster uid:

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: example-postgres
spec:
  clone:
    uid: "<original-uid>"
    cluster: "example-postgres"

The original uid can easily be found in S3 as they are part of the path to the backups.

Manual restore logical backup

If logical backups have been enabled for the cluster, they are available as simple pg_dump files in S3.

To manually restore these, download the dump and run pg_restore.