Crunchy Operator (pgBackRest)

PGO, the Postgres Operator from Crunchy Data, uses pgBackRest to manage backups.

pgBackRest is a backup and restore solution for PostgreSQL databases. Its features include parallel backup and restore, compression, full, differential, and incremental backups, backup rotation and archive expiration, and backup integrity checks. It supports multiple repositories, which can be local, remote over TLS or SSH, or hosted in cloud storage such as S3, GCS, or Azure.

Backup configuration

Backups are configured in the spec.backups.pgbackrest section of the PostgresCluster resource. See the example below.

aidboxdb.yaml

yaml

spec:
  backups:
    pgbackrest:
      repos:
        - name: repo1                   # Repository name
          schedules:
            full: "0 1 * * 0"           # Full backup once a week on Sunday at 1 AM
            incremental: "0 1 * * 1-6"  # Incremental backup from Monday to Saturday at 1 AM
          gcs:
            bucket: "<BUCKET_NAME>"     # GCS bucket name 
      configuration:
        - secret:
            name: pgo-gcs-creds         # GCS credentials
        - configMap:
            name: pgbackrest-config     # pgbackrest config
      global:
        repo1-path: /backup/aidboxdb    # Backup path in bucket
        repo1-retention-full-type: time # Retention policy  
        repo1-retention-full: "30"      # Delete backups after 30 days
      manual:
        repoName: repo1
        options:                        # Manual backup configuration
          - '--type=full'
          - '--compress-level=6'
          - '--start-fast=y'
          - '--process-max=20'
          - '--log-level-console=info'

Then create the Secret and ConfigMap that this configuration references:

yaml

---
apiVersion: v1
kind: Secret
metadata:
  name: pgo-gcs-creds
  namespace: aidboxdb-db
stringData:
  gcs.conf: |-
    [global]
    repo1-gcs-key=/etc/pgbackrest/conf.d/gcs-key.json
  gcs-key.json: |-
    <GCP SA JSON access file>
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: pgbackrest-config
  namespace: aidboxdb-db
data:
  db.conf: |-
    [global]
    compress-level=6
    start-fast=y
    process-max=20

Repositories

repos defines the pgBackRest repositories, that is, where and how your backups and WAL archive are stored. You can keep backups in up to four different locations.

Four repository types are supported (see the full Backup Configuration instructions):

azure: For use with Azure Blob Storage.
gcs: For use with Google Cloud Storage (GCS).
s3: For use with Amazon S3 or any S3-compatible storage system such as MinIO.
volume: For use with a Kubernetes Persistent Volume.

GCS configuration example:

1. Specify the GCS bucket and the secret with credentials

yaml

spec:
  backups:
    pgbackrest:
      repos:
        - name: repo1        
          gcs:
            bucket: "<BUCKET_NAME>" 
      configuration:
        - secret:
            name: pgo-gcs-creds

2. Create the secret with the GCS connection credentials

yaml

apiVersion: v1
kind: Secret
metadata:
  name: pgo-gcs-creds
  namespace: aidboxdb-db
stringData:
  gcs.conf: |-
    [global]
    repo1-gcs-key=/etc/pgbackrest/conf.d/gcs-key.json
  gcs-key.json: |-
    <GCP SA JSON access file>
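
As an alternative to writing the Secret manifest by hand, the same two keys could be created directly from files with kubectl. The following is a minimal sketch, not part of the original setup: the helper names write_gcs_conf and create_gcs_secret are hypothetical, and the local path of the service account key file is an assumption.

```shell
# Write the pgBackRest config fragment that points at the mounted key file.
# $1: directory in which to create gcs.conf
write_gcs_conf() {
  printf '[global]\nrepo1-gcs-key=/etc/pgbackrest/conf.d/gcs-key.json\n' > "$1/gcs.conf"
}

# Create the pgo-gcs-creds Secret from gcs.conf and a service account key file.
# $1: namespace, $2: path to the key file (hypothetical local path)
create_gcs_secret() {
  conf_dir="$(mktemp -d)" || return 1
  write_gcs_conf "$conf_dir"
  kubectl create secret generic pgo-gcs-creds -n "$1" \
    --from-file=gcs.conf="$conf_dir/gcs.conf" \
    --from-file=gcs-key.json="$2"
  status=$?
  rm -rf "$conf_dir"
  return "$status"
}

# Usage (assumes the key was saved as ./gcs-key.json):
# create_gcs_secret aidboxdb-db ./gcs-key.json
```

Creating the Secret from files keeps the key material out of the manifest you commit, at the cost of one more manual step.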

Schedule

This spec schedules an incremental backup at 1 AM from Monday to Saturday and a full backup at 1 AM every Sunday:

yaml

spec:
  backups:
    pgbackrest:
      repos:
        - name: repo1
          schedules:
            full: "0 1 * * 0"           # Full backup once a week on Sunday at 1AM
            incremental: "0 1 * * 1-6"  # incremental - from Monday to Saturday at 1AM

Backup retention

Define the backup retention policy. This spec uses time-based retention: backups are kept for 30 days and expired after that:

yaml

spec:
  backups:
    pgbackrest:
      global:
        repo1-path: /backup/aidboxdb    # Backup path in bucket
        repo1-retention-full-type: time # Retention policy  
        repo1-retention-full: "30"      # Delete backups after 30 days

Create backup

Sometimes you need a one-off backup, for example before a significant change or an application update. To take one, first configure the spec.backups.pgbackrest.manual section, which sets the backup type and any additional pgBackRest options:

yaml

spec:
  backups:
    pgbackrest:
      manual:
        repoName: repo1
        options:                  # Manual backup configuration
          - '--type=full'         # Take full backup
          - '--compress-level=6'  # Compression level
          - '--start-fast=y'      # Force an immediate checkpoint instead of waiting for the next one
          - '--process-max=20'    # Max processes to use for compression and transfer

To create a manual backup, annotate the postgrescluster resource with the postgres-operator.crunchydata.com/pgbackrest-backup annotation:

bash

$ kubectl annotate -n aidboxdb-db postgrescluster aidboxdb --overwrite \
          postgres-operator.crunchydata.com/pgbackrest-backup="$(date)"
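
The output of $(date) depends on the locale; what matters is that the annotation value changes between runs. Below is a sketch of a variant that uses a UTC timestamp and then lists the Jobs of the cluster so that you can follow the backup. The helper names are hypothetical, and the cluster label selector is an assumption about how PGO labels its Jobs.

```shell
# Produce a UTC timestamp to use as the annotation value.
backup_trigger_value() {
  date -u +%Y-%m-%dT%H:%M:%SZ
}

# Trigger the manual backup configured in spec.backups.pgbackrest.manual.
# $1: namespace, $2: cluster name
trigger_manual_backup() {
  kubectl annotate -n "$1" postgrescluster "$2" --overwrite \
    "postgres-operator.crunchydata.com/pgbackrest-backup=$(backup_trigger_value)"
}

# List the Jobs that belong to the cluster.
# $1: namespace, $2: cluster name
list_cluster_jobs() {
  kubectl get jobs -n "$1" -l "postgres-operator.crunchydata.com/cluster=$2"
}

# Usage:
# trigger_manual_backup aidboxdb-db aidboxdb
# list_cluster_jobs aidboxdb-db aidboxdb
```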

Recovery

Sometimes you need to recover your database or clone your production database to a staging environment. There are two types of recovery: cloning an existing cluster to another environment, and point-in-time recovery (PITR), which restores the database to its state at a specific point in time.

Clone

To clone an existing PG cluster, specify the dataSource parameter for the new cluster. The sample below creates the stage cluster as a copy of the aidboxdb cluster in the aidboxdb-db namespace.

yaml

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: stage
  namespace: stage
spec:
  dataSource:
    postgresCluster:
      clusterName: aidboxdb
      repoName: repo1
      clusterNamespace: aidboxdb-db
  image: healthsamurai/aidboxdb:15.2.0-crunchy
  postgresVersion: 15
  instances:
    - dataVolumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
  backups:
    pgbackrest:
      repos:
      - name: repo1
        volume:
          volumeClaimSpec:
            accessModes:
            - "ReadWriteOnce"
            resources:
              requests:
                storage: 1Gi

PITR

To recover to a specific point in time, add restore options to the dataSource section of the new cluster configuration.

yaml

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: stage-pitr
  namespace: stage-pitr
spec:
  dataSource:
    postgresCluster:
      clusterName: aidboxdb
      repoName: repo1
      clusterNamespace: aidboxdb-db
      options:
      - --type=time
      - --target="2023-04-09 10:00:00-04"
  image: healthsamurai/aidboxdb:15.2.0-crunchy
  postgresVersion: 15
  instances:
    - dataVolumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
  backups:
    pgbackrest:
      repos:
      - name: repo1
        volume:
          volumeClaimSpec:
            accessModes:
            - "ReadWriteOnce"
            resources:
              requests:
                storage: 1Gi

Note the dataSource section, where you specify the recovery type and the target:

yaml

spec:
  dataSource:
    postgresCluster:
      clusterName: aidboxdb
      clusterNamespace: aidboxdb-db
      repoName: repo1
      options:
      - --type=time
      - --target="2023-04-09 10:00:00-04"
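
The target in this example is written as YYYY-MM-DD HH:MM:SS followed by a UTC offset. A mistyped target is only discovered once the restore starts, so a quick shape check before applying the manifest can save a round trip. The sketch below is a hypothetical helper, and the pattern it accepts is an assumption modelled on the example above; pgBackRest's own validation remains authoritative.

```shell
# Return success if $1 looks like "YYYY-MM-DD HH:MM:SS" with optional
# fractional seconds and an optional offset such as -04, +0530 or +05:30.
# This checks the shape only, not whether the moment is actually restorable.
is_pitr_target_shape() {
  printf '%s\n' "$1" | grep -Eq \
    '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?([+-][0-9]{2}(:?[0-9]{2})?)?$'
}

# Usage:
# is_pitr_target_shape "2023-04-09 10:00:00-04" && echo "target looks well formed"
```

Stating the offset explicitly avoids depending on the time zone in which the timestamp would otherwise be interpreted.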

Inspect backup

You can list backups by running the pgbackrest info command in the database container:

bash

$ export NS=aidboxdb-db
$ kubectl exec  -n $NS \
  $(kubectl get pod -n $NS -l "postgres-operator.crunchydata.com/data=postgres" -o jsonpath='{.items[0].metadata.name}') \
  -- bash -c 'pgbackrest info'

To verify existing backups, run the pgbackrest verify command:

bash

$ export NS=aidboxdb-db
$ kubectl exec  -n $NS \
  $(kubectl get pod -n $NS -l "postgres-operator.crunchydata.com/data=postgres" -o jsonpath='{.items[0].metadata.name}') \
  -- bash -c 'pgbackrest --stanza=db --log-level-console=info verify'
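
Both commands repeat the same pod lookup, so it can be factored into small helpers. This is a sketch with hypothetical function names; it assumes pgbackrest is on the PATH of the pod's default container, so the bash -c wrapper is not needed.

```shell
# Print the name of the first pod that carries the postgres data label.
# $1: namespace
postgres_pod() {
  kubectl get pod -n "$1" \
    -l "postgres-operator.crunchydata.com/data=postgres" \
    -o jsonpath='{.items[0].metadata.name}'
}

# Run a pgbackrest command in that pod.
# $1: namespace, remaining arguments: passed to pgbackrest unchanged
pgbackrest_exec() {
  ns="$1"
  shift
  kubectl exec -n "$ns" "$(postgres_pod "$ns")" -- pgbackrest "$@"
}

# Usage:
# pgbackrest_exec aidboxdb-db info
# pgbackrest_exec aidboxdb-db info --output=json
# pgbackrest_exec aidboxdb-db --stanza=db --log-level-console=info verify
```

The --output=json form of info is convenient when the result is consumed by a script rather than read by a person.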