docs(platform): add guides related managed apps backups #536
Open: androndo wants to merge 1 commit into `main` from `feat/apps-backups-guides`.
326 changes: 326 additions & 0 deletions
content/en/docs/next/applications/backup-and-recovery.md
---
title: "Application Backup and Recovery"
linkTitle: "Backup and Recovery"
description: "Back up and restore managed databases (Postgres, MariaDB, ClickHouse, FoundationDB) with BackupJob, Plan, and RestoreJob."
weight: 4
---

This guide covers backing up and restoring **Cozystack-managed databases** — Postgres, MariaDB, ClickHouse, and FoundationDB — as a tenant user: running one-off and scheduled backups, checking status, and restoring from a backup either in place or into a separate target instance.

{{% alert color="warning" %}}
**These backups are data-only.** Each strategy snapshots the database contents through the operator's native mechanism (CloudNativePG barman, mariadb-operator dumps, Altinity `clickhouse-backup`, FoundationDB `backup_agent`). They do **not** capture the `apps.cozystack.io/*` CR, its `HelmRelease`, chart values, or operator-managed Secrets.

To restore you must either:

- keep the source application alive and restore in place (each driver re-bootstraps data into the existing operator-managed cluster), **or**
- pre-provision an empty target application of the same Kind, then restore into it.

For backups that include the application's Helm release, CRs, and PVC snapshots (used for VMInstance / VMDisk), see [Backup and Recovery (VMs)]({{% ref "/docs/next/virtualization/backup-and-recovery" %}}).
{{% /alert %}}
## Prerequisites

- A `BackupClass` exists in the cluster for the application Kind you want to back up. Run `kubectl get backupclasses` to confirm; if none is present, ask your administrator to follow the [Managed Application Backup Configuration]({{% ref "/docs/next/operations/services/managed-app-backup-configuration" %}}) guide.
- S3-compatible storage. Either provision an in-cluster `Bucket` (shown below) or use external S3 coordinates supplied by your administrator.
- `kubectl` and a kubeconfig for the management cluster.

## List available BackupClasses

`BackupClass` resources are cluster-scoped and tell you which application Kinds can be backed up and which driver handles each:

```bash
kubectl get backupclasses
```

Example output:

```
NAME                       AGE
postgres-data-backup       14m
mariadb-data-backup        14m
clickhouse-data-backup     14m
foundationdb-data-backup   14m
velero                     1d
```

Use the `BackupClass` name when creating a `BackupJob` or `Plan`. The examples below assume `tenant-user` as the tenant namespace; substitute your own.
## Provision the storage Bucket

If your administrator has not pre-configured external S3, provision an in-cluster `Bucket` in the tenant namespace:

```yaml
apiVersion: apps.cozystack.io/v1alpha1
kind: Bucket
metadata:
  name: db-backups
  namespace: tenant-user
spec:
  users:
    backup:
      readonly: false
```

```bash
kubectl apply -f bucket.yaml
kubectl -n tenant-user wait hr/bucket-db-backups --for=condition=ready --timeout=300s
```

The `Bucket` controller materialises a `bucket-<name>-backup` Secret in the namespace carrying a `BucketInfo` JSON blob. The S3 endpoint, bucket name, and access keys come from there.
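The credential commands in the next section read three fields out of that blob. As a quick illustration of the shape involved, here is a minimal sketch against a hypothetical sample payload (the endpoint and key values are made up, and a real `BucketInfo` blob carries more fields; the `jq` paths match the ones used throughout this guide):

```bash
# Hypothetical BucketInfo sample, mirroring only the fields this guide reads.
cat > /tmp/bucket-sample.json <<'EOF'
{
  "spec": {
    "secretS3": {
      "endpoint": "https://s3.example.internal:443",
      "accessKeyID": "SAMPLEKEY",
      "accessSecretKey": "SAMPLESECRET"
    }
  }
}
EOF

# The same jq paths are used below against the real /tmp/bucket.json.
jq -r .spec.secretS3.endpoint /tmp/bucket-sample.json
jq -r .spec.secretS3.accessKeyID /tmp/bucket-sample.json
```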
## Create per-application backup credentials

Each driver expects per-application credential Secrets in the application namespace — the strategy templates reference them by name. The snippets below assume a single shell session: first read the bucket credentials, then run only the per-driver block for the application Kind you are setting up.

### Read the bucket credentials

Run this once per shell session. Every per-driver block below reuses `$ACCESS_KEY`, `$SECRET_KEY`, and `/tmp/bucket.json`:

```bash
kubectl -n tenant-user get secret bucket-db-backups-backup \
  -o jsonpath='{.data.BucketInfo}' | base64 -d > /tmp/bucket.json
ACCESS_KEY=$(jq -r .spec.secretS3.accessKeyID /tmp/bucket.json)
SECRET_KEY=$(jq -r .spec.secretS3.accessSecretKey /tmp/bucket.json)
```
If you start a new shell, re-run that snippet before continuing.

### Postgres

Project the credentials under the keys CNPG's barman client expects:

```bash
kubectl -n tenant-user create secret generic my-postgres-cnpg-backup-creds \
  --from-literal=ACCESS_KEY_ID="$ACCESS_KEY" \
  --from-literal=ACCESS_SECRET_KEY="$SECRET_KEY"
```

When the S3 endpoint uses a self-signed certificate (the SeaweedFS default), also create a CA Secret:

```bash
kubectl -n tenant-user create secret generic my-postgres-cnpg-backup-ca \
  --from-file=ca.crt=/path/to/ca.crt
```
### MariaDB

```bash
kubectl -n tenant-user create secret generic my-mariadb-mariadb-backup-creds \
  --from-literal=AWS_ACCESS_KEY_ID="$ACCESS_KEY" \
  --from-literal=AWS_SECRET_ACCESS_KEY="$SECRET_KEY"
```

For self-signed endpoints, add a `my-mariadb-mariadb-backup-ca` Secret carrying `ca.crt` the same way.
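If you prefer declarative manifests, the same CA Secret can be expressed as a standard Kubernetes Secret (the certificate body here is a placeholder to replace with your CA's PEM):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-mariadb-mariadb-backup-ca
  namespace: tenant-user
stringData:
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    ...your PEM-encoded CA certificate...
    -----END CERTIFICATE-----
```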
### ClickHouse

ClickHouse backups read S3 credentials directly from the chart-emitted `<release>-backup-s3` Secret. Set `backup.enabled: true` on the ClickHouse application and fill in `backup.*` with the bucket coordinates — no extra Secret is needed for the BackupClass flow. See the [ClickHouse application reference]({{% ref "/docs/next/applications/clickhouse" %}}) for the `backup.*` values.
### FoundationDB

FoundationDB's `backup_agent` requires a `blob_credentials.json` payload in a specific shape. This block reads the bucket endpoint from `/tmp/bucket.json` (created in [Read the bucket credentials](#read-the-bucket-credentials) above) and reuses `$ACCESS_KEY` / `$SECRET_KEY` from the same step:

```bash
ENDPOINT_FULL=$(jq -r .spec.secretS3.endpoint /tmp/bucket.json)
ENDPOINT_HOSTPORT=${ENDPOINT_FULL#http://}
ENDPOINT_HOSTPORT=${ENDPOINT_HOSTPORT#https://}
ACCOUNT_NAME="${ACCESS_KEY}@${ENDPOINT_HOSTPORT}"

jq -nc \
  --arg account "$ACCOUNT_NAME" \
  --arg key "$ACCESS_KEY" \
  --arg secret "$SECRET_KEY" \
  '{accounts: {($account): {api_key: $key, secret: $secret}}}' \
  > /tmp/blob_credentials.json

kubectl -n tenant-user create secret generic my-fdb-fdb-backup-creds \
  --from-file=blob_credentials.json=/tmp/blob_credentials.json
```
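The payload format is easy to get subtly wrong, so it can be worth probing the generated file before creating the Secret. A sketch of the same construction with stand-in values, verified with `jq -e` (the endpoint and keys here are hypothetical):

```bash
# Stand-in values; a real session uses $ACCESS_KEY / $SECRET_KEY from above.
ACCESS_KEY=SAMPLEKEY
SECRET_KEY=SAMPLESECRET
ACCOUNT_NAME="${ACCESS_KEY}@s3.example.internal:443"

# Same jq construction as the block above, written to a scratch file.
jq -nc \
  --arg account "$ACCOUNT_NAME" \
  --arg key "$ACCESS_KEY" \
  --arg secret "$SECRET_KEY" \
  '{accounts: {($account): {api_key: $key, secret: $secret}}}' \
  > /tmp/blob_credentials_sample.json

# Exactly one account entry, keyed by "<access-key>@<host:port>".
jq -e '.accounts | keys == ["SAMPLEKEY@s3.example.internal:443"]' \
  /tmp/blob_credentials_sample.json
```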
Your administrator must also patch the FoundationDB `BackupClass` parameters with the resolved `accountName`, `bucket`, `region`, and `secureConnection` values before the first backup runs. Otherwise the first `BackupJob` fails fast with a validation error (`accountName is required`) — this is the intentional fail-loud behaviour for a half-configured tenant.
## Run a backup

### One-off backup

Use a `BackupJob` for an ad-hoc backup (for example, before a risky change):

```yaml
apiVersion: backups.cozystack.io/v1alpha1
kind: BackupJob
metadata:
  name: my-postgres-adhoc
  namespace: tenant-user
spec:
  applicationRef:
    apiGroup: apps.cozystack.io
    kind: Postgres
    name: my-postgres
  backupClassName: postgres-data-backup
```

```bash
kubectl apply -f backupjob.yaml
kubectl -n tenant-user get backupjobs
kubectl -n tenant-user describe backupjob my-postgres-adhoc
```

When the `BackupJob` reaches `phase: Succeeded`, the driver creates a `Backup` object with the same name. That name is what you reference when restoring.

Replace `Postgres` / `postgres-data-backup` with `MariaDB` / `mariadb-data-backup`, `ClickHouse` / `clickhouse-data-backup`, or `FoundationDB` / `foundationdb-data-backup` for the other drivers.
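The substitution is mechanical; for example, an equivalent ad-hoc job for a MariaDB application (assuming it is named `my-mariadb`):

```yaml
apiVersion: backups.cozystack.io/v1alpha1
kind: BackupJob
metadata:
  name: my-mariadb-adhoc
  namespace: tenant-user
spec:
  applicationRef:
    apiGroup: apps.cozystack.io
    kind: MariaDB
    name: my-mariadb
  backupClassName: mariadb-data-backup
```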
### Scheduled backup

Use a `Plan` for cron-driven recurring backups:

```yaml
apiVersion: backups.cozystack.io/v1alpha1
kind: Plan
metadata:
  name: my-postgres-daily
  namespace: tenant-user
spec:
  applicationRef:
    apiGroup: apps.cozystack.io
    kind: Postgres
    name: my-postgres
  backupClassName: postgres-data-backup
  schedule:
    type: cron
    cron: "0 */6 * * *"  # every 6 hours
```

Each scheduled run creates a `BackupJob` (and, on success, a `Backup`) named after the `Plan` with a timestamp suffix.

```bash
kubectl apply -f plan.yaml
kubectl -n tenant-user get plans
kubectl -n tenant-user get backupjobs -l backups.cozystack.io/plan=my-postgres-daily
```
## Check backup status

List `BackupJob` and `Backup` resources in the namespace:

```bash
kubectl -n tenant-user get backupjobs
kubectl -n tenant-user get backups
```

Inspect a failed run:

```bash
kubectl -n tenant-user get backupjob my-postgres-adhoc -o jsonpath='{.status.message}'
kubectl -n tenant-user describe backupjob my-postgres-adhoc
```

For driver-side detail, inspect the operator-native CR each driver materialises (one of `cnpg.io/Backup`, `k8s.mariadb.com/Backup`, `apps.foundationdb.org/FoundationDBBackup`, or the ClickHouse strategy `Pod`).
## Restore in place

An **in-place restore** replays the backup into the **same** application. Use this to roll back accidental deletion or corruption on a live database you intend to keep using under the same name.

{{% alert color="warning" %}}
In-place restore is **destructive**. Each driver wipes or replaces existing data on the source application; any writes since the backup point are lost. If you cannot afford to lose recent writes, use [Restore to a copy](#restore-to-a-copy) instead.
{{% /alert %}}

```yaml
apiVersion: backups.cozystack.io/v1alpha1
kind: RestoreJob
metadata:
  name: my-postgres-restore-inplace
  namespace: tenant-user
spec:
  backupRef:
    name: my-postgres-adhoc
  # targetApplicationRef omitted: driver restores into Backup.spec.applicationRef.
  # options:
  #   recoveryTime: "2026-05-01T12:00:00Z"  # Postgres only; RFC3339 PITR
```

```bash
kubectl apply -f restorejob.yaml
kubectl -n tenant-user get restorejobs
kubectl -n tenant-user describe restorejob my-postgres-restore-inplace
```

### Per-driver caveats

- **Postgres (CNPG)** — the driver deletes the live `cnpg.io/Cluster` and its PVCs, then re-bootstraps from the Barman archive. Connections drop for the duration. `spec.options.recoveryTime` (RFC3339) is supported for point-in-time recovery; omit it to restore to the latest WAL.
- **MariaDB** — the operator replays the logical dump into the live `MariaDB` via `mariadb-import`. Pre-existing tables will collide; pre-truncate the relevant schemas if your dump does not include `DROP TABLE`.
- **ClickHouse** — the Altinity strategy does **not** pass `clickhouse-backup --rm`. You are responsible for dropping conflicting tables on the source before submitting the `RestoreJob`; otherwise the operation fails with a duplicate-table error.
- **FoundationDB** — the operator pauses the FoundationDB cluster, clears the keyspace, and replays the backup via `fdbrestore`. Any data written after the snapshot is lost. Only one `FoundationDBBackup` directory may exist per cluster at a time — the driver stops any prior backup before starting a new one.
## Restore to a copy

A **to-copy restore** replays the backup into a **different**, freshly provisioned application of the same Kind. Use this for disaster-recovery drills, side-by-side validation, branch databases, or migrating to a new version of the upstream operator.

First, provision an empty target application of the same Kind. For example, an empty `Postgres`:

```yaml
apiVersion: apps.cozystack.io/v1alpha1
kind: Postgres
metadata:
  name: my-postgres-restored
  namespace: tenant-user
spec:
  # ...same shape as the source, no bootstrap data required...
```

Wait for the target to become Ready, then submit a `RestoreJob` that points at it:

```yaml
apiVersion: backups.cozystack.io/v1alpha1
kind: RestoreJob
metadata:
  name: my-postgres-restore-to-copy
  namespace: tenant-user
spec:
  backupRef:
    name: my-postgres-adhoc
  targetApplicationRef:
    apiGroup: apps.cozystack.io
    kind: Postgres
    name: my-postgres-restored
```

The source application stays untouched. Cross-namespace restores are **not** supported — `targetApplicationRef` is a local reference; the target must live in the same namespace as the `RestoreJob`.
## Limitations and lifecycle

- **Data-only scope.** Application CRs, HelmReleases, chart values, and operator-managed Secrets (e.g. the `cnpg.io` superuser secret, `clickhouse-installation` users) are not captured. Pre-provision the target application before a to-copy restore.
- **Archive retention is driver-owned.** Deleting a Cozystack `Backup` CR removes the artefact reference but leaves the actual S3 object intact. Each driver enforces its own retention:
  - CNPG: `retentionPolicy` on the strategy (`30d` default in the admin example).
  - MariaDB: configure `cleanupStrategy` on the operator-side `Backup` CR or rotate at the bucket level.
  - ClickHouse: governed by the in-pod sidecar's retention configuration. Tenants who need to purge an archive call `DELETE /backup/<name>/remote` on the sidecar.
  - FoundationDB: each `BackupJob` owns a discrete blob-store directory; clean up at the bucket level.
- **One running backup per FoundationDB cluster.** The driver enforces this by stopping any prior `FoundationDBBackup` on the same cluster before starting a new one.
- **ClickHouse depends on the in-chart sidecar.** The Altinity strategy is a thin HTTP client; the backup itself runs inside each `chi-*` Pod via `clickhouse-backup`. Disabling `backup.enabled` on the application also disables the BackupClass flow.

## Troubleshooting

If a `BackupJob` or `RestoreJob` ends in `phase: Failed`, check the message field:

```bash
kubectl -n tenant-user get backupjob my-postgres-adhoc -o jsonpath='{.status.message}'
kubectl -n tenant-user get restorejob my-postgres-restore-inplace -o jsonpath='{.status.message}'
```

Then look at the operator-native CR the driver created:

```bash
# Postgres
kubectl -n tenant-user get backups.cnpg.io
# MariaDB
kubectl -n tenant-user get backups.k8s.mariadb.com,restores.k8s.mariadb.com
# ClickHouse
kubectl -n tenant-user logs -l backups.cozystack.io/owned-by.BackupJobName=my-clickhouse-adhoc
# FoundationDB
kubectl -n tenant-user get foundationdbbackups.apps.foundationdb.org,foundationdbrestores.apps.foundationdb.org \
  -l backups.cozystack.io/owned-by.BackupJobName=my-fdb-adhoc
```

## See also

- [Managed Application Backup Configuration]({{% ref "/docs/next/operations/services/managed-app-backup-configuration" %}}) — how administrators define strategies and `BackupClass` resources.
- [Backup and Recovery (VMs)]({{% ref "/docs/next/virtualization/backup-and-recovery" %}}) — the parallel guide for VMInstance / VMDisk backups (HelmRelease + CRs + PVC snapshots).
- [Velero Backup Configuration]({{% ref "/docs/next/operations/services/velero-backup-configuration" %}}) — administrator setup for the Velero-driven VM backups.
Review comment: Using the dynamic `$ACCESS_KEY` as part of the `ACCOUNT_NAME` makes it difficult for administrators to pre-configure a cluster-scoped `BackupClass`, as they would need to know each tenant's access key in advance. It is recommended to use a fixed, descriptive account name (e.g., `fdb-backup`) and ensure it matches the `accountName` parameter defined by the administrator in the `BackupClass`. This allows the same `BackupClass` to be used by multiple tenants with their own credentials.