Scheduling backups in a remote server

This page describes how to schedule backups for Cassandra without using Cloud Storage. In this method, backups are stored on a remote server that you specify instead of in a Cloud Storage bucket. Apigee uses SSH to communicate with the remote server.

You must schedule the backups as cron jobs. Once a backup schedule has been applied to your hybrid cluster, a Kubernetes backup job runs periodically in the runtime plane according to the schedule. The job triggers a backup script on each Cassandra node in your hybrid cluster. The script collects all the data on the node, creates a compressed archive file of the data, and sends the archive to the server specified in your overrides.yaml file.

The following steps include common examples for completing specific tasks, like creating an SSH key pair. Use the methods that are appropriate to your installation.

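The procedure has the following parts: setting up the server and SSH, and setting the schedule and destination for backups. A troubleshooting section follows.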

Set up the server and SSH

  1. Select a Backup Server: Choose a Linux or Unix server with adequate storage for your backups and ensure it can be accessed via SSH from your Apigee hybrid runtime plane.
  2. Configure the SSH Server: Either install an SSH server or confirm that an existing one is secure.
  3. Create an SSH Key Pair: Generate an SSH key pair without a passphrase. For example:
    ssh-keygen -t rsa -b 4096 -C exampleuser@example.com
      Enter file in which to save the key (/Users/exampleuser/.ssh/id_rsa): $APIGEE_HOME/hybrid-files/certs/ssh_key
      Enter passphrase (empty for no passphrase):
      Enter same passphrase again:
      Your identification has been saved in ssh_key
      Your public key has been saved in ssh_key.pub
      The key fingerprint is:
      SHA256:DWKo334XMZcZYLOLrd/8HNpjTERPJJ0mc11UYmrPvSA exampleuser@example.com
      The key's randomart image is:
      +---[RSA 4096]----+
      |          +.  ++X|
      |     .   . o.=.*+|
      |    . o . . o==o |
      |   . . . =oo+o...|
      |  .     S +E oo .|
      |   . .   .. . o .|
      |    . . .  . o.. |
      |     .  ...o ++. |
      |      .. .. +o+. |
      +----[SHA256]-----+

    Where exampleuser@example.com is a comment string. Any string that follows -C in the ssh-keygen command becomes a comment included in the newly created SSH key. When you use an account name in the form of exampleuser@example.com, you can quickly identify which account goes with the key.

    The command generates two SSH key files: a private key file (for example, `ssh_key`) and a public key file (for example, `ssh_key.pub`).

    Save the private key to a location that your runtime plane can access.

  4. Add a User Account: On the backup server, create a user named apigee and make sure the new user has a home directory at /home/apigee.
  5. Set Up the .ssh Directory: On the backup server, create a .ssh directory in /home/apigee and an authorized_keys file inside it. For example:
    cd /home/apigee
    mkdir .ssh
    cd .ssh
    vi authorized_keys
  6. Install the Public Key: Paste the contents of the SSH public key file into the authorized_keys file in the /home/apigee/.ssh directory. The backup directory can be any directory as long as the apigee user has access to it.
  7. Verify SSH Access: Test the connection from your local machine or a cluster node:
    ssh -i PATH_TO_PRIVATE_KEY_FILE apigee@BACKUP_SERVER_IP
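
The .ssh setup in steps 5 and 6 can also be scripted. The following is a minimal sketch, not part of the Apigee tooling: the function name install_public_key and its two arguments are hypothetical, and the chmod values are an assumption based on the permissions that OpenSSH commonly expects under its default StrictModes setting. It assumes the public key file has already been copied to the backup server and that the script runs as the apigee user.

```shell
#!/bin/sh
# Hypothetical helper: add a public key to a user's authorized_keys file.
#   $1 = home directory of the backup user (for example, /home/apigee)
#   $2 = path to the public key file (for example, ssh_key.pub)
install_public_key() {
  home_dir=$1
  pub_key=$2

  mkdir -p "$home_dir/.ssh"
  touch "$home_dir/.ssh/authorized_keys"

  # Append the key only if the exact same line is not already present,
  # so that running the helper twice does not duplicate the entry.
  if ! grep -qxF -e "$(cat "$pub_key")" "$home_dir/.ssh/authorized_keys"; then
    cat "$pub_key" >> "$home_dir/.ssh/authorized_keys"
  fi

  # Assumed permissions: directory readable only by the owner,
  # key file readable and writable only by the owner.
  chmod 700 "$home_dir/.ssh"
  chmod 600 "$home_dir/.ssh/authorized_keys"
}

# Example call (paths are placeholders):
# install_public_key /home/apigee /home/apigee/ssh_key.pub
```

After running the helper, the SSH test in step 7 is still the real check that the key was installed correctly.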

Set the schedule and destination for backup

You set the schedule and destination for backups in your overrides.yaml file.

  1. Add the following parameters to your overrides.yaml file:

    Parameters

    cassandra:
      backup:
        enabled: true
        keyFile: "PATH_TO_PRIVATE_KEY_FILE"
        server: "BACKUP_SERVER_IP"
        storageDirectory: "/home/apigee/BACKUP_DIRECTORY"
        cloudProvider: "HYBRID" # required verbatim "HYBRID" (all caps)
        schedule: "SCHEDULE"

    Example

    cassandra:
      backup:
        enabled: true
        keyFile: "private.key" # path relative to apigee-datastore path
        server: "34.56.78.90"
        storageDirectory: "/home/apigee/cassbackup"
        cloudProvider: "HYBRID"
        schedule: "0 2 * * *"

    Where:

    Property: backup:enabled
    Description: Backup is disabled by default. You must set this property to true.

    Property: backup:keyFile
    Value: PATH_TO_PRIVATE_KEY_FILE
    Description: The path on your local file system to the SSH private key file (named ssh_key in the step where you created the SSH key pair). This path must be relative to the apigee-datastore chart directory.

    Property: backup:server
    Value: BACKUP_SERVER_IP
    Description: The IP address of your backup server.

    Property: backup:storageDirectory
    Value: BACKUP_DIRECTORY
    Description: The name of the backup directory on your backup server. This must be a directory within /home/apigee (for example, cassandra_backup).

    Property: backup:cloudProvider
    Value: GCP or HYBRID
    Description: For a Cloud Storage backup, set the property to GCP. For example, cloudProvider: "GCP". For a remote server backup, set the property to HYBRID. For example, cloudProvider: "HYBRID".

    Property: backup:schedule
    Value: SCHEDULE
    Description: The time when the backup starts, specified in standard crontab syntax. Times are in the local time zone of the Kubernetes cluster. Default: 0 2 * * *

  2. Apply the backup configuration to the storage scope of your cluster:
    helm upgrade datastore apigee-datastore/ \
      --namespace apigee \
      --atomic \
      -f OVERRIDES_FILE.yaml
    

    Where OVERRIDES_FILE is the path to the overrides file you just edited.

  3. Verify the backup job. For example:
    kubectl get cronjob -n apigee
    NAME                      SCHEDULE     SUSPEND   ACTIVE   LAST SCHEDULE   AGE
    apigee-cassandra-backup   33 * * * *   False     0        <none>          94s
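
The backup:schedule value uses standard crontab syntax, which has five space-separated fields: minute, hour, day of month, month, and day of week. The default "0 2 * * *" therefore means every day at 02:00. As a quick sanity check before applying the overrides file, a sketch like the following labels each field. The describe_schedule function is hypothetical and only counts fields; it does not validate value ranges.

```shell
#!/bin/sh
# Hypothetical helper: label the five fields of a crontab schedule.
# Returns a non-zero status if the schedule does not have exactly five fields.
describe_schedule() {
  printf '%s\n' "$1" | awk '
    NF != 5 { print "expected 5 fields, got " NF; exit 1 }
    { printf "minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n", $1, $2, $3, $4, $5 }'
}

# Example call:
# describe_schedule "0 2 * * *"
```

The schedule is always passed in double quotes so that the shell does not expand the asterisks into file names.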

Troubleshooting

  1. Test the connection from a Cassandra pod. You need to make sure that your Cassandra pods can connect to your backup server using SSH:
    1. Log into the shell of your Cassandra pod. For example:
      kubectl exec -it -n apigee APIGEE_CASSANDRA_DEFAULT_0 -- /bin/bash

      Where APIGEE_CASSANDRA_DEFAULT_0 is the name of a Cassandra pod. Change this to the name of the pod you want to connect from.

    2. Connect by SSH to your backup server, using the private SSH key mounted in the Cassandra pod and the IP address of the server:
      ssh -i /var/secrets/keys/key apigee@BACKUP_SERVER_IP
  2. If you cannot access your remote server from the Cassandra pod, check the SSH configuration on the remote server again and make sure that the datastore upgrade was successful.
  3. To check whether Cassandra uses the correct private key, run the following command while you are logged in to your Cassandra pod, then compare the output with the private key you created:
    cat /var/secrets/keys/key
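
Comparing key files by eye is error-prone, so the check in the last step can also be done byte for byte. The following is a minimal sketch under stated assumptions: the key_matches_local function is hypothetical, it relies on the standard cmp utility, and it assumes the key mounted in the pod is an unmodified copy of your local private key file.

```shell
#!/bin/sh
# Hypothetical helper: read key material on standard input and compare it,
# byte for byte, with a local file. Succeeds only if both are identical.
#   $1 = path to the local private key file
key_matches_local() {
  cmp -s - "$1"
}

# Example call from your workstation (pod name and path are placeholders):
# kubectl exec -n apigee APIGEE_CASSANDRA_DEFAULT_0 -- cat /var/secrets/keys/key \
#   | key_matches_local PATH_TO_PRIVATE_KEY_FILE && echo "keys match"
```

Piping the key directly into the comparison avoids printing the private key to the terminal or writing a second copy to disk.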