General Troubleshooting

AGILITY Database Backup and Restore (VM)

S3 Backup Setting Doesn’t Create New Backups

After configuring backups to a remote S3 bucket (e.g., AWS S3 or MinIO), new backups may not appear. This is often caused by a misconfigured S3 location. To troubleshoot:

  1. Check the status by running agility backup s3 status --debug.
  2. Alternatively, inspect other logs:

     sudo su -
        
     # Operator logs
     kubectl -n kube-system logs -l postgres-operator.crunchydata.com/control-plane=pgo
    
     # Backup job logs
     kubectl -n agility logs -l postgres-operator.crunchydata.com/pgbackrest-repo=repo2
    
  3. Check for credential or connectivity problems by running agility backup s3 settings list --sensitive.
  4. Delete the existing configuration with agility backup s3 settings delete.
  5. Reconfigure the S3 backup location.
  6. Trigger a backup with agility backup s3 create.
  7. Verify that backups are being taken by running agility backup s3 status.
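
The log checks in step 2 can be wrapped in a small helper. This is a minimal sketch, not part of the AGILITY tooling: the function name show_backup_errors and the keyword pattern are assumptions to adjust for your environment. Note that kubectl logs with a label selector prints only the last 10 lines per pod by default, so the sketch passes --tail=-1 to fetch the full log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print error-looking lines from the operator and
# backup job logs. Namespaces and selectors are the ones used in step 2.
show_backup_errors() {
  # Assumed keywords; extend or narrow as needed.
  local pattern='error|denied|timeout|unable'
  {
    kubectl -n kube-system logs --tail=-1 \
      -l postgres-operator.crunchydata.com/control-plane=pgo
    kubectl -n agility logs --tail=-1 \
      -l postgres-operator.crunchydata.com/pgbackrest-repo=repo2
  } 2>&1 | grep -Ei "$pattern" || echo "no matching lines found"
}
```

Run show_backup_errors as root (after sudo su -), the same way as the commands in step 2.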

S3 Restore from Backups Failed

Point in Time Recovery (PITR) can fail for several reasons. The most common is that the backups contain no reference for the requested time.

When the restore command fails, it reports details about the failure. The database may be left non-functional after a failed restore.

  1. Check the database backups status with agility backup s3 status --debug.
  2. Examine the logs of other components:

     sudo su -
    
     # Operator logs
     kubectl -n kube-system logs -l postgres-operator.crunchydata.com/control-plane=pgo
    
     # Database logs
     kubectl -n agility logs -l postgres-operator.crunchydata.com/role=master
    
     # Restore job logs
     kubectl -n agility logs -l batch.kubernetes.io/job-name=agility-db-pgbackrest-restore
    
  3. To return the database to a functional state, restore to the latest available backup using agility backup s3 restore.
  4. Then, if needed, attempt a restore to a different point in time.
  5. Check the general status and validate that new backups can be taken after a successful restore.
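
Steps 3 and 5 can be chained so that a failed restore stops the sequence early. The following is a sketch under stated assumptions: the function name recover_to_latest is hypothetical, and it assumes the agility commands exit with a non-zero status on failure and do not prompt for confirmation, neither of which is confirmed here.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around steps 3 and 5; it uses only the
# agility subcommands named in this section.
recover_to_latest() {
  # Step 3: restore to the latest available backup.
  if ! agility backup s3 restore; then
    echo "restore failed; check the logs from step 2" >&2
    return 1
  fi
  # Step 5: check the general status, then confirm a new backup can be taken.
  agility backup s3 status || return 1
  agility backup s3 create || return 1
  agility backup s3 status
}
```

If the function returns a non-zero status, go back to the logs in step 2 before retrying.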

S3 Clone from Another Backup Location Doesn’t Work

This operation is destructive: it deletes the AGILITY database currently running on the system and clones/restores it from another location. It is primarily used for disaster recovery and instance migrations.

  1. Check the logs from the clone command.
  2. Identify if there is a problem with credentials or connectivity.
  3. Run the clone again, ensuring correct values (e.g., credentials enclosed in double quotes or other properly defined values).
  4. If the issue persists, the original location may not contain a valid backup, or the backup may be corrupted. Use another source location if available.
  5. When re-enabling backups using agility backup s3 settings apply, DO NOT reuse a backup location that was used previously, as this might prevent the database from starting. Instead, use an empty S3 folder location that contains no other backups.
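
Before step 5, it can help to confirm that the target location really is empty. The sketch below is not part of AGILITY: it assumes the AWS CLI is installed and configured with credentials for the bucket, and the function name s3_prefix_is_empty and its arguments are hypothetical. It asks for at most one object under the prefix and treats a null result as "empty".

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: succeeds only if no object exists under the prefix.
# Usage: s3_prefix_is_empty BUCKET PREFIX [ENDPOINT_URL]
s3_prefix_is_empty() {
  local bucket="$1" prefix="$2" endpoint="${3:-}"
  local first_key
  # Build the AWS CLI arguments; fetch at most one key.
  set -- s3api list-objects-v2 --bucket "$bucket" --prefix "$prefix" \
         --max-items 1 --query 'Contents[0].Key' --output json
  # Non-AWS stores such as MinIO need an explicit endpoint URL.
  if [ -n "$endpoint" ]; then
    set -- "$@" --endpoint-url "$endpoint"
  fi
  # A failed listing (credentials, connectivity) returns 2, not "empty".
  first_key="$(aws "$@")" || return 2
  # The query prints null when the prefix holds no objects.
  [ "$first_key" = "null" ]
}
```

For example, s3_prefix_is_empty my-bucket agility/new-backups && echo "safe to use" prints the message only when the listing succeeds and returns no objects (the bucket and prefix names here are placeholders).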