Restore a managed instance v1.1 when it is completely gone

SOC2/CI-110

Are you experiencing a zonal failure? Follow the failover process instead.

In MI v1.1, some resources run outside the VM (i.e. Cloud SQL), so restoration happens in multiple stages. Perform the stages in the following order.

Restoring Cloud SQL

Use cases:

  • Cloud SQL data is corrupted by a broken database migration
  • Cloud SQL data is deleted

Restore from an automated backup

The process below is derived from the GCP documentation.

The restoration process is performed with gcloud. Learn more about why we do not use Terraform here.

Locate the SQL instance and note its name as SQL_INSTANCE.

gcloud sql instances list --project $PROJECT_ID

List all backups and note the ID of the latest SUCCESSFUL backup (or of the last one taken before the database state was corrupted) as SQL_BACKUP_ID.

gcloud sql backups list --instance $SQL_INSTANCE --project $PROJECT_ID

Restore the backup to the current instance. This overwrites the data currently in the instance.

gcloud sql backups restore $SQL_BACKUP_ID --restore-instance $SQL_INSTANCE --project $PROJECT_ID
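The three gcloud calls above can be wrapped in a small script. This is a minimal sketch, not an official tool: the function names and the DRY_RUN switch are hypothetical, and latest_successful_backup_id assumes the newest SUCCESSFUL backup is the one you want. If the corruption predates that backup, pass the earlier SQL_BACKUP_ID by hand. The filter and sort keys (status, windowStartTime) are assumed to match the field names of the Cloud SQL BackupRun resource.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: restore a Cloud SQL instance from an automated backup.
# DRY_RUN=1 is an assumption of this sketch (not a gcloud feature): it prints
# the restore command instead of running it.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"   # show what would be executed
  else
    "$@"        # execute for real
  fi
}

# Print the ID of the newest SUCCESSFUL backup of instance $1 in project $2.
# --filter, --sort-by, --limit and --format are standard gcloud list flags.
latest_successful_backup_id() {
  gcloud sql backups list \
    --instance "$1" --project "$2" \
    --filter="status=SUCCESSFUL" \
    --sort-by="~windowStartTime" \
    --limit=1 \
    --format="value(id)"
}

# Restore backup $3 onto instance $1 in project $2 (overwrites current data).
restore_sql_backup() {
  local instance="$1" project="$2" backup_id="$3"
  if [ -z "$backup_id" ]; then
    echo "no backup ID given" >&2
    return 1
  fi
  run gcloud sql backups restore "$backup_id" \
    --restore-instance "$instance" --project "$project"
}
```

A typical run captures SQL_BACKUP_ID with latest_successful_backup_id, checks it against the backup list from the previous step, and only then calls restore_sql_backup with DRY_RUN unset.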

Restore the compute instance (VM)

Use cases:

  • Docker daemon state is corrupted
  • Application data is deleted
  • VM is deleted

Assess what was deleted

Navigate to the sourcegraph-managed-$CUSTOMER project and look at the existing compute instance.

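Does the VM still exist? If it does, restore a snapshot on the live VM. If it does not, re-create the VM, either with the existing data disk or with a new data disk from a disk snapshot.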
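The assessment can be reduced to two yes/no questions, answered from gcloud compute instances list --project sourcegraph-managed-$CUSTOMER and gcloud compute disks list --project sourcegraph-managed-$CUSTOMER. The helper below is a hypothetical sketch that maps those answers to the section names of this runbook; the mapping is an interpretation of the sections that follow, not an official decision tree.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map what still exists to the matching runbook section.
# Usage: choose_restore_path <vm_exists: yes|no> <data_disk_usable: yes|no>
set -euo pipefail

choose_restore_path() {
  local vm="$1" disk="$2"
  case "$vm/$disk" in
    yes/yes|yes/no) echo "Restore snapshot on a live VM" ;;
    no/yes)         echo "Re-create the VM with existing data disk" ;;
    no/no)          echo "Re-create the VM with new data disk from disk snapshot" ;;
    *)              echo "usage: choose_restore_path yes|no yes|no" >&2; return 1 ;;
  esac
}
```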

Re-create the VM with existing data disk

  1. Open sourcegraph/deploy-sourcegraph-managed and change into the $CUSTOMER directory.
  2. Run terraform apply to reconcile the infrastructure with its definition in code.
  3. Follow the steps in Confirm instance health.
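Steps 1 and 2 can be run without changing directories by using Terraform's -chdir global option (available since Terraform 0.14). This is a minimal sketch: the runbook itself only says terraform apply, so the init step, the saved plan file, its name restore.tfplan, and the DRY_RUN switch are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reconcile one customer's infrastructure with its Terraform code.
# Assumes it is run from the root of a sourcegraph/deploy-sourcegraph-managed checkout.
# DRY_RUN=1 (an assumption of this sketch) prints the commands instead of running them.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

reconcile_customer() {
  local dir="$1"
  # -chdir makes each terraform command operate inside the customer directory.
  run terraform -chdir="$dir" init
  # Save the plan so that exactly what was reviewed is what gets applied.
  run terraform -chdir="$dir" plan -out=restore.tfplan
  run terraform -chdir="$dir" apply restore.tfplan
}
```

Applying a saved plan skips the interactive confirmation, so review the plan output before the apply step runs.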

Re-create the VM with new data disk from disk snapshot

  1. Run gcloud compute snapshots list --project=sourcegraph-managed-$CUSTOMER --sort-by="~creationTimestamp" --limit=5 --format="table(name,creationTimestamp)" and copy the name of the latest snapshot.

  2. Go to sourcegraph/deploy-sourcegraph-managed and create a new branch named $CUSTOMER/restore-instance.

  3. cd $CUSTOMER

  4. Edit terraform.tfvars in the $CUSTOMER directory. Note: the key may be black instead of red, depending on which instance is currently active.

    disks = {
      red = { from_snapshot = "REPLACE_ME_WITH_SNAPSHOT_NAME" }
    }
    
  5. Run terraform apply to reconcile the infrastructure with its definition in code.

  6. Follow the steps in Confirm instance health.

  7. Commit your changes and open a Pull Request.
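Steps 1 and 4 can be scripted so the snapshot name is not copied by hand. This is a hypothetical sketch: latest_snapshot reuses the query from step 1 but returns only the newest name, and render_disks_block prints the disks block for you to paste into terraform.tfvars. It deliberately does not edit the file, since the rest of terraform.tfvars is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for steps 1 and 4 of the snapshot restore.
set -euo pipefail

# Print the name of the newest disk snapshot in project $1
# (same sort as step 1, limited to a single bare value).
latest_snapshot() {
  gcloud compute snapshots list --project="$1" \
    --sort-by="~creationTimestamp" --limit=1 --format="value(name)"
}

# Print the disks block for terraform.tfvars.
# $1 is the active deployment key (red or black), $2 is the snapshot name.
render_disks_block() {
  local key="$1" snapshot="$2"
  case "$key" in
    red|black) ;;
    *) echo "key must be red or black, got: $key" >&2; return 1 ;;
  esac
  printf 'disks = {\n  %s = { from_snapshot = "%s" }\n}\n' "$key" "$snapshot"
}
```

For example, render_disks_block red "$(latest_snapshot sourcegraph-managed-$CUSTOMER)" prints a block in the same shape as the one in step 4.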

Restore a snapshot on a live VM

  1. Run gcloud compute snapshots list --project=sourcegraph-managed-$CUSTOMER --sort-by="~creationTimestamp" --limit=5 --format="table(name,creationTimestamp)" and copy the name of the latest snapshot.

  2. Go to sourcegraph/deploy-sourcegraph-managed and create a new branch named $CUSTOMER/restore-instance.

  3. cd $CUSTOMER

  4. Edit terraform.tfvars in the $CUSTOMER directory. Note: the key may be black instead of red, depending on which instance is currently active.

    disks = {
      red = { from_snapshot = "REPLACE_ME_WITH_SNAPSHOT_NAME" }
    }
    
  5. Run terraform apply twice.

  6. Run gcloud compute instances stop default-$OLD_DEPLOYMENT-instance --project $PROJECT_ID

  7. Run gcloud compute instances start default-$OLD_DEPLOYMENT-instance --project $PROJECT_ID

  8. Follow the steps in Confirm instance health.

  9. Commit your changes and open a Pull Request.
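Steps 6 and 7 can be wrapped so the stop and start always target the same instance name. This is a minimal sketch: the function name and the DRY_RUN switch are hypothetical, and it reproduces the two gcloud commands above without adding flags. gcloud compute instances stop waits for the operation to finish unless --async is given, so start is only issued after the stop completes.

```shell
#!/usr/bin/env bash
# Hypothetical helper for steps 6 and 7: stop, then start, the old deployment's VM.
# DRY_RUN=1 (an assumption of this sketch) prints the commands instead of running them.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

restart_old_instance() {
  local old_deployment="$1" project="$2"
  local instance="default-${old_deployment}-instance"
  # Under set -e, a failed stop aborts the script before start runs.
  run gcloud compute instances stop "$instance" --project "$project"
  run gcloud compute instances start "$instance" --project "$project"
}
```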