Assuming you have already set up Velero on your primary cluster, you can use Velero to restore all of its configuration and applications onto a similar cluster. This is useful when you want to test risky changes without impacting the app dev teams who use the primary cluster.
I recently had this scenario: I wanted to experiment with far-fetched ideas on a cluster, but I needed all of the existing configuration. Because I was not using GitOps or extensive rehydration CI/CD pipelines, I used Velero to restore my Kubernetes configuration onto a new 1-node “sandbox” cluster.
Install Velero on new cluster
Assuming Velero is already configured on your main cluster, start by creating the velero namespace on the new cluster.
kubectl create ns velero
Then, from your existing cluster, get the velero-credentials secret, which holds the connection information for your Azure storage account. For example,
# Against the main cluster (cluster #1)
kubectl get secret -o yaml velero-credentials -n velero > velero-credentials.yaml
Then switch to the new cluster and apply it there.
# Against the new cluster (cluster #2)
kubectl apply -f velero-credentials.yaml
rm velero-credentials.yaml
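If both clusters are contexts in the same kubeconfig, the copy can also be done in one pipe so the secret never touches disk. This is a minimal sketch that assumes jq is installed; the context names cluster-1 and cluster-2 are hypothetical placeholders. The jq filter drops metadata fields that the API server populated for the original object on cluster #1, which have no meaning on cluster #2.

```shell
# Hypothetical kubeconfig context names; list yours with: kubectl config get-contexts
MAIN_CTX="${MAIN_CTX:-cluster-1}"
NEW_CTX="${NEW_CTX:-cluster-2}"

# Copy the velero-credentials secret from the main cluster to the new one
# without writing it to disk. jq removes server-populated metadata that
# belonged to the original object on cluster #1.
copy_velero_secret() {
  kubectl --context "$MAIN_CTX" -n velero get secret velero-credentials -o json \
    | jq 'del(.metadata.uid, .metadata.resourceVersion,
             .metadata.creationTimestamp, .metadata.managedFields)' \
    | kubectl --context "$NEW_CTX" -n velero apply -f -
}
```

Run it as copy_velero_secret after creating the velero namespace on the new cluster.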
Then install Velero on the new cluster. I used the same Helm command that I had used on the main cluster (cluster #1) a while back.
# Against new cluster (cluster #2)
export AZURE_SUBSCRIPTION_ID="" # to get current sub: az account show --query "id" -o tsv
export STORAGE_RESOURCE_GROUP=""
export STORAGE_ACCOUNT=""
export STORAGE_CONTAINER_NAME=""
helm install velero vmware-tanzu/velero --namespace velero --version 2.13.2 \
--set "initContainers[0].image=velero/velero-plugin-for-microsoft-azure:v1.1.0" \
--set "initContainers[0].imagePullPolicy=IfNotPresent" \
--set "initContainers[0].volumeMounts[0].mountPath=/target" \
--set "initContainers[0].volumeMounts[0].name=plugins" \
--set "initContainers[0].name=velero-plugin-for-azure" \
--set credentials.existingSecret='velero-credentials' \
--set configuration.provider='azure' \
--set configuration.backupStorageLocation.bucket=$STORAGE_CONTAINER_NAME \
--set configuration.backupStorageLocation.config.resourceGroup=$STORAGE_RESOURCE_GROUP \
--set configuration.backupStorageLocation.config.storageAccount=$STORAGE_ACCOUNT \
--set configuration.backupStorageLocation.config.subscriptionId=$AZURE_SUBSCRIPTION_ID \
--set configuration.volumeSnapshotLocation.name='azure-eastus' \
--set configuration.volumeSnapshotLocation.config.resourceGroup=$STORAGE_RESOURCE_GROUP \
--set configuration.volumeSnapshotLocation.config.subscriptionId=$AZURE_SUBSCRIPTION_ID
This also assumes that you’re using Azure as the provider/backend for Velero.
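The Helm command assumes the vmware-tanzu chart repository has already been added, and Helm accepts empty values without complaint if one of the exported variables was left blank. A minimal sketch of both checks follows; add_velero_repo and require_vars are hypothetical helper names, not part of Helm or Velero.

```shell
# Register the chart repository that "vmware-tanzu/velero" refers to.
add_velero_repo() {
  helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
  helm repo update
}

# Fail if any named environment variable is empty, so helm is never run
# with blank Azure settings. Prints each missing name to stderr.
require_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"   # indirect lookup of the variable called $name
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run require_vars AZURE_SUBSCRIPTION_ID STORAGE_RESOURCE_GROUP STORAGE_ACCOUNT STORAGE_CONTAINER_NAME and only continue to helm install if it succeeds.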
Modify Velero’s backend to read-only
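Because both clusters now point at the same storage container, make the new cluster’s backup location read-only so that it cannot create or delete the main cluster’s backups. First, find the configured backup storage location; it’s probably named default.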
# Against new cluster (cluster #2)
$ kubectl -n velero get backupstoragelocation.velero.io
NAME PROVIDER BUCKET/PREFIX PHASE LAST VALIDATED ACCESS MODE
default azure backups-azaks-blah-blahh-staging-001 Available 2021-01-29 12:57:49 -0800 PST ReadWrite
Unfortunately, Velero 1.5.1’s CLI has no option to edit a backup location, so edit the BackupStorageLocation custom resource directly instead.
kubectl -n velero edit backupstoragelocation.velero.io <NAME>
Using Vim, add or set spec.accessMode to ReadOnly.
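If you would rather not edit interactively, a JSON merge patch sets the same field. This is a sketch; set_bsl_readonly is a hypothetical helper, and the KUBECTL variable is a hypothetical override (for example KUBECTL=echo) that prints the command instead of running it.

```shell
# Against new cluster (cluster #2)
# Set spec.accessMode=ReadOnly on a BackupStorageLocation (name defaults to "default").
set_bsl_readonly() {
  ${KUBECTL:-kubectl} patch backupstoragelocation "${1:-default}" -n velero \
    --type merge --patch '{"spec":{"accessMode":"ReadOnly"}}'
}
```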
Then verify that the backup location is ReadOnly instead of ReadWrite.
# Against new cluster (cluster #2)
$ velero get backup-locations
NAME PROVIDER BUCKET/PREFIX PHASE LAST VALIDATED ACCESS MODE
default azure backups-azaks-blah-blahh-staging-001 Available 2021-01-29 12:57:49 -0800 PST ReadOnly
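Before restoring, a small guard can read the access mode from the resource itself instead of from the table. A sketch under the same assumptions: bsl_is_readonly is a hypothetical helper, KUBECTL is a hypothetical override, and an empty accessMode is treated as writable because ReadWrite is Velero’s default.

```shell
# Succeed only if the BackupStorageLocation's spec.accessMode is ReadOnly.
# An unset field means the default (ReadWrite), so that case fails too.
bsl_is_readonly() {
  mode=$(${KUBECTL:-kubectl} get backupstoragelocation "${1:-default}" -n velero \
    -o jsonpath='{.spec.accessMode}')
  [ "$mode" = "ReadOnly" ]
}
```

For example: bsl_is_readonly default || echo "backup location is still writable" >&2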
Try a restore
Great! Now let’s try restoring from a scheduled backup. This assumes that you have some scheduled backups from your main cluster (cluster #1). If you don’t, create a manual backup on the main cluster first, since the new cluster’s read-only backup location cannot create one.
# Against new cluster (cluster #2)
$ velero backup get
NAME STATUS ERRORS WARNINGS CREATED EXPIRES STORAGE LOCATION SELECTOR
every-day-at-1-20210129010007 Completed 0 2 2021-01-28 17:00:07 -0800 PST 29d default <none>
every-day-at-1-20210128010006 Completed 0 2 2021-01-27 17:00:06 -0800 PST 28d default <none>
every-day-at-1-20210127010006 Completed 0 2 2021-01-26 17:00:06 -0800 PST 27d default <none>
every-day-at-1-20210126010006 Completed 0 2 2021-01-25 17:00:06 -0800 PST 26d default <none>
every-day-at-1-20210125010005 Completed 0 2 2021-01-24 17:00:05 -0800 PST 25d default <none>
every-day-at-1-20210124010005 Completed 0 2 2021-01-23 17:00:05 -0800 PST 24d default <none>
Pick a backup and restore it.
velero restore create --from-backup every-day-at-1-20210129010007
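Scheduled backups are named after their schedule plus a timestamp and carry the velero.io/schedule-name label, so picking the newest completed backup can be scripted. This sketch also passes --exclude-namespaces velero, on the assumption that you do not want Velero to restore its own namespace over the freshly installed one. The schedule name every-day-at-1 is taken from the listing above; the helper names and the KUBECTL/VELERO overrides are hypothetical.

```shell
# Print the newest backup name from a list on stdin. Scheduled backups are
# named <schedule>-<YYYYMMDDhhmmss>, so a lexical sort orders them by time.
newest_backup() {
  sort | tail -n 1
}

# List Completed backups created by a schedule (default: every-day-at-1),
# selected by the velero.io/schedule-name label Velero puts on them.
completed_backups() {
  ${KUBECTL:-kubectl} -n velero get backups \
    -l "velero.io/schedule-name=${1:-every-day-at-1}" \
    -o jsonpath='{range .items[?(@.status.phase=="Completed")]}{.metadata.name}{"\n"}{end}'
}

# Restore the newest Completed backup, skipping the velero namespace.
restore_latest() {
  backup=$(completed_backups "$1" | newest_backup)
  [ -n "$backup" ] || { echo "no completed backups found" >&2; return 1; }
  ${VELERO:-velero} restore create --from-backup "$backup" --exclude-namespaces velero
}
```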
Velero also prints the command for checking the restore’s progress. For example:
# Against new cluster (cluster #2)
$ velero restore describe every-day-at-1-20210129010007-20210129130618
Name: every-day-at-1-20210129010007-20210129130618
Namespace: velero
Labels: <none>
Annotations: <none>
Phase: InProgress
Started: 2021-01-29 13:06:19 -0800 PST
Completed: <n/a>
Backup: every-day-at-1-20210129010007
Namespaces:
Included: all namespaces found in the backup
Excluded: <none>
Resources:
Included: *
Excluded: nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
Cluster-scoped: auto
Namespace mappings: <none>
Label selector: <none>
Restore PVs: auto
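velero restore describe shows the phase at a single point in time. To block until the restore finishes, you can poll the Restore resource’s status.phase. A sketch; wait_for_restore is a hypothetical helper, POLL_SECONDS and KUBECTL are hypothetical overrides, and the terminal phases checked are Velero’s Completed, PartiallyFailed, Failed, and FailedValidation.

```shell
# Poll a Velero Restore until it reaches a terminal phase, print that phase,
# and succeed only if it is Completed. Loops until then (Ctrl-C to stop).
wait_for_restore() {
  while :; do
    phase=$(${KUBECTL:-kubectl} -n velero get restore "$1" -o jsonpath='{.status.phase}')
    case "$phase" in
      Completed) echo "$phase"; return 0 ;;
      PartiallyFailed|Failed|FailedValidation) echo "$phase"; return 1 ;;
    esac
    sleep "${POLL_SECONDS:-10}"
  done
}
```

Once it returns, velero restore logs <NAME> lists the individual warnings and errors.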
The restore took only a minute or two. It finished with a few warnings and some errors. Some of the warnings came from Velero trying to restore its own resources. Most of the deployments, however, failed because of the container registry: I had not integrated ACR with the new cluster, so many of the images failed to pull. Once I “attached” the ACR to the AKS cluster, the deployments self-healed.
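To find the pods affected by a missing registry integration and then attach the registry, something like the following sketch should work. It assumes the default column layout of kubectl get pods -A (STATUS is the fourth column); the resource group, cluster, and registry names are yours to supply, and the helper names and AZ/KUBECTL overrides are hypothetical.

```shell
# Print namespace/name of pods whose STATUS shows an image pull failure.
image_pull_failures() {
  ${KUBECTL:-kubectl} get pods -A --no-headers |
    awk '$4 ~ /ErrImagePull|ImagePullBackOff/ { print $1 "/" $2 }'
}

# Grant the AKS cluster pull access to an ACR.
# Arguments: resource group, AKS cluster name, ACR name.
attach_acr() {
  ${AZ:-az} aks update --resource-group "$1" --name "$2" --attach-acr "$3"
}
```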
There might be unhealthy deployments
Expect some Kubernetes deployments not to work. In my case, the Rancher agent failed because an agent with the same name and secret was already running in my main cluster (cluster #1), so the Rancher server rejected the second agent.
My NGINX ingress controller was also unhealthy. When I configured it on the main cluster, I tied it to a public static IP on Azure, and the new cluster’s identity did not have permission to modify that static IP (nor did I want it to).
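To see at a glance which restored deployments are not ready, you can compare the two halves of the READY column. This sketch assumes the default kubectl get deployments -A columns; unhealthy_deployments is a hypothetical helper and KUBECTL a hypothetical override. A namespace that cannot work in a second cluster, such as the Rancher agent’s, can then be left out of a later restore with the --exclude-namespaces flag of velero restore create.

```shell
# Print namespace/name for deployments whose READY count (e.g. "0/1")
# is below the desired count. Columns: NAMESPACE NAME READY UP-TO-DATE ...
unhealthy_deployments() {
  ${KUBECTL:-kubectl} get deployments -A --no-headers |
    awk '{ split($3, r, "/"); if (r[1] + 0 < r[2] + 0) print $1 "/" $2 }'
}
```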
That’s it. Thanks for reading.