Today, it’s possible to stop the virtual machine scale sets (vmss) that back an AKS cluster. You can do this in several ways, including the Azure CLI. In this post, I’ll walk through Azure CLI scripts that stop and start the vmss of an AKS cluster for dev/test purposes. We’ll use Azure DevOps pipelines for the scheduling portion, since Azure Automation accounts do not support the Azure CLI.
Based on rough calculations, this approach could save you about 46% on costs.
Edit: As suspected, Microsoft has released a way to start and stop AKS clusters through the CLI, so you no longer need to stop the AKS scale sets yourself.
Option A (Preferred)
Microsoft released the feature in preview as an Azure CLI extension. Follow the steps on this Microsoft docs page to install the extension. Then come back to this page to see how to schedule a script that starts and stops a cluster during off-hours using Azure Pipelines.
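For reference, here is a minimal sketch of what Option A could look like in a script. It assumes the feature is exposed as `az aks stop` and `az aks start`, which is how current Azure CLI releases name it. The wrapper name `aks_power` is hypothetical and not part of the Azure CLI.

```bash
#!/bin/bash
# Sketch: wrap the built-in AKS start/stop commands.
# aks_power is a hypothetical helper name, not part of the Azure CLI.
aks_power() {
  local action=$1 cluster_name=$2 resource_group=$3
  case "$action" in
    start|stop)
      az aks "$action" --name "$cluster_name" --resource-group "$resource_group"
      ;;
    *)
      echo "usage: aks_power start|stop <cluster name> <resource group>" >&2
      return 1
      ;;
  esac
}
```

After loading the function, `aks_power stop "$CLUSTER_NAME" "$RESOURCE_GROUP"` would stop the cluster, and the same call with `start` would bring it back.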
Option B (DIY)
Stopping the cluster
Create a bash script called aks-stop.sh.
#!/bin/bash
# Usage: aks-stop.sh <cluster name> <resource group>
CLUSTER_NAME=$1
RESOURCE_GROUP=$2
# Resource group that holds the cluster's scale sets
NODE_RESOURCE_GROUP=$(az aks show -g "$RESOURCE_GROUP" -n "$CLUSTER_NAME" --query "nodeResourceGroup" -o tsv)
# Deallocate every scale set in that resource group (one per node pool)
az vmss list -g "$NODE_RESOURCE_GROUP" --query "[].name" -o tsv | while read -r scale_set
do
  echo "Shutting down scaleset in the AKS resource group. Scale Set: $scale_set"
  az vmss deallocate -g "$NODE_RESOURCE_GROUP" -n "$scale_set"
done
To test it, log in to Azure (az login) and select the correct subscription (az account set -s [subscription id]).
Invoke the script.
CLUSTER_NAME="[your cluster name]"
RESOURCE_GROUP="[resource group name]"
source ./aks-stop.sh "$CLUSTER_NAME" "$RESOURCE_GROUP"
Once the script runs, a kubectl command will show that the nodes are not ready.
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
aks-default-1-vmss000000 NotReady agent 5d2h v1.18.6
aks-default-2-vmss000001 NotReady agent 5d2h v1.18.6
aks-default-3-vmss000002 NotReady agent 5d2h v1.18.6
What does it do?
It finds the node resource group that holds the underlying vmss of the AKS cluster. Then it iterates through every vmss in that resource group and deallocates it.
If you have more than one node pool, it will stop all the node pools.
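The iteration pattern can be tried without touching Azure. The sketch below swaps `az vmss list` for a hypothetical stand-in, `list_scale_sets`, that prints two made-up scale set names, and it only echoes what it would deallocate.

```bash
#!/bin/bash
# Stand-in for: az vmss list -g "$NODE_RESOURCE_GROUP" --query "[].name" -o tsv
# The scale set names below are made up for illustration.
list_scale_sets() {
  printf '%s\n' "aks-default-12345678-vmss" "aks-gpu-12345678-vmss"
}

# Reads one name per line from stdin, the same shape as the tsv output.
# read -r keeps each line as-is, without interpreting backslashes.
plan_deallocate() {
  while read -r scale_set; do
    echo "would deallocate: $scale_set"
  done
}

list_scale_sets | plan_deallocate
```

Each node pool shows up as its own line, so every pool is visited in turn.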
Starting the cluster
Create a bash script called aks-start.sh.
#!/bin/bash
# Usage: aks-start.sh <cluster name> <resource group>
CLUSTER_NAME=$1
RESOURCE_GROUP=$2
# Resource group that holds the cluster's scale sets
NODE_RESOURCE_GROUP=$(az aks show -g "$RESOURCE_GROUP" -n "$CLUSTER_NAME" --query "nodeResourceGroup" -o tsv)
# Start every scale set in that resource group (one per node pool)
az vmss list -g "$NODE_RESOURCE_GROUP" --query "[].name" -o tsv | while read -r scale_set
do
  echo "Starting scaleset in the AKS resource group. Scale Set: $scale_set"
  az vmss start -g "$NODE_RESOURCE_GROUP" -n "$scale_set"
done
Invoke the script.
CLUSTER_NAME="[your cluster name]"
RESOURCE_GROUP="[resource group name]"
source ./aks-start.sh "$CLUSTER_NAME" "$RESOURCE_GROUP"
Shortly after, you will see that the nodes are ready.
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
aks-default-1-vmss000000 Ready agent 5d2h v1.18.6
aks-default-2-vmss000001 Ready agent 5d2h v1.18.6
aks-default-3-vmss000002 Ready agent 5d2h v1.18.6
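Since the two scripts differ only in the `az vmss` subcommand, they could be folded into one. This is a sketch under that assumption; the function name `aks_vmss` and its argument order are hypothetical and not something the scripts above define.

```bash
#!/bin/bash
# Sketch: one function for both directions.
# aks_vmss <start|deallocate> <cluster name> <resource group>
aks_vmss() {
  local action=$1 cluster_name=$2 resource_group=$3 node_rg scale_set
  if [ "$action" != "start" ] && [ "$action" != "deallocate" ]; then
    echo "action must be 'start' or 'deallocate'" >&2
    return 1
  fi
  if [ -z "$cluster_name" ] || [ -z "$resource_group" ]; then
    echo "cluster name and resource group are required" >&2
    return 1
  fi
  # Resource group that holds the cluster's scale sets.
  node_rg=$(az aks show -g "$resource_group" -n "$cluster_name" \
    --query "nodeResourceGroup" -o tsv) || return 1
  # Apply the chosen subcommand to every scale set, one per line of tsv output.
  az vmss list -g "$node_rg" --query "[].name" -o tsv |
    while read -r scale_set; do
      echo "Running '$action' on scale set: $scale_set"
      az vmss "$action" -g "$node_rg" -n "$scale_set"
    done
}
```

With this loaded, `aks_vmss deallocate "$CLUSTER_NAME" "$RESOURCE_GROUP"` would replace aks-stop.sh and `aks_vmss start ...` would replace aks-start.sh. It uses `return` rather than `exit` so it stays safe to load with `source`.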
Scheduling for off-hours
Here’s how to “stop” your AKS cluster after-hours and then start it in the morning.
Azure Automation accounts unfortunately don’t support the Azure CLI. I find that Azure DevOps works well for this, since YAML pipelines support cron-style scheduled triggers.
For example, this is how to trigger a pipeline at 6 pm EST, which is 23:00 UTC. (Note: Azure Pipelines schedules have to be in UTC.) For more information on the format, here’s the detailed documentation.
pr: none
trigger: none
schedules:
- cron: "0 23 * * 1-5"
  displayName: "After-hours (11 pm UTC)"
  always: true
  branches:
    include:
    - master
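Because the cron expression is in UTC, the local hour has to be converted by hand. Here is a small sketch of that arithmetic; `to_utc_hour` is a hypothetical helper that only handles whole-hour offsets. Since the schedule stays fixed in UTC, the local start time shifts by an hour when daylight saving time begins or ends.

```bash
#!/bin/bash
# to_utc_hour <local hour 0-23> <UTC offset in hours, e.g. -5 for EST>
# Prints the matching hour in UTC. Hypothetical helper, whole-hour offsets only.
# Pass hours without a leading zero (8, not 08), since bash reads 08 as invalid octal.
to_utc_hour() {
  local local_hour=$1 offset=$2
  echo $(( (local_hour - offset + 24) % 24 ))
}

to_utc_hour 18 -5   # 6 pm EST (UTC-5), prints 23
to_utc_hour 18 -4   # 6 pm EDT (UTC-4), prints 22
```

The first result is the hour field used in the cron expression above.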
Another example, to trigger a pipeline at 11:00 UTC, which is 6 am EST (7 am during daylight saving time):
pr: none
trigger: none
schedules:
- cron: "0 11 * * 1-5"
  displayName: "Mornings (11 am UTC)"
  always: true
  branches:
    include:
    - master
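With the two schedules above, the nodes run from 11:00 to 23:00 UTC on weekdays and stay deallocated the rest of the week. The sketch below works out what share of the week that is. `deallocated_percent` is a hypothetical helper, and it counts node VM hours only. Resources such as managed disks keep billing while the VMs are deallocated, which is presumably why an overall savings figure, like the rough 46% quoted at the top, comes out lower.

```bash
#!/bin/bash
# deallocated_percent <start hour UTC> <stop hour UTC> <days per week>
# Prints the share of the 168-hour week during which the nodes are deallocated,
# rounded down to a whole percent.
# Hypothetical helper; assumes the start hour is before the stop hour on the same day.
deallocated_percent() {
  local start_hour=$1 stop_hour=$2 days=$3
  local running_hours=$(( (stop_hour - start_hour) * days ))
  echo $(( (168 - running_hours) * 100 / 168 ))
}

deallocated_percent 11 23 5   # 60 running hours per week, prints 64
```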
Putting it together
Here’s an example of a pipeline starting the cluster each weekday morning at 11:00 UTC (6 am EST, 7 am EDT).
pr: none
trigger: none
schedules:
- cron: "0 11 * * 1-5"
  displayName: "Mornings (11 am UTC)"
  always: true
  branches:
    include:
    - master
variables:
  ClusterName: [name of cluster]
  ResourceGroup: [name of resource group]
steps:
- task: AzureCLI@2
  inputs:
    azureSubscription: '[subscription service connection]'
    scriptType: 'bash'
    scriptLocation: 'scriptPath'
    scriptPath: './aks-start.sh'
    arguments: '"$(ClusterName)" "$(ResourceGroup)"'
Here’s an example of a pipeline stopping the cluster each weekday evening at 23:00 UTC (6 pm EST, 7 pm EDT).
pr: none
trigger: none
schedules:
- cron: "0 23 * * 1-5"
  displayName: "After-hours (11 pm UTC)"
  always: true
  branches:
    include:
    - master
variables:
  ClusterName: [name of cluster]
  ResourceGroup: [name of resource group]
steps:
- task: AzureCLI@2
  inputs:
    azureSubscription: '[subscription service connection]'
    scriptType: 'bash'
    scriptLocation: 'scriptPath'
    scriptPath: './aks-stop.sh'
    arguments: '"$(ClusterName)" "$(ResourceGroup)"'
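If you would rather keep a single pipeline with both cron entries under `schedules`, the script has to decide which direction to go when it runs. One possible sketch picks the action from the current UTC hour. `action_for_hour` is a hypothetical helper, and the 11:00 to 23:00 UTC window is taken from the two schedules above.

```bash
#!/bin/bash
# action_for_hour <hour in UTC, 00-23>
# Prints "start" inside the 11:00-22:59 UTC working window, otherwise "stop".
# Hypothetical helper; the window matches the two cron schedules above.
action_for_hour() {
  local hour=$(( 10#$1 ))   # force base 10 so "08" and "09" are not read as octal
  if [ "$hour" -ge 11 ] && [ "$hour" -lt 23 ]; then
    echo "start"
  else
    echo "stop"
  fi
}

# date -u +%H prints the current UTC hour as two digits.
action=$(action_for_hour "$(date -u +%H)")
echo "At this hour the script would run: aks-$action.sh"
```

The trade-off is that a manual run follows the clock too, so two separate pipelines remain the simpler option when you want to start or stop on demand.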