Well-Architected Framework With The Azure Verified Module For Azure Kubernetes

I came across the Azure Verified Module (AVM) for Azure Kubernetes Service (AKS), and in its GitHub repo I found a Well-Architected Framework (WAF) aligned example for deploying this Terraform module. So I asked myself, “What exactly makes this AKS deployment example WAF aligned?”

Before I get into that, let me explain what the WAF is. The Azure Well-Architected Framework describes ‘how’ to apply best practices to a workload or solution architecture across five pillars of architectural excellence:

  • Reliability – resilient, available and recoverable
  • Security – zero trust; protecting confidentiality, integrity and availability
  • Cost optimization – deliver sufficient return on investment
  • Operational excellence – DevOps best practices, development, observability and release management
  • Performance efficiency – the workload adjusts to changes in demand without compromising user experience, while conserving resources

The goal is to provide prescriptive technical guidance for designing and continuously refining high-quality Azure workloads.

Reading through the Microsoft Learn WAF documentation, I found the page “Architecture best practices for Azure Kubernetes Service (AKS)”, which lists a set of detailed recommendations for configuring an AKS cluster. I’ll do my best to compare the WAF-aligned Terraform AKS deployment with some of these recommendations.

First, get familiar with the WAF-aligned example code: https://github.com/Azure/terraform-azurerm-avm-res-containerservice-managedcluster/tree/main/examples/waf-aligned

Terraform Module Configuration and WAF Alignment

Each entry lists a setting from the example, the WAF pillar(s) it aligns with, and why.

private_cluster_enabled = true
WAF alignment: Security
Secure network traffic to your API server by using a private AKS cluster.

private_dns_zone_id = azurerm_private_dns_zone.zone.id
WAF alignment: Security
Provides internal DNS resolution for the cluster’s API server. It ties AKS to the private DNS zone created in the example code, keeping traffic private.

managed_identities = {
  user_assigned_resource_ids = […]
}
WAF alignment: Security
Using a managed identity avoids the overhead of managing and rotating service principals. It grants the cluster a dedicated identity for accessing Azure resources, in line with least privilege.
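
network_profile = {
  dns_service_ip = "10.10.200.10"
  service_cidr   = "10.10.200.0/24"
  network_plugin = "azure"
}
WAF alignment: Reliability, Performance Efficiency
The right network plugin can help ensure better compatibility and performance. Setting network_plugin = "azure" selects Azure CNI, which assigns pods IP addresses from the virtual network subnet, while service_cidr and dns_service_ip define a separate internal range for Kubernetes services and the cluster DNS service.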
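
Because service_cidr and dns_service_ip are easy to get wrong, a quick pre-flight check can be useful before running terraform apply. The Python sketch below uses only the standard ipaddress module and encodes three rules from the AKS networking guidance: dns_service_ip must fall inside service_cidr, it must not be the first address of that range (used by the kubernetes service), and service_cidr must not overlap the cluster's virtual network. The helper name check_network_profile and the VNet range 10.1.0.0/16 are hypothetical; substitute the address space actually used in the example code.

```python
import ipaddress


def check_network_profile(service_cidr: str, dns_service_ip: str, vnet_cidrs: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the values look consistent.

    check_network_profile is a hypothetical helper, not part of the AVM module.
    """
    problems = []
    service_net = ipaddress.ip_network(service_cidr)
    dns_ip = ipaddress.ip_address(dns_service_ip)

    # The cluster DNS service IP must come from the service CIDR.
    if dns_ip not in service_net:
        problems.append(f"dns_service_ip {dns_ip} is outside service_cidr {service_net}")
    # The network address is not a host, and the first address (.1) is taken
    # by the kubernetes.default service.
    elif dns_ip in (service_net.network_address, service_net.network_address + 1):
        problems.append(f"dns_service_ip {dns_ip} is reserved in {service_net}")

    # Service IPs are virtual and must not collide with real network ranges.
    for cidr in vnet_cidrs:
        vnet = ipaddress.ip_network(cidr)
        if service_net.overlaps(vnet):
            problems.append(f"service_cidr {service_net} overlaps {vnet}")
    return problems


# Values from the waf-aligned example; the VNet range is a hypothetical placeholder.
print(check_network_profile("10.10.200.0/24", "10.10.200.10", ["10.1.0.0/16"]))  # prints []
```

An empty list means no problems were found. A VNet range such as 10.10.0.0/16 would be flagged, because it contains the whole service CIDR.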

oms_agent = {
  log_analytics_workspace_id = azurerm_log_analytics_workspace.workspace.id
}
WAF alignment: Operational Excellence
Centralized monitoring and logging. The monitoring agent (Container insights) sends cluster logs and metrics to the Log Analytics workspace, simplifying observability and day-2 operations.

azure_active_directory_role_based_access_control = {
  tenant_id          = data.azurerm_client_config.current.tenant_id
  azure_rbac_enabled = true
}
WAF alignment: Security
Configures Azure Active Directory (now Microsoft Entra ID) role-based access control for the Kubernetes cluster. With azure_rbac_enabled = true, access to the cluster is enforced through Azure AD identities and Azure role assignments.

defender_log_analytics_workspace_id = azurerm_log_analytics_workspace.workspace.id
WAF alignment: Security
Microsoft Defender for Containers helps you monitor and maintain the security of your clusters, containers, and their applications. This setting directs the data Defender collects to the same Log Analytics workspace.
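
default_node_pool = {
  name                         = "default"
  vm_size                      = "Standard_DS2_v2"
  node_count                   = 3
  zones                        = [2, 3]
  auto_scaling_enabled         = true
  max_count                    = 3
  max_pods                     = 50
  min_count                    = 3
  vnet_subnet_id               = azurerm_subnet.subnet.id
  only_critical_addons_enabled = true
  upgrade_settings             = { max_surge = "10%" }
}
WAF alignment: Cost Optimization, Performance Efficiency, Reliability
Selecting the right VM instance type is crucial because it directly affects the cost of running applications on AKS.

– zones = [2, 3] Improves resilience by distributing nodes across availability zones.
– auto_scaling_enabled = true Enables the cluster autoscaler, which adds and removes nodes to match demand and lowers cost when usage drops. Note that with min_count and max_count both set to 3, this pool stays at three nodes; widen the range to let it actually scale.
– max_surge = "10%" Enables rolling upgrades in small batches to minimize downtime. AKS rounds fractional nodes up, so on a three-node pool this means one extra node at a time.
– max_pods = 50 Caps the number of pods per node. With Azure CNI, each node reserves a subnet IP address for every potential pod, so this value drives how large the subnet must be.
– only_critical_addons_enabled = true Taints the default (system) node pool with CriticalAddonsOnly=true:NoSchedule, so application workloads are scheduled on separate user node pools.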
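
The arithmetic behind max_surge and max_pods is easier to see with the example's numbers. This Python sketch assumes traditional (non-overlay) Azure CNI, where every node, including surge nodes added during an upgrade, uses one subnet IP for itself plus one per potential pod, and it assumes Azure's five reserved addresses per subnet. Percentage surges are rounded up, as AKS does. The helper names are hypothetical.

```python
AZURE_RESERVED_IPS_PER_SUBNET = 5  # Azure reserves 5 addresses in every subnet


def surge_nodes(node_count: int, max_surge: str) -> int:
    """Extra nodes added during an upgrade; percentages round up."""
    if max_surge.endswith("%"):
        pct = int(max_surge[:-1])
        return -(-node_count * pct // 100)  # integer ceiling division
    return int(max_surge)


def azure_cni_ip_demand(node_count: int, max_pods: int, surge: int) -> int:
    """Subnet IPs needed under traditional Azure CNI: one per node plus max_pods per node."""
    nodes = node_count + surge
    return nodes + nodes * max_pods


def smallest_subnet_prefix(ips_needed: int) -> int:
    """Smallest IPv4 prefix length whose subnet holds ips_needed plus Azure's reserved IPs."""
    total = ips_needed + AZURE_RESERVED_IPS_PER_SUBNET
    host_bits = (total - 1).bit_length()  # smallest n with 2**n >= total
    return 32 - host_bits


# Values from the example's default_node_pool.
surge = surge_nodes(3, "10%")            # 10% of 3 = 0.3, rounded up to 1
ips = azure_cni_ip_demand(3, 50, surge)  # 4 nodes * (1 + 50) = 204
print(surge, ips, f"/{smallest_subnet_prefix(ips)}")  # prints: 1 204 /24
```

The same functions scale to larger pools: 50 nodes with 30 pods each and one surge node need 1,581 addresses, which calls for a /21. Azure CNI Overlay, which gives pods addresses from a separate overlay range, would consume far fewer subnet IPs.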
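
automatic_upgrade_channel = "stable"
WAF alignment: Reliability, Operational Excellence
Opts for stable, well-tested cluster releases: the stable channel upgrades the cluster to the latest supported patch release on minor version N-1, where N is the latest supported minor version.
This reduces the risk of instability compared with the rapid channel, which tracks the newest minor version.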
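
The stable channel's rule (latest patch on minor version N-1) can be sketched in a few lines of Python. This is a simplified model for illustration, not how AKS itself computes upgrades: it ignores the cluster's current version, preview versions and regional availability, and the version list in the usage line is hypothetical.

```python
def stable_channel_target(available_versions: list[str]) -> str:
    """Latest patch release on minor version N-1 (simplified model of the stable channel)."""
    # Parse "major.minor.patch" into integer tuples so 1.30.10 sorts after 1.30.5.
    versions = sorted(tuple(int(part) for part in v.split(".")) for v in available_versions)
    minors = sorted({(major, minor) for major, minor, _ in versions})
    # N is the newest minor version; stable targets N-1 when it exists.
    target_minor = minors[-2] if len(minors) > 1 else minors[-1]
    candidates = [v for v in versions if v[:2] == target_minor]
    return ".".join(str(part) for part in max(candidates))


# Hypothetical list of supported versions.
print(stable_channel_target(["1.17.9", "1.18.4", "1.18.6", "1.19.1"]))  # prints 1.18.6
```

In the simplest case, a rapid-channel model would take minors[-1] instead, which is exactly the newer, less proven release that the stable channel holds back from.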

node_os_channel_upgrade = "Unmanaged"
WAF alignment: Reliability
AKS does not manage node OS updates. With Unmanaged, updates arrive only through the OS’s built-in patching mechanism (for example, unattended upgrades on Ubuntu nodes), and the onus is on you (the operator or DevOps team) to handle node image upgrades and reboots.
By controlling these updates yourself, you can test and schedule OS patches on your own timeline, reducing the risk of unplanned reboots or incompatibilities introduced by automatic updates.

maintenance_window_auto_upgrade = {
  frequency = "Weekly"

}

maintenance_window_node_os = {
  frequency = "Weekly"

}
WAF alignment: Reliability, Operational Excellence
Scheduled maintenance makes upgrades and patching predictable.
Defining exact weekly update windows reduces unexpected disruptions and ensures systematic patching cycles.
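
Only frequency is shown for each window above, so the day, start hour and duration used here are hypothetical. As a rough illustration of how a weekly window turns into concrete time slots, this Python sketch returns the window in progress or the next one, given a weekday, a UTC start hour and a duration. AKS performs its own scheduling; this is not its algorithm.

```python
from datetime import datetime, timedelta, timezone


def next_weekly_window(now: datetime, weekday: int, start_hour: int, duration_hours: int) -> tuple[datetime, datetime]:
    """Return (start, end) of the window in progress or the next one.

    weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    days_ahead = (weekday - now.weekday()) % 7
    start = (now + timedelta(days=days_ahead)).replace(hour=start_hour, minute=0, second=0, microsecond=0)
    end = start + timedelta(hours=duration_hours)
    if end <= now:  # this week's window is already over, so use next week's
        start += timedelta(days=7)
        end += timedelta(days=7)
    return start, end


# Hypothetical window: Sundays 00:00-04:00 UTC, checked on Wednesday 3 January 2024.
now = datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc)
start, end = next_weekly_window(now, weekday=6, start_hour=0, duration_hours=4)
print(start.isoformat(), end.isoformat())
# prints 2024-01-07T00:00:00+00:00 2024-01-07T04:00:00+00:00
```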

Many more configurations could support the five WAF pillars of architectural excellence beyond what this example Terraform code shows, so treat it as a starting point. What you add or adjust depends on your project and your organization’s technology priorities.

Conclusion

When developing Azure workloads and solutions with Terraform modules, I highly recommend referencing the WAF service guide for each Azure service you use, with its recommendations and checklists across the five pillars of security, reliability, operational excellence, cost optimization and performance efficiency. These guides will help you become a more effective architect, designer and engineer, aligned with your organization’s technology principles.
