AKS-Based Microservices Platform

Cloud-Native Application Architecture with Secure Orchestration & Full-Stack Observability

GitHub: https://github.com/sergeksfumey/aks-microservices-platform

ARCHITECTURE OVERVIEW

AKS Microservices Platform — Containerised Workloads with CI/CD, HPA Autoscaling & AAD Security

Docker-containerised microservices on Azure Kubernetes Service with Azure DevOps and GitHub Actions CI/CD, Helm-managed deployments, Horizontal Pod Autoscaling, Azure Key Vault secret injection, and Prometheus/Grafana observability

The architecture migrates a monolithic application into independently deployable, containerised microservices orchestrated by Azure Kubernetes Service. Each microservice is packaged as a Docker image, built through Azure DevOps CI pipelines triggered by GitHub repository events, and pushed to Azure Container Registry with versioned tags. The CD pipeline deploys services to AKS using Helm charts — providing parameterised, rollback-capable releases — with a rolling update strategy that replaces pods incrementally, sustaining near-zero downtime throughout every deployment cycle.

Inside the AKS cluster, node pools separate workload types for isolation and resource efficiency. Kubernetes Network Policies enforce explicit allow rules between services, limiting lateral movement within the cluster. Pod Security Admission enforces the Kubernetes Pod Security Standards and replaces the PodSecurityPolicy API, which was removed in Kubernetes 1.25. It restricts privilege escalation and host resource access, reducing the blast radius of any compromised container. Kubernetes RBAC is integrated with Microsoft Entra ID (formerly Azure Active Directory), so both human operators and automated pipelines authenticate through Entra ID before they can access the Kubernetes API server. Namespace-scoped roles then enforce least privilege across teams.

Secrets such as database connection strings, API keys, and certificates are stored in Azure Key Vault. They are mounted into pods at runtime through the Key Vault CSI Secrets Store Driver, which authenticates with the workload's managed identity. No credentials appear in manifests, Helm values files, or container images. Microservices reach Azure SQL, Blob Storage, and Cosmos DB through private endpoints and authenticate with Microsoft Entra tokens issued to their managed identities, which removes stored credentials from every data persistence layer.

External traffic enters through Azure Application Gateway, which handles SSL termination and Layer 7 routing before passing requests to the Kubernetes Ingress Controller for path-based and host-based dispatch to the correct microservice. The Horizontal Pod Autoscaler continuously monitors CPU and memory metrics, scaling pod replicas dynamically in response to demand — optimising resource allocation without manual intervention. Prometheus scrapes metrics from all pods, feeding Grafana dashboards that provide real-time visibility into request rates, error rates, HPA scaling events, and node health — with Azure Monitor and Log Analytics providing the cluster-level observability layer for operations and compliance teams.

Description

This case study is an independent architecture design exercise developed to demonstrate microservices platform architecture methodology on Azure Kubernetes Service. It was not associated with a production deployment. The scenario is based on the application modernisation requirements typical of organisations migrating from monolithic application architectures toward independently deployable, scalable microservices on Kubernetes. This study focuses on microservices runtime architecture: service decomposition, inter-service communication, data tier design, ingress governance, and full-stack observability. The GitOps deployment delivery model is covered separately in the Cloud-Native Application Factory case study.

Key Focus Areas:

Kubernetes & Microservices Architecture
Inter-Service Communication Design
Cloud-Native Security & Governance
Observability & Elastic Scalability
Application Gateway WAF Ingress
Polyglot Data Architecture

Executive Summary

Architected a cloud-native microservices platform on Microsoft Azure using Azure Kubernetes Service — modernising a legacy monolithic application into independently deployable, scalable services with secure inter-service communication, polyglot data architecture, Application Gateway WAF-enabled ingress, centralized secret governance through Azure Key Vault CSI Driver, elastic scaling through HPA, and full-stack observability through Azure Monitor, Prometheus, and Grafana.

The architecture addresses the specific engineering challenges of microservices decomposition — defining service boundaries, governing inter-service communication patterns, designing per-service data isolation, and establishing observability that spans both cloud infrastructure and Kubernetes application layers simultaneously.

The design demonstrates how legacy monolithic applications can be modernised into resilient, independently scalable microservices ecosystems while maintaining operational governance, security controls, and observability maturity.

Business Drivers

Monolithic application architectures create compounding operational constraints as application complexity and user demand grow — tightly coupled components cannot be scaled independently, deployments affect the entire application simultaneously, and component failures cascade across the full system.

This architecture was designed to address the application modernisation requirements of organisations where monolithic architecture results in:

Scalability bottlenecks — high-demand services cannot be scaled independently without scaling the entire application
High-risk deployments — any code change requires full application redeployment creating broad blast radius for failures
Tight component coupling — internal dependencies prevent independent service evolution and create cascading failure risk
Limited fault isolation — a failure in any component can propagate across the entire application
Slow delivery cycles — full application testing and deployment for any change, regardless of scope
Inefficient resource utilisation — entire application must be sized for peak demand of its most resource-intensive component

Operational Constraints

The architecture was designed to operate within the following constraints typical of microservices migration scenarios:

Existing application components must be decomposed into services without rewriting all business logic simultaneously — incremental decomposition from monolith boundaries
Service-to-service communication must be governed — unrestricted inter-service communication recreates monolithic coupling in distributed form
Each service requires independent data storage — shared databases between services create coupling that defeats microservices independence
Infrastructure scaling must be elastic — services with different demand profiles require independent autoscaling without manual intervention
Security controls must be centralized — per-service secret management and access control creates governance inconsistency at scale
Observability must span both infrastructure and application layers — Kubernetes cluster health alone is insufficient for microservices troubleshooting
Deployment pipelines must support independent service deployment — a change to one service must not require redeployment of unrelated services

Objectives

Decompose monolithic application into independently deployable microservices with clear domain boundaries
Define inter-service communication patterns — synchronous REST/gRPC for request-response, asynchronous messaging for event-driven workflows
Implement polyglot data architecture matching each service's data requirements to appropriate storage technology
Deploy Application Gateway with WAF for Layer 7 ingress governance and external threat protection
Enforce service isolation through Kubernetes Network Policies governing east-west traffic between services
Centralise secret management through Azure Key Vault CSI Secrets Store Driver — no hardcoded credentials in workloads
Implement elastic scaling through HPA responding to per-service demand independently
Establish full-stack observability combining Azure Monitor (infrastructure) with Prometheus and Grafana (application layer)
Standardise service deployments through Helm charts ensuring consistent, versioned, repeatable deployments

Architecture Principles

Domain-driven service boundaries — services aligned to business domains not technical layers
Independent service lifecycle — each service deployable, scalable, and updatable without affecting others
Data isolation per service — no shared databases between services; each service owns its data store
Communication governance — synchronous patterns for required consistency, asynchronous for resilience and decoupling
Immutable container images — services deployed as versioned, immutable container images through ACR
Secure-by-default workload communication — network policies enforcing least-privilege inter-service connectivity
Centralized secret governance — Key Vault CSI Driver providing secrets to workloads without environment variable exposure
Full-stack observability — infrastructure metrics, Kubernetes metrics, and application-level metrics correlated in unified dashboards
Managed platform operations — AKS managed control plane reducing Kubernetes operational overhead

Architecture Overview

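The solution is structured as a layered cloud-native microservices platform. Its layers are Kubernetes orchestration, container registry, ingress governance, CI/CD automation, polyglot data services, security and governance, elastic scaling, and full-stack observability. The sections below cover these layers, along with the service decomposition and inter-service communication design that run on them.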

1. Kubernetes Orchestration Layer

The platform leverages Azure Kubernetes Service with a multi-node pool architecture optimised for heterogeneous microservice workload profiles.

AKS Cluster Configuration:

Component	Configuration	Rationale
Network plugin	Azure CNI	Pod-level VNet integration enabling Network Policy support
Authentication	Microsoft Entra ID + Kubernetes RBAC	Unified identity governance through Entra ID
Workload Identity	Azure Workload Identity	Secretless pod authentication to Azure services
Node pools	Multiple, by workload profile	General-purpose for standard services, memory-optimised for data-intensive services

Multi-Node Pool Design:

Node Pool	VM SKU	Purpose	Autoscale
System pool	Standard_DS2_v2	AKS system components	Fixed, 3 nodes
General workload	Standard_D4s_v3	Standard microservices	2–10 nodes
Memory-optimised	Standard_E4s_v3	Data-intensive services	1–5 nodes
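The following sketch shows one way to steer a data-intensive service onto the memory-optimised pool. The pod selects the pool through the node label that AKS applies, and it tolerates a taint that keeps general workloads off those nodes. The pool name `memopt` and the `workload=memory-optimised:NoSchedule` taint are hypothetical. The taint would be set when the pool is created.

```yaml
# Pod template spec fragment for a data-intensive service (not a standalone manifest).
# AKS labels each node with its pool name under kubernetes.azure.com/agentpool.
spec:
  nodeSelector:
    kubernetes.azure.com/agentpool: memopt   # hypothetical name of the memory-optimised pool
  tolerations:
    # Matches a hypothetical taint workload=memory-optimised:NoSchedule set on that pool
    - key: workload
      operator: Equal
      value: memory-optimised
      effect: NoSchedule
```

AKS documents a similar `CriticalAddonsOnly=true:NoSchedule` taint for system node pools. That taint would keep application pods off the fixed three-node system pool.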

Namespace Architecture:

Namespace	Services	Access Boundary
api-gateway	External ingress service	Ingress controller
order-service	Order domain services	Internal only
inventory-service	Inventory domain services	Internal only
payment-service	Payment domain services	Restricted — payment team only
notification-service	Event-driven notification	Internal only
user-service	User profile and preference services	Internal only
data-access	Shared data access layer	Service namespaces only
monitoring	Prometheus, Grafana	Platform team only
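Pod Security Admission is the built-in successor to the PodSecurityPolicy API, which was removed in Kubernetes 1.25. It is configured per namespace through labels. The following is a minimal sketch for the payment namespace, assuming the `restricted` Pod Security Standard fits the service. The `team` label is a hypothetical ownership convention.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payment-service
  labels:
    # Reject pods that violate the "restricted" Pod Security Standard
    pod-security.kubernetes.io/enforce: restricted
    # Also emit client warnings and audit annotations at the same level
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
    team: payments   # hypothetical label for ownership and cost reporting
```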

2. Microservices Decomposition

The monolithic application is decomposed along domain-driven boundaries — each service encapsulating a distinct business capability with independent deployment and data ownership.

Service Decomposition Model:

Service	Domain	Responsibility	Communication Pattern
API Gateway Service	Routing	External request routing, auth validation	Synchronous — proxies to internal services
Order Service	Orders	Order lifecycle management	Synchronous (REST) + Async (events)
Inventory Service	Inventory	Stock management and availability	Synchronous (REST)
Payment Service	Payments	Payment processing and validation	Synchronous (REST) — strict consistency required
Notification Service	Notifications	Email, SMS, push notifications	Asynchronous (event-driven)
User Service	Identity	User profile and preference management	Synchronous (REST)

Decomposition Principles Applied:

Single Responsibility — each service owns one business domain exclusively
Bounded Context — service interfaces expose only what consuming services require, not internal data models
Database per Service — each service owns its data store; no cross-service database access
Strangler Fig Pattern — new microservices replace monolithic modules incrementally rather than requiring full rewrites

3. Inter-Service Communication Design

Inter-service communication patterns are explicitly governed — the choice between synchronous and asynchronous communication is a deliberate architectural decision based on consistency and coupling requirements.

Synchronous Communication — REST/gRPC: Used where the calling service requires an immediate response and strong consistency guarantees:

Order Service → Inventory Service: stock reservation requires synchronous confirmation before order commitment
Order Service → Payment Service: payment authorisation requires synchronous confirmation before order completion
API Gateway → all services: external request routing requires synchronous response

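REST over HTTPS is used for service-to-service communication. The calling service presents a Microsoft Entra access token, obtained through Workload Identity, which the receiving service validates. This authenticates the caller at the application layer; mutual TLS between all services is a service mesh capability.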

Asynchronous Communication — Event-Driven: Used where services can operate independently and eventual consistency is acceptable:

Order Service → Notification Service: order confirmation events published to Azure Service Bus — Notification Service consumes asynchronously without Order Service waiting for notification delivery
Inventory Service → Notification Service: low stock alert events
Payment Service → Order Service: payment completion events triggering order status updates

Azure Service Bus provides the message broker for asynchronous inter-service communication — durable message delivery with dead letter queue support for failed message handling.

Circuit Breaker Pattern: Synchronous service-to-service calls implement circuit breaker logic. If a downstream service becomes unavailable, the circuit breaker opens and returns a fallback response instead of letting cascading timeouts propagate along the call chain. Because the platform does not yet run a service mesh, this logic lives in each service's client code.

4. Ingress & Traffic Management Layer

External traffic enters through Azure Application Gateway with WAF enabled — providing Layer 7 load balancing, SSL/TLS termination, and web application firewall protection before traffic reaches the Kubernetes cluster.

Azure Application Gateway with WAF:

OWASP Core Rule Set (CRS) 3.2 enforcement protecting against OWASP Top 10 attack categories
SSL/TLS termination at the Application Gateway — backend services receive decrypted HTTP within the private VNet
Path-based and host-based routing directing external requests to the API Gateway service namespace
Bot protection rules filtering automated malicious traffic before cluster ingress
Rate limiting preventing abuse of public-facing API endpoints

Why Application Gateway over NGINX Ingress: NGINX Ingress Controller handles Layer 7 routing effectively, but WAF protection depends on an add-on module (such as ModSecurity with the OWASP Core Rule Set). The platform team must enable, tune, and update that module. Azure Application Gateway WAF provides managed OWASP rule sets with Microsoft-maintained updates. This suits externally exposed enterprise application endpoints, where a managed WAF reduces operational security overhead.

Kubernetes Ingress Controller (Internal): An internal Kubernetes ingress controller manages routing within the cluster — directing Application Gateway-forwarded requests to the correct service namespace and pod endpoints based on path and host rules.
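Below is a minimal sketch of the internal ingress rule. It assumes an NGINX-based controller registered under the hypothetical ingress class `internal-nginx`, and it uses a hypothetical host name. A Kubernetes Ingress can only route to Services in its own namespace, so each namespace that receives Application Gateway traffic needs its own Ingress object. Here only the API Gateway namespace is exposed, which matches the Application Gateway routing described above.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-gateway
  namespace: api-gateway
spec:
  ingressClassName: internal-nginx      # hypothetical class for the internal controller
  rules:
    - host: api.example.com             # hypothetical host forwarded by Application Gateway
      http:
        paths:
          - path: /api
            pathType: Prefix            # matches /api and every path element below it
            backend:
              service:
                name: api-gateway       # must be a Service in this Ingress's namespace
                port:
                  number: 8080
```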

5. Polyglot Data Architecture

Each microservice owns its data store — the technology selected matches the service's specific data access and consistency requirements.

Service	Data Store	Technology	Rationale
Order Service	Relational — transactional	Azure SQL Database	ACID transactions required for order lifecycle
Inventory Service	Relational — structured	Azure SQL Database	Structured inventory data with referential integrity
Payment Service	Relational — auditable	Azure SQL Database	Financial transactions requiring full audit trail
User Service	Document — flexible schema	Azure Cosmos DB	User preferences with evolving schema requirements
Notification Service	Queue + blob	Azure Service Bus + Blob	Message-driven with attachment storage
Session/Cache	Key-value	Azure Cache for Redis	Low-latency session state for API Gateway

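Database per Service Enforcement: Database isolation is enforced primarily through identity. Each service's workload identity is granted access only to its own database (for example, as a contained database user in the order database), so the Order Service has no permission to read inventory or payment data. Network Policies add a second layer by limiting which namespaces can reach the data tier. However, they operate on addresses and ports, so they cannot distinguish between databases hosted behind the same private endpoint.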

6. Security & Governance Layer

Security controls are integrated at the platform layer — applied consistently across all services rather than configured per-service.

Azure Workload Identity — Secretless Authentication: All services authenticate to Azure resources (Key Vault, SQL, Blob Storage, Service Bus) through Azure Workload Identity — federated identity tokens rather than connection string credentials stored in environment variables or Kubernetes Secrets.
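The following is a minimal sketch of Workload Identity wiring for the Order Service. It assumes three things:

- The cluster has the OIDC issuer and Workload Identity enabled.
- A user-assigned managed identity exists; its client ID is shown as a placeholder.
- That identity has a federated credential for the subject `system:serviceaccount:order-service:order-service`.

The second document is a fragment of the Deployment's pod template, not a standalone manifest.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: order-service
  namespace: order-service
  annotations:
    # Client ID of the user-assigned managed identity (placeholder)
    azure.workload.identity/client-id: "<managed-identity-client-id>"
---
# Fragment of the Deployment's spec.template (not a standalone manifest)
template:
  metadata:
    labels:
      app: order-service
      # Opts the pod into the Workload Identity webhook, which injects a projected
      # service account token plus AZURE_CLIENT_ID / AZURE_TENANT_ID variables
      azure.workload.identity/use: "true"
  spec:
    serviceAccountName: order-service
```

None of the injected values are secrets. Azure SDK credential classes exchange the projected token for an Entra access token at runtime.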

Azure Key Vault CSI Secrets Store Driver: Secrets required during application startup (database connection strings, API keys, third-party service credentials) are mounted as files through the CSI Secrets Store Driver:

yaml

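# Pod spec fragment: mount Key Vault objects as read-only files
containers:
  - name: order-service
    volumeMounts:
      - name: secrets-store
        mountPath: /mnt/secrets-store   # one file per Key Vault object
        readOnly: true
volumes:
  - name: secrets-store
    csi:
      driver: secrets-store.csi.k8s.io
      readOnly: true
      volumeAttributes:
        secretProviderClass: "azure-keyvault-provider"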

Secrets are mounted as files rather than environment variables — preventing secret values from appearing in process environment dumps or Kubernetes pod describe output.
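The `azure-keyvault-provider` class referenced above could be defined for Workload Identity along these lines. The Key Vault name, tenant ID, client ID, and the `order-db-connection` secret name are placeholders, not values from this design. Each listed object appears as a file, named after the object, under the mount path.

```yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: azure-keyvault-provider
  namespace: order-service            # must match the namespace of the consuming pod
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    clientID: "<workload-identity-client-id>"   # placeholder: managed identity client ID
    keyvaultName: "<key-vault-name>"            # placeholder
    tenantId: "<tenant-id>"                     # placeholder
    objects: |
      array:
        - |
          objectName: order-db-connection
          objectType: secret
```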

Kubernetes Network Policies — East-West Traffic Control:

Source	Destination	Port	Policy
API Gateway	Order Service	8080	Allow
API Gateway	User Service	8080	Allow
Order Service	Inventory Service	8080	Allow
Order Service	Payment Service	8443	Allow
All services	Notification Service	All	Deny inbound: Notification is event-driven only
All services	data-access namespace	1433	Allow from authorised namespaces only
Default	Default	All	Deny — explicit allow required
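The default-deny row and the API Gateway to Order Service row could be expressed as the following pair of policies in the `order-service` namespace. The sketch assumes pods carry a hypothetical `app: order-service` label. It only denies ingress. Adding `Egress` to the default-deny policy would also require explicit egress rules, including one for DNS.

```yaml
# Default deny: selects every pod in the namespace and allows no ingress.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: order-service
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Explicit allow: pods in the api-gateway namespace may reach Order Service pods on TCP 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-api-gateway
  namespace: order-service
spec:
  podSelector:
    matchLabels:
      app: order-service                # hypothetical pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # Label set automatically by Kubernetes on every namespace
              kubernetes.io/metadata.name: api-gateway
      ports:
        - protocol: TCP
          port: 8080
```

Policies are additive. A pod selected by any policy accepts only traffic that at least one policy allows, which is why the explicit allow is needed alongside the default deny.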

Kubernetes RBAC — Cluster Access Governance:

Developers: namespace-scoped read access to their service namespaces — no cluster-admin
Platform team: full cluster access for infrastructure management
CI/CD service account: deploy permissions scoped to specific namespaces — no cluster-wide privileges
Monitoring service account: read access to all namespaces for metrics collection
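Namespace-scoped developer access can reuse the built-in `view` ClusterRole through a RoleBinding, which grants the role's permissions only inside the binding's namespace. The following is a sketch for the Order team. It assumes an Entra ID group whose object ID is shown as a placeholder.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: order-developers-view
  namespace: order-service            # permissions apply only in this namespace
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: "<entra-group-object-id>"   # placeholder; AKS Entra integration identifies groups by object ID
roleRef:
  kind: ClusterRole
  name: view                          # built-in read-only role; excludes Secrets
  apiGroup: rbac.authorization.k8s.io
```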

7. CI/CD & DevOps Automation Layer

Deployment automation provides independent per-service CI/CD pipelines — a change to the Order Service does not trigger redeployment of the Payment Service.

Per-Service Pipeline: Each microservice has an independent pipeline triggered by commits to its source directory:

trigger:
  paths:
    include:
      - services/order-service   # path prefix: only changes under this folder trigger the Order Service pipeline

Pipeline Stages:

Build — Docker image build from service Dockerfile
Test — unit and integration test execution
Scan — container image vulnerability scanning through Microsoft Defender for Containers
Push — image push to ACR with commit SHA tag
Deploy — Helm chart upgrade to target environment namespace
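The five stages could be laid out in Azure Pipelines YAML roughly as follows, beneath the path-filtered trigger shown earlier. This is a sketch, and several parts are hypothetical:

- The service connection names, resource group, cluster name, and chart path.
- The `image.tag` value key.
- The test step, which is a placeholder.

The scan stage is left as a comment because Defender for Containers' registry scanning assesses images once they reach ACR.

```yaml
variables:
  imageRepository: order-service
  imageTag: $(Build.SourceVersion)          # full commit SHA as the immutable image tag

stages:
  - stage: Build
    jobs:
      - job: BuildTestPush
        pool:
          vmImage: ubuntu-latest
        steps:
          - task: Docker@2
            displayName: Build image
            inputs:
              command: build
              containerRegistry: acr-service-connection   # hypothetical service connection
              repository: $(imageRepository)
              Dockerfile: services/order-service/Dockerfile
              tags: $(imageTag)
          - script: echo "run unit and integration tests here"   # placeholder test step
            displayName: Test
          # Scan: Defender for Containers assesses the image after it lands in ACR;
          # a pre-push gate would need a CI-side scanner step here.
          - task: Docker@2
            displayName: Push image to ACR
            inputs:
              command: push
              containerRegistry: acr-service-connection
              repository: $(imageRepository)
              tags: $(imageTag)

  - stage: Deploy
    dependsOn: Build
    jobs:
      - job: HelmUpgrade
        pool:
          vmImage: ubuntu-latest
        steps:
          - task: AzureCLI@2
            displayName: Helm upgrade into the order-service namespace
            inputs:
              azureSubscription: azure-service-connection   # hypothetical service connection
              scriptType: bash
              scriptLocation: inlineScript
              inlineScript: |
                az aks get-credentials --resource-group rg-aks-platform --name aks-platform
                # Entra ID-integrated clusters need kubelogin for non-interactive auth (assumed installed)
                kubelogin convert-kubeconfig -l azurecli
                helm upgrade --install order-service charts/order-service \
                  --namespace order-service \
                  -f charts/order-service/values-prod.yaml \
                  --set image.tag=$(imageTag) \
                  --atomic --wait
```

`--atomic` rolls the release back automatically if the upgrade fails, which keeps a failed deployment from leaving the namespace half-upgraded.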

Helm Chart Structure: Each service is packaged as a Helm chart — values files provide environment-specific configuration without modifying chart templates:

order-service/
├── Chart.yaml
├── values.yaml          # Default values
├── values-dev.yaml      # Development overrides
├── values-prod.yaml     # Production overrides
└── templates/
    ├── deployment.yaml
    ├── service.yaml
    ├── hpa.yaml
    └── networkpolicy.yaml
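The following illustrates the values layering. The key names are hypothetical, since real keys depend on how the chart templates read `.Values`. The autoscaling numbers mirror the Order Service row of the HPA table. The resource requests are illustrative.

```yaml
# --- values.yaml: chart defaults ---
image:
  repository: <acr-name>.azurecr.io/order-service
  tag: ""                          # supplied at deploy time, e.g. --set image.tag=<commit-sha>
autoscaling:
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 75
resources:
  requests:
    cpu: 250m                      # HPA CPU utilisation is measured against this request
    memory: 256Mi
---
# --- values-dev.yaml: only the keys that differ in development ---
autoscaling:
  minReplicas: 1
  maxReplicas: 2
```

Running `helm upgrade --install order-service ./order-service -f values-dev.yaml` loads `values.yaml` first and deep-merges the dev file over it. Only the two replica bounds change; the CPU target and the templates stay as defined.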

8. Elastic Scaling Layer

Horizontal Pod Autoscaler governs independent per-service scaling based on service-specific demand metrics.

HPA Configuration per Service:

Service	Scale Metric	Min Replicas	Max Replicas	Scale-Out Behaviour
API Gateway	CPU 70%	3	20	Immediate — latency-sensitive
Order Service	CPU 75%	2	10	Standard
Payment Service	CPU 60%	3	8	Conservative — financial workload
Notification Service	Queue depth	2	15	Message backlog-driven
Inventory Service	CPU 75%	2	8	Standard
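The HPA controller computes the target replica count from the ratio of the observed metric to its target, as documented for Kubernetes. The result is then clamped to the minimum and maximum replica bounds in the table. The following uses an illustrative reading for the API Gateway row:

```latex
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil

% API Gateway: target 70% CPU, bounds 3 to 20; illustrative reading of 4 replicas averaging 91% CPU
\left\lceil 4 \times \frac{91}{70} \right\rceil = \lceil 5.2 \rceil = 6
```

By default the controller ignores ratios within 10% of 1.0, so small fluctuations around the target do not trigger scaling.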

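Payment Service uses a lower CPU target (60%) and a minimum of three replicas, which keeps headroom so the service scales out before it runs hot. For payment processing, the greater risk lies in scale-in. Removing pods during rapid scale events can interrupt in-flight requests and leave partial transaction states, so scale-down for this service should be gradual, and pods should drain in-flight work before terminating.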
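The following is a sketch of the Payment Service HPA using the `autoscaling/v2` API, with the table's CPU target and replica bounds. The `behavior.scaleDown` settings are assumptions chosen to illustrate slow scale-in; the Kubernetes default stabilisation window for scale-down is 300 seconds. The Deployment name is hypothetical.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payment-service
  namespace: payment-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service              # hypothetical Deployment name
  minReplicas: 3
  maxReplicas: 8
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60       # percent of the pods' CPU requests
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600  # assumption: act on the highest recommendation of the last 10 minutes
      policies:
        - type: Pods
          value: 1                     # assumption: remove at most one pod...
          periodSeconds: 120           # ...every two minutes
```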

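Notification Service scales on Azure Service Bus queue depth rather than CPU. Message backlog is the appropriate scaling metric for event-driven consumer services, whose CPU utilisation may stay low even under high message volume. HPA cannot read queue depth natively, so this requires an external metrics source, such as KEDA's Azure Service Bus scaler, which creates and drives the HPA on the service's behalf.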
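The following is a sketch of queue-driven scaling with KEDA. It assumes KEDA is installed (for example, through the AKS KEDA add-on). It also assumes the KEDA operator is federated with a managed identity that is allowed to read the queue's metrics. The queue name, the Service Bus namespace, and the per-replica target of 50 messages are hypothetical. KEDA creates and manages the HPA for the target Deployment itself, so no separate HPA object should be defined for the Notification Service.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: notification-service
  namespace: notification-service
spec:
  scaleTargetRef:
    name: notification-service          # Deployment to scale (hypothetical name)
  minReplicaCount: 2
  maxReplicaCount: 15
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: order-events         # hypothetical queue
        namespace: <service-bus-namespace>
        messageCount: "50"              # target active messages per replica (assumption)
      authenticationRef:
        name: servicebus-workload-identity
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: servicebus-workload-identity
  namespace: notification-service
spec:
  podIdentity:
    provider: azure-workload            # KEDA authenticates with its operator's workload identity
```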

9. Monitoring & Observability Layer

Full-stack observability spans three visibility layers — cloud infrastructure, Kubernetes platform, and application service metrics — correlated in unified Grafana dashboards.

Azure Monitor — Infrastructure Layer:

AKS cluster health, node utilisation, and control plane metrics
Container Insights providing pod-level CPU, memory, and restart monitoring
Azure Application Gateway access logs, WAF rule hits, and backend health
Azure Service Bus message throughput, dead letter queue depth, and active message count (consumer backlog)

Prometheus — Kubernetes & Application Layer:

Kubernetes cluster metrics through kube-state-metrics
Node-level metrics through node-exporter
Per-service application metrics through service-level Prometheus endpoints
Custom business metrics exposed by each service (order creation rate, payment success rate, inventory update throughput)
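This design does not specify how Prometheus is deployed. If it runs through the Prometheus Operator (for example, via the kube-prometheus-stack chart), per-service scraping could be declared with a ServiceMonitor like the one below. The `metrics` port name and `/metrics` path are assumptions about how each service exposes its endpoint. Depending on the operator's configuration, the ServiceMonitor may also need a label matching its `serviceMonitorSelector`. Because the service namespaces default-deny ingress, each namespace would also need a NetworkPolicy that admits traffic from the `monitoring` namespace on the metrics port.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: order-service
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
      - order-service                 # watch Services in the Order Service namespace
  selector:
    matchLabels:
      app: order-service              # hypothetical Service label
  endpoints:
    - port: metrics                   # named port on the Service (assumption)
      path: /metrics
      interval: 30s
```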

Grafana — Unified Dashboards:

Service health dashboard — per-service error rate, latency p50/p95/p99, and throughput
Infrastructure dashboard — node utilisation, pod scheduling, and cluster capacity
Business metrics dashboard — order volume, payment success rate, and inventory levels
Alerting rules forwarding critical service degradation to PagerDuty or Azure Monitor Alerts
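Under the same Prometheus Operator assumption, rules like these could back the latency and error-rate panels and the critical-degradation alert. Several values are hypothetical:

- The metric names `http_requests_total` and `http_request_duration_seconds_bucket`.
- The 5% threshold.
- The `release` label, which must match the operator's rule selector.

Alertmanager would then route the `severity: critical` alert to PagerDuty or another receiver.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: order-service-rules
  namespace: monitoring
  labels:
    release: kube-prometheus-stack    # hypothetical; must match the operator's ruleSelector
spec:
  groups:
    - name: order-service
      rules:
        # Recording rule feeding the p95 latency panel
        - record: service:http_request_duration_seconds:p95
          expr: |
            histogram_quantile(0.95,
              sum by (le, service) (rate(http_request_duration_seconds_bucket{service="order-service"}[5m])))
        # Fire when more than 5% of requests return 5xx for 10 minutes
        - alert: OrderServiceHighErrorRate
          expr: |
            sum(rate(http_requests_total{service="order-service", status=~"5.."}[5m]))
              / sum(rate(http_requests_total{service="order-service"}[5m])) > 0.05
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "Order Service 5xx error rate above 5% for 10 minutes"
```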

Distributed Tracing: Application Insights SDK integration correlates distributed traces across service-to-service call chains. This enables end-to-end request tracing from the API Gateway through the Order Service and its calls to the Inventory and Payment services, with a latency breakdown per service hop.

Technologies Used

Category	Technologies
Container Orchestration	Azure Kubernetes Service (AKS), Kubernetes
Containerisation	Docker
Package Management	Helm
Container Registry	Azure Container Registry (ACR)
Ingress & WAF	Azure Application Gateway with WAF, Kubernetes Ingress Controller
CI/CD	Azure DevOps, GitHub Actions
Message Broker	Azure Service Bus
Data Services	Azure SQL Database, Azure Cosmos DB, Azure Cache for Redis, Azure Blob Storage
Identity & Access	Microsoft Entra ID, Azure Workload Identity, Kubernetes RBAC
Secret Management	Azure Key Vault, CSI Secrets Store Driver
Network Security	Kubernetes Network Policies, Azure CNI
Monitoring	Azure Monitor, Container Insights, Prometheus, Grafana, Application Insights

Key Challenges Addressed

Defining service boundaries without recreating monolithic coupling — addressed through domain-driven decomposition applying bounded context principles — each service encapsulates a distinct business domain with explicit interface contracts rather than shared internal data models.

Governing inter-service communication in distributed systems — addressed through explicit communication pattern selection — synchronous REST for consistency-required interactions, asynchronous Service Bus messaging for event-driven workflows — preventing uncontrolled inter-service dependencies.

Achieving near-zero downtime during deployments — addressed through Kubernetes rolling update strategy with configured maxUnavailable and maxSurge parameters — new pod versions are progressively deployed while existing pods continue serving traffic until health checks confirm new pod readiness.

Securing workloads within a multi-service Kubernetes environment — addressed through Kubernetes Network Policies enforcing least-privilege east-west traffic control, Azure Workload Identity for secretless authentication, and CSI Secrets Store Driver for secure secret injection without environment variable exposure.

Achieving full-stack observability across application layers — addressed through three-tier observability stack — Azure Monitor for infrastructure and cluster health, Prometheus for Kubernetes-native and application-level metrics, and Application Insights for distributed tracing across service call chains.

Standardising deployment workflows across independent microservices — addressed through per-service Helm charts with environment-specific values files — consistent deployment templating with service-specific and environment-specific parameterisation.
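The rolling update described under near-zero downtime deployments could be configured as below. With `maxUnavailable: 0` and `maxSurge: 1`, Kubernetes adds one new pod at a time and removes an old pod only after the new one passes its readiness probe. The probe path, image reference, and CPU request are hypothetical. `replicas` is omitted because the HPA owns the replica count.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  namespace: order-service
spec:
  # replicas omitted: the HorizontalPodAutoscaler sets the replica count
  selector:
    matchLabels:
      app: order-service
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most one extra pod above the desired count during rollout
      maxUnavailable: 0    # never drop below the desired count of available pods
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
        - name: order-service
          image: "<acr-name>.azurecr.io/order-service:<commit-sha>"   # placeholder
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 250m    # HPA CPU utilisation is measured against this request
          readinessProbe:
            httpGet:
              path: /healthz/ready   # hypothetical readiness endpoint
              port: 8080
            periodSeconds: 5
            failureThreshold: 3
```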

Design Decisions & Rationale

Microservices over Monolithic Architecture : Monolithic scaling requires scaling the entire application for the demand of its most resource-intensive component. Microservices allow independent scaling per service — Order Service scales independently from Payment Service based on respective demand. The operational cost is increased distributed systems complexity — network latency, partial failure handling, and distributed tracing requirements that do not exist in monolithic architectures.

AKS over Self-Managed Kubernetes : Self-managed Kubernetes requires control plane management — etcd backup, API server availability, certificate rotation, and upgrade orchestration. AKS eliminates this operational overhead through a Microsoft-managed control plane — reducing Kubernetes operational scope to node pool management and workload configuration. The trade-off is reduced control plane customisation capability — acceptable for the vast majority of enterprise workload requirements.

Application Gateway WAF over NGINX Ingress : NGINX Ingress provides flexible Kubernetes-native ingress routing, but WAF protection depends on an add-on module (such as ModSecurity with the OWASP Core Rule Set). The platform team must enable, tune, and update that module. Azure Application Gateway WAF provides managed OWASP Core Rule Set protection with Microsoft-maintained rule updates. This suits externally exposed enterprise applications where WAF operational overhead should be minimised. The trade-off is higher cost than NGINX Ingress and tighter Azure platform coupling.

Azure Service Bus for Asynchronous Communication : Using direct service-to-service HTTP calls for all inter-service communication creates tight coupling and cascading failure risk, because a slow downstream service blocks its upstream callers. Azure Service Bus provides durable, decoupled message delivery: Notification Service processes events at its own pace without blocking Order Service. Dead letter queue support ensures failed messages are retained for investigation rather than silently dropped.

Database per Service over Shared Database : Shared databases between microservices create coupling through shared schema — changes to a database table affect all services using it. Database per service enforces strict data isolation — each service's data is independent and can evolve without cross-service coordination. The operational cost is multiple database instances to manage rather than one.

CSI Secrets Store Driver over Kubernetes Secrets : Kubernetes Secrets are only base64-encoded, and base64 is an encoding, not encryption. Application-level encryption of Secrets in etcd is an opt-in cluster configuration; on AKS, this is KMS etcd encryption with a Key Vault key. The CSI Secrets Store Driver retrieves secrets from Azure Key Vault at pod startup and mounts them as files, so secrets never reside in Kubernetes Secret objects or etcd, provided the driver's optional sync to Kubernetes Secrets is not enabled. With auto-rotation enabled on the driver, values rotated in Key Vault are refreshed in the mounted files without a pod restart. The application must re-read the files to pick up the new values.

Trade-offs & Design Constraints

Microservices Distributed Systems Complexity : Microservices introduce distributed systems problems that do not exist in monolithic architectures — network partitions, partial failures, eventual consistency, and distributed tracing requirements. Engineers familiar with monolithic development require significant skill development to effectively debug distributed service call failures and reason about eventual consistency semantics. The scalability and deployment independence benefits must be weighed against this operational complexity increase — microservices are not universally appropriate for all application sizes and team capabilities.

Inter-Service Latency Overhead : Service-to-service HTTP calls introduce network latency that in-process monolithic function calls do not. An order creation that was a single in-process call chain in the monolith may now involve three network round trips: API Gateway to Order Service, Order Service to Inventory Service, and Order Service to Payment Service. At high transaction volumes, this accumulated latency affects user-facing response times. Using gRPC instead of JSON over REST for performance-sensitive synchronous calls reduces serialisation overhead. Caching frequently read inventory data reduces the number of round trips required.

Application Gateway Cost at Scale : Azure Application Gateway WAF_v2 pricing combines a fixed hourly charge with a capacity unit charge that grows with compute, persistent connections, and throughput. At high traffic volumes with WAF enabled, capacity unit consumption, and therefore cost, grows significantly. Organisations with very high traffic volumes should model Application Gateway capacity unit consumption against expected traffic profiles before finalising ingress architecture.

Database per Service Operational Overhead : Managing multiple independent database instances — Azure SQL for transactional services, Cosmos DB for document services, Redis for caching — increases operational complexity compared to a single shared database. Each database requires independent backup policies, scaling configuration, and monitoring. Platform teams must establish consistent database governance standards across service-owned databases to prevent management fragmentation.

Prometheus Storage Scaling : By default, Prometheus stores metrics on local disk with a fixed retention window (15 days unless configured otherwise). Local storage therefore grows with metric volume and retention length, and the data is lost if the volume is lost. Long-term metric retention at enterprise scale requires external storage: either Azure Monitor managed service for Prometheus for cloud-native managed storage, or Thanos for horizontally scalable Prometheus storage. Local-only Prometheus is not a sustainable long-term metrics architecture for high-volume microservices environments.

Projected Outcomes

The architecture is designed to deliver the following operational and application outcomes in a production enterprise environment:

Independent per-service scaling responding to service-specific demand without scaling unrelated services
Near-zero downtime deployments through Kubernetes rolling update strategy with health-gate progression
Improved fault isolation — failures contained within service boundaries rather than cascading across the application
Enhanced security posture through Network Policy east-west traffic governance and CSI Driver secret management
Full-stack observability across infrastructure, Kubernetes platform, and application service layers through unified Grafana dashboards
Independent service deployment velocity — changes to one service deployed without redeploying unrelated services
WAF-protected external traffic through Application Gateway OWASP rule enforcement
Distributed request tracing through Application Insights enabling end-to-end latency investigation across service call chains

Future Evolution

Service mesh integration (Istio) for mutual TLS between all services, advanced traffic management, and service-level observability without application code changes
GitOps deployment orchestration through FluxCD replacing pipeline-driven Kubernetes deployments with declarative state management
Kubernetes policy enforcement through OPA/Gatekeeper preventing non-compliant workload deployment through admission controller validation
Multi-region Kubernetes federation for geographic distribution and disaster recovery through Azure Kubernetes Fleet Manager
Advanced workload security scanning through Microsoft Defender for Containers with runtime threat detection
FinOps-aware Kubernetes governance through Kubecost namespace-level cost allocation per service team
Automated chaos engineering through Chaos Mesh validating service resilience under simulated component failures
Confidential computing integration through AKS confidential node pools for sensitive payment workload isolation

Key Takeaways

Service boundary definition is the most consequential microservices architecture decision — poorly defined boundaries recreate monolithic coupling in distributed form
Inter-service communication pattern selection (synchronous vs asynchronous) must be explicitly governed — default synchronous communication for all services creates tight coupling and cascading failure risk
Database per service is a hard architectural requirement for microservices independence — shared databases between services defeat the deployment and evolution independence that microservices provide
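CSI Secrets Store Driver is the recommended secret injection mechanism for AKS: Kubernetes Secrets are only base64-encoded, which is not encryption, whereas mounting secrets from Key Vault through the CSI driver keeps them out of etcd as long as the driver's optional sync to Kubernetes Secrets is not enabled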
Application Gateway WAF provides managed OWASP protection appropriate for externally exposed enterprise APIs — the operational cost premium over NGINX Ingress is justified by managed rule maintenance and WAF capability
Full-stack observability requires three layers — infrastructure metrics, Kubernetes platform metrics, and application-level metrics — correlated in unified dashboards; any single layer alone is insufficient for microservices troubleshooting
Microservices complexity is real and must be justified by scale and team capability — microservices are not universally superior to well-structured monolithic architectures for all application contexts

Executive Summary

Business Drivers

This architecture was designed to address the application modernisation requirements of organisations where monolithic architecture results in:

Scalability bottlenecks — high-demand services cannot be scaled independently without scaling the entire application
High-risk deployments — any code change requires full application redeployment creating broad blast radius for failures
Tight component coupling — internal dependencies prevent independent service evolution and create cascading failure risk
Limited fault isolation — a failure in any component can propagate across the entire application
Slow delivery cycles — full application testing and deployment for any change, regardless of scope
Inefficient resource utilisation — entire application must be sized for peak demand of its most resource-intensive component

Operational Constraints

The architecture was designed to operate within the following constraints typical of microservices migration scenarios:

Existing application components must be decomposed into services without rewriting all business logic simultaneously — incremental decomposition from monolith boundaries
Service-to-service communication must be governed — unrestricted inter-service communication recreates monolithic coupling in distributed form
Each service requires independent data storage — shared databases between services create coupling that defeats microservices independence
Infrastructure scaling must be elastic — services with different demand profiles require independent autoscaling without manual intervention
Security controls must be centralized — per-service secret management and access control creates governance inconsistency at scale
Observability must span both infrastructure and application layers — Kubernetes cluster health alone is insufficient for microservices troubleshooting
Deployment pipelines must support independent service deployment — a change to one service must not require redeployment of unrelated services

Objectives

Decompose monolithic application into independently deployable microservices with clear domain boundaries
Define inter-service communication patterns — synchronous REST/gRPC for request-response, asynchronous messaging for event-driven workflows
Implement polyglot data architecture matching each service's data requirements to appropriate storage technology
Deploy Application Gateway with WAF for Layer 7 ingress governance and external threat protection
Enforce service isolation through Kubernetes Network Policies governing east-west traffic between services
Centralise secret management through Azure Key Vault CSI Secrets Store Driver — no hardcoded credentials in workloads
Implement elastic scaling through HPA responding to per-service demand independently
Establish full-stack observability combining Azure Monitor (infrastructure) with Prometheus and Grafana (application layer)
Standardise service deployments through Helm charts ensuring consistent, versioned, repeatable deployments

Architecture Principles

Domain-driven service boundaries — services aligned to business domains not technical layers
Independent service lifecycle — each service deployable, scalable, and updatable without affecting others
Data isolation per service — no shared databases between services; each service owns its data store
Communication governance — synchronous patterns for required consistency, asynchronous for resilience and decoupling
Immutable container images — services deployed as versioned, immutable container images through ACR
Secure-by-default workload communication — network policies enforcing least-privilege inter-service connectivity
Centralized secret governance — Key Vault CSI Driver providing secrets to workloads without environment variable exposure
Full-stack observability — infrastructure metrics, Kubernetes metrics, and application-level metrics correlated in unified dashboards
Managed platform operations — AKS managed control plane reducing Kubernetes operational overhead

Architecture Overview

1. Kubernetes Orchestration Layer

The platform leverages Azure Kubernetes Service with a multi-node pool architecture optimised for heterogeneous microservice workload profiles.

AKS Cluster Configuration:

Component	Configuration	Rationale
Network plugin	Azure CNI	Pod-level VNet IP addressing; paired with a network policy engine (Azure, Calico, or Cilium) to enforce Network Policies
Authentication	Microsoft Entra ID + Kubernetes RBAC	Unified identity governance through Entra ID
Workload Identity	Azure Workload Identity	Secretless pod authentication to Azure services
Node pools	Multiple, by workload profile	General-purpose for standard services, memory-optimised for data-intensive services
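As an illustration of the Workload Identity row, a pod opts in through a label and runs under a ServiceAccount annotated with a managed identity's client ID. This sketch assumes the cluster's OIDC issuer and Workload Identity are enabled, and that a federated identity credential links the managed identity to the subject system:serviceaccount:order-service:order-service-sa; the ServiceAccount name, client ID, and image reference are placeholders:

```yaml
# ServiceAccount bound to a user-assigned managed identity (client ID is a placeholder)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: order-service-sa            # hypothetical name
  namespace: order-service
  annotations:
    azure.workload.identity/client-id: "<managed-identity-client-id>"
---
# The pod label opts the pod into Workload Identity token injection
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  namespace: order-service
spec:
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
        azure.workload.identity/use: "true"
    spec:
      serviceAccountName: order-service-sa
      containers:
        - name: order-service
          image: "<acr-name>.azurecr.io/order-service:<commit-sha>"   # placeholder
```

Azure SDK clients in the container can then obtain Entra ID tokens from the injected federated token, so no client secret is stored in the pod or its manifests.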

Multi-Node Pool Design:

Node Pool	VM SKU	Purpose	Autoscale
System pool	Standard_DS2_v2	AKS system components	Fixed (3 nodes)
General workload	Standard_D4s_v3	Standard microservices	2–10 nodes
Memory-optimised	Standard_E4s_v3	Data-intensive services	1–5 nodes
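Workloads can be steered onto a specific pool with a node selector, since AKS labels every node with its pool name under kubernetes.azure.com/agentpool. A sketch, assuming a memory-optimised pool named memoptpool and treating Inventory Service as the data-intensive example (both hypothetical choices):

```yaml
# Deployment pinned to the memory-optimised pool; "memoptpool" is a hypothetical pool name
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inventory-service           # illustrative data-intensive service
  namespace: inventory-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: inventory-service
  template:
    metadata:
      labels:
        app: inventory-service
    spec:
      # AKS applies this label to every node in the pool automatically
      nodeSelector:
        kubernetes.azure.com/agentpool: memoptpool
      containers:
        - name: inventory-service
          image: "<acr-name>.azurecr.io/inventory-service:<commit-sha>"   # placeholder
          resources:
            requests:
              cpu: 500m
              memory: 4Gi
            limits:
              memory: 8Gi
```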

Namespace Architecture:

Namespace	Services	Access Boundary
api-gateway	External ingress service	Ingress controller
order-service	Order domain services	Internal only
inventory-service	Inventory domain services	Internal only
payment-service	Payment domain services	Restricted — payment team only
notification-service	Event-driven notification	Internal only
data-access	Shared data access layer	Service namespaces only
monitoring	Prometheus, Grafana	Platform team only

2. Microservices Decomposition

The monolithic application is decomposed along domain-driven boundaries — each service encapsulating a distinct business capability with independent deployment and data ownership.

Service Decomposition Model:

Service	Domain	Responsibility	Communication Pattern
API Gateway Service	Routing	External request routing, auth validation	Synchronous — proxies to internal services
Order Service	Orders	Order lifecycle management	Synchronous (REST) + Async (events)
Inventory Service	Inventory	Stock management and availability	Synchronous (REST)
Payment Service	Payments	Payment processing and validation	Synchronous (REST) — strict consistency required
Notification Service	Notifications	Email, SMS, push notifications	Asynchronous (event-driven)
User Service	Identity	User profile and preference management	Synchronous (REST)

Decomposition Principles Applied:

Single Responsibility — each service owns one business domain exclusively
Bounded Context — service interfaces expose only what consuming services require, not internal data models
Database per Service — each service owns its data store; no cross-service database access
Strangler Fig Pattern — new microservices replace monolithic modules incrementally rather than requiring full rewrites

3. Inter-Service Communication Design

Synchronous Communication — REST/gRPC: Used where the calling service requires an immediate response and strong consistency guarantees:

Order Service → Inventory Service: stock reservation requires synchronous confirmation before order commitment
Order Service → Payment Service: payment authorisation requires synchronous confirmation before order completion
API Gateway → all services: external request routing requires synchronous response

REST over HTTPS for service-to-service communication with mutual authentication through Workload Identity tokens.

Asynchronous Communication — Event-Driven: Used where services can operate independently and eventual consistency is acceptable:

Order Service → Notification Service: order confirmation events published to Azure Service Bus — Notification Service consumes asynchronously without Order Service waiting for notification delivery
Inventory Service → Notification Service: low stock alert events
Payment Service → Order Service: payment completion events triggering order status updates

Azure Service Bus provides the message broker for asynchronous inter-service communication — durable message delivery with dead letter queue support for failed message handling.

4. Ingress & Traffic Management Layer

Azure Application Gateway with WAF:

OWASP Core Rule Set (CRS) 3.2 enforcement protecting against OWASP Top 10 attack categories
SSL/TLS termination at the Application Gateway — backend services receive decrypted HTTP within the private VNet
Path-based and host-based routing directing external requests to the API Gateway service namespace
Bot protection rules filtering automated malicious traffic before cluster ingress
Rate limiting preventing abuse of public-facing API endpoints
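If the Application Gateway Ingress Controller (AGIC) is what connects the gateway to the cluster (an assumption; this section does not name the controller), routing into the api-gateway namespace could be declared as below. The host name, certificate name, and Service name are hypothetical, and the WAF policy with CRS 3.2 is assumed to be attached to the gateway itself rather than set per Ingress:

```yaml
# Ingress served by AGIC (assumed controller); names below are hypothetical
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-gateway
  namespace: api-gateway
  annotations:
    # Listener certificate already installed on the gateway (e.g. sourced from Key Vault)
    appgw.ingress.kubernetes.io/appgw-ssl-certificate: "api-public-cert"
    # TLS ends at the gateway; the backend pool is reached over plain HTTP inside the VNet
    appgw.ingress.kubernetes.io/backend-protocol: "http"
spec:
  ingressClassName: azure-application-gateway
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-gateway-service
                port:
                  number: 8080
```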

5. Polyglot Data Architecture

Each microservice owns its data store — the technology selected matches the service's specific data access and consistency requirements.

Service	Data Store	Technology	Rationale
Order Service	Relational — transactional	Azure SQL Database	ACID transactions required for order lifecycle
Inventory Service	Relational — structured	Azure SQL Database	Structured inventory data with referential integrity
Payment Service	Relational — auditable	Azure SQL Database	Financial transactions requiring full audit trail
User Service	Document — flexible schema	Azure Cosmos DB	User preferences with evolving schema requirements
Notification Service	Queue + blob	Azure Service Bus + Blob	Message-driven with attachment storage
Session/Cache	Key-value	Azure Cache for Redis	Low-latency session state for API Gateway

6. Security & Governance Layer

Security controls are integrated at the platform layer — applied consistently across all services rather than configured per-service.

Key Vault Secret Injection through the CSI Secrets Store Driver:

# Pod spec fragment (container name and mount path are illustrative)
containers:
  - name: order-service
    volumeMounts:
      - name: secrets-store
        mountPath: "/mnt/secrets-store"
volumes:
  - name: secrets-store
    csi:
      driver: secrets-store.csi.k8s.io
      readOnly: true
      volumeAttributes:
        secretProviderClass: "azure-keyvault-provider"

Secrets are mounted as files rather than environment variables — preventing secret values from appearing in process environment dumps or Kubernetes pod describe output.
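The azure-keyvault-provider class referenced by the volume could be defined as follows. This is a sketch assuming the Azure Key Vault provider authenticates through Workload Identity; the vault name, tenant ID, client ID, and secret name are placeholders. Each listed object appears as a file named after its objectName under the volume's mount path:

```yaml
# SecretProviderClass for the Azure Key Vault provider (all identifiers are placeholders)
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: azure-keyvault-provider
  namespace: order-service          # must match the consuming pod's namespace
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    clientID: "<managed-identity-client-id>"   # Workload Identity client ID
    keyvaultName: "<key-vault-name>"
    tenantId: "<tenant-id>"
    objects: |
      array:
        - |
          objectName: order-db-connection-string
          objectType: secret
```

The optional secretObjects field, which would copy these values into Kubernetes Secrets, is deliberately omitted so the values never reach etcd.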

Kubernetes Network Policies — East-West Traffic Control:

Source	Destination	Port	Policy
API Gateway	Order Service	8080	Allow
API Gateway	User Service	8080	Allow
Order Service	Inventory Service	8080	Allow
Order Service	Payment Service	8443	Allow
All services	Notification Service	All	Deny (event-driven only; no inbound service calls)
Authorised service namespaces	data-access namespace	5432/1433	Allow (all other namespaces denied)
Default	Default	All	Deny — explicit allow required
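The default-deny row and the API Gateway to Order Service row translate into two policies per namespace. A minimal sketch for the Order Service namespace, relying on the kubernetes.io/metadata.name label that current Kubernetes versions set on every namespace automatically:

```yaml
# Deny all ingress to every pod in order-service unless another policy allows it
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: order-service
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Explicit allow: pods in the api-gateway namespace may reach order-service on TCP 8080
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-api-gateway
  namespace: order-service
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: api-gateway
      ports:
        - protocol: TCP
          port: 8080
```

Only ingress is restricted here; extending the default deny to egress would additionally require an explicit allowance for DNS traffic to kube-dns.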

Kubernetes RBAC — Cluster Access Governance:

Developers: namespace-scoped read access to their service namespaces — no cluster-admin
Platform team: full cluster access for infrastructure management
CI/CD service account: deploy permissions scoped to specific namespaces — no cluster-wide privileges
Monitoring service account: read access to all namespaces for metrics collection
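For the developer and CI/CD rows, namespace-scoped RoleBindings to the built-in view and edit ClusterRoles are one compact option. A sketch assuming Entra ID integration (where a Group subject is identified by the group's object ID) and a hypothetical cicd-deployer ServiceAccount for the pipeline:

```yaml
# Developers: read-only access inside order-service, granted to an Entra ID group
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: order-team-view
  namespace: order-service
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view                        # built-in read-only role; excludes Secrets
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: "<order-team-entra-group-object-id>"   # placeholder
---
# CI/CD: deploy rights limited to the order-service namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cicd-deployer
  namespace: order-service
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
  - kind: ServiceAccount
    name: cicd-deployer             # hypothetical pipeline identity
    namespace: order-service
```

Binding a ClusterRole through a RoleBinding grants its permissions only within that RoleBinding's namespace, which keeps both subjects free of cluster-wide privileges.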

7. CI/CD & DevOps Automation Layer

Deployment automation provides independent per-service CI/CD pipelines — a change to the Order Service does not trigger redeployment of the Payment Service.

Per-Service Pipeline: Each microservice has an independent pipeline triggered by commits to its source directory:

trigger:
  paths:
    include:
      - services/order-service   # Only changes under this folder trigger the Order Service pipeline

Pipeline Stages:

Build — Docker image build from service Dockerfile
Test — unit and integration test execution
Scan — container image vulnerability scanning through Microsoft Defender for Containers
Push — image push to ACR with commit SHA tag
Deploy — Helm chart upgrade to target environment namespace
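A rough Azure Pipelines sketch of the Build/Push and Deploy stages follows, assuming the Docker@2 task for build and push and the HelmDeploy@0 task for the upgrade. The Test and Scan stages are omitted, the service connection names and chart path are hypothetical, and the path-filtered trigger would sit at the top of the same file:

```yaml
# Build/Push and Deploy stages only; connection names and paths are hypothetical
variables:
  imageRepository: order-service
  imageTag: $(Build.SourceVersion)          # full SHA of the triggering commit

stages:
  - stage: BuildAndPush
    jobs:
      - job: Image
        pool:
          vmImage: ubuntu-latest
        steps:
          - task: Docker@2
            inputs:
              command: buildAndPush
              containerRegistry: acr-connection        # hypothetical ACR service connection
              repository: $(imageRepository)
              Dockerfile: services/order-service/Dockerfile
              tags: $(imageTag)

  - stage: Deploy
    dependsOn: BuildAndPush
    jobs:
      - job: HelmUpgrade
        pool:
          vmImage: ubuntu-latest
        steps:
          - task: HelmDeploy@0
            inputs:
              connectionType: Kubernetes Service Connection
              kubernetesServiceEndpoint: aks-order-service    # hypothetical
              namespace: order-service
              command: upgrade
              install: true                                   # install on first release
              chartType: FilePath
              chartPath: services/order-service/chart         # hypothetical chart location
              releaseName: order-service
              valueFile: services/order-service/chart/values-prod.yaml
              overrideValues: image.tag=$(imageTag)
```

Passing the commit SHA through overrideValues gives the commit-SHA image tag described above while leaving the chart templates unchanged between releases.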

Helm Chart Structure: Each service is packaged as a Helm chart — values files provide environment-specific configuration without modifying chart templates:

order-service/
├── Chart.yaml
├── values.yaml          # Default values
├── values-dev.yaml      # Development overrides
├── values-prod.yaml     # Production overrides
└── templates/
    ├── deployment.yaml
    ├── service.yaml
    ├── hpa.yaml
    └── networkpolicy.yaml
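An excerpt of what templates/deployment.yaml might contain to produce the rolling-update behaviour: surge one pod at a time, never drop below desired capacity, and send traffic to new pods only once their readiness probe passes. The value keys and the /healthz/ready endpoint are hypothetical chart conventions:

```yaml
# templates/deployment.yaml (excerpt); value keys and probe path are hypothetical
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # add at most one extra pod during a rollout
      maxUnavailable: 0    # never drop below the desired replica count
  selector:
    matchLabels:
      app: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          ports:
            - containerPort: 8080
          # A new pod joins the Service endpoints only after this probe succeeds
          readinessProbe:
            httpGet:
              path: /healthz/ready
              port: 8080
            periodSeconds: 5
            failureThreshold: 3
```

The replicas field is skipped when autoscaling is enabled so that a helm upgrade does not reset the replica count chosen by the HPA defined in hpa.yaml.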

8. Elastic Scaling Layer

Horizontal Pod Autoscaler governs independent per-service scaling based on service-specific demand metrics.

HPA Configuration per Service:

Service	Scale Metric (Target)	Min Replicas	Max Replicas	Scale-Out Behaviour
API Gateway	CPU 70%	3	20	Immediate — latency-sensitive
Order Service	CPU 75%	2	10	Standard
Payment Service	CPU 60%	3	8	Conservative — financial workload
Notification Service	Queue depth	2	15	Message backlog-driven
Inventory Service	CPU 75%	2	8	Standard

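Two rows of the table expressed as autoscaling/v2 manifests. CPU utilisation targets are measured relative to each container's CPU requests, so requests must be set. Queue-depth scaling needs an external metrics adapter that exposes the Service Bus backlog to the HPA; the metric name, queue label, and per-replica target below are hypothetical:

```yaml
# Order Service: average CPU utilisation target of 75%, 2 to 10 replicas
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-service
  namespace: order-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75
---
# Notification Service: scale on queue backlog via an External metric (adapter assumed)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: notification-service
  namespace: notification-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: notification-service
  minReplicas: 2
  maxReplicas: 15
  metrics:
    - type: External
      external:
        metric:
          name: servicebus_active_messages     # hypothetical adapter metric
          selector:
            matchLabels:
              queue: order-notifications       # hypothetical queue name
        target:
          type: AverageValue
          averageValue: "100"                  # aim for ~100 pending messages per replica
```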
9. Monitoring & Observability Layer

Full-stack observability spans three visibility layers — cloud infrastructure, Kubernetes platform, and application service metrics — correlated in unified Grafana dashboards.

Azure Monitor — Infrastructure Layer:

AKS cluster health, node utilisation, and control plane metrics
Container Insights providing pod-level CPU, memory, and restart monitoring
Azure Application Gateway access logs, WAF rule hits, and backend health
Azure Service Bus message throughput, dead letter queue depth, and consumer lag

Prometheus — Kubernetes & Application Layer:

Kubernetes cluster metrics through kube-state-metrics
Node-level metrics through node-exporter
Per-service application metrics through service-level Prometheus endpoints
Custom business metrics exposed by each service (order creation rate, payment success rate, inventory update throughput)
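A sketch of how a self-managed Prometheus could discover the per-service endpoints, assuming pods opt in with the prometheus.io/scrape: "true" annotation (a widespread convention rather than a Kubernetes built-in) and that namespace and pod labels are attached for per-service dashboards:

```yaml
# prometheus.yml fragment: pod discovery through the Kubernetes API
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods that opt in through the annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use a custom metrics path when the pod declares one
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Copy namespace and pod name onto every scraped series
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```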

Grafana — Unified Dashboards:

Service health dashboard — per-service error rate, latency p50/p95/p99, and throughput
Infrastructure dashboard — node utilisation, pod scheduling, and cluster capacity
Business metrics dashboard — order volume, payment success rate, and inventory levels
Alerting rules forwarding critical service degradation to PagerDuty or Azure Monitor Alerts
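Alerting rules behind the service health dashboard might look like the following Prometheus rule file. It assumes each service exports a request counter http_requests_total with a status label and a latency histogram http_request_duration_seconds (hypothetical names); the 5% and 1 s thresholds are illustrative, and routing to PagerDuty would be configured separately in Alertmanager:

```yaml
# Prometheus rule file; metric names and thresholds are assumptions
groups:
  - name: service-health
    rules:
      - alert: HighErrorRate
        # Share of 5xx responses per namespace over the last 5 minutes
        expr: |
          sum by (namespace) (rate(http_requests_total{status=~"5.."}[5m]))
            /
          sum by (namespace) (rate(http_requests_total[5m]))
            > 0.05
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "5xx error rate above 5% in {{ $labels.namespace }}"
      - alert: HighP99Latency
        # p99 latency per namespace computed from histogram buckets
        expr: |
          histogram_quantile(0.99,
            sum by (namespace, le) (rate(http_request_duration_seconds_bucket[5m]))
          ) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p99 latency above 1s in {{ $labels.namespace }}"
```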

Technologies Used

Category	Technologies
Container Orchestration	Azure Kubernetes Service (AKS), Kubernetes
Containerisation	Docker
Package Management	Helm
Container Registry	Azure Container Registry (ACR)
Ingress & WAF	Azure Application Gateway with WAF, Kubernetes Ingress Controller
CI/CD	Azure DevOps, GitHub Actions
Message Broker	Azure Service Bus
Data Services	Azure SQL Database, Azure Cosmos DB, Azure Cache for Redis, Azure Blob Storage
Identity & Access	Microsoft Entra ID, Azure Workload Identity, Kubernetes RBAC
Secret Management	Azure Key Vault, CSI Secrets Store Driver
Network Security	Kubernetes Network Policies, Azure CNI
Monitoring	Azure Monitor, Container Insights, Prometheus, Grafana, Application Insights

Open to discussing infrastructure architecture, cloud transformation, or high-availability system design.

Whether the objective is infrastructure modernization, operational resilience, hybrid cloud transformation, or enterprise security architecture, I am always interested in discussing complex infrastructure environments and strategic technical initiatives.

Get in touch