Operational Best Practices For Azure Kubernetes Service


Operational best practices for Azure Kubernetes Service

(Japanese title: Best Practices for Administering Azure Kubernetes Service (AKS))

Saurya Das
Senior Program Manager
Azure Kubernetes Service
Microsoft Corporation.

CI32
Agenda

• Cluster Isolation and Resource Management
• Networking
• Securing your Environment
• Scaling your Applications and Cluster
• Logging and Monitoring
[Diagram: physical isolation — four separate clusters (Dev, Staging, Prod Team1, Prod Team2), each with its own nodes and pods]

[Diagram: logical isolation — a shared Dev and Staging cluster and a shared Prod cluster, with nodes and pods partitioned by team namespaces (DevTeam1, DevTeam2, Staging; Team1, Team2, Team3)]
Kubernetes Namespaces

• The Namespace object is the logical isolation boundary


• Kubernetes has features to help you safely isolate tenants:
  • Scheduling: resource quotas
  • Network isolation using network policies (a sketch follows below)
  • Authentication and authorization: RBAC and Pod Security Policy

• Note: container-level isolation is still needed to achieve hard isolation
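To make the network policy bullet concrete, here is a minimal sketch of a default-deny ingress policy (the namespace name is ours, borrowed from the next slide, and the cluster must have a network policy provider such as Calico for it to be enforced): it blocks all inbound pod traffic in the namespace until more specific policies allow it.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: ignite   # hypothetical namespace, created on the next slide
spec:
  podSelector: {}     # empty selector = applies to every pod in the namespace
  policyTypes:
  - Ingress           # no ingress rules listed, so all inbound traffic is denied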
Resource Quotas

• Constraints that limit aggregate resource consumption per namespace
• You can limit compute resources (CPU, memory, storage, …) and/or limit the number of objects (Pods, Services, etc.)
• When enabled, users must specify requests or limits; otherwise the quota system will fail the request
• Kubernetes will not overcommit

Create a namespace:

$ kubectl create namespace ignite

Apply a resource quota to the namespace (admin/resource/ignite.yaml):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-demo
spec:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi
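As a usage sketch (the file path mirrors the slide's admin/resource/ignite.yaml), the quota can be applied and inspected with standard kubectl commands:

$ kubectl apply -f admin/resource/ignite.yaml --namespace=ignite
$ kubectl describe resourcequota mem-cpu-demo --namespace=ignite

describe shows current usage against each hard limit, which makes it easy to see when a namespace is approaching its quota.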
Physical vs. Logical Isolation

                               Physical                   Logical
Pod Density                    Low to Medium              Medium to High
Cost                           $$                         $
Kubernetes Experience          Low to Medium              Medium to High
Security                       High (surface is small)    High*
Blast Radius of Changes        Small                      Big
Management and Operations      Owner Team                 Single or Cross-Functional Team

* Logical isolation via namespaces can achieve hard isolation, assuming the cluster admin has applied all the required security controls.
Kube-advisor

• A diagnostic tool for Kubernetes clusters. At the moment, it returns pods that are missing resource requests and limits.
• More info: https://github.com/Azure/kube-advisor
VS Code extension for warnings
• The Kubernetes VS Code extension adds warnings for missing resource requests/limits
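For reference, this is the kind of spec that keeps kube-advisor and the VS Code extension quiet — a minimal pod sketch (name, image, and values are illustrative, not from the deck) with explicit requests and limits:

apiVersion: v1
kind: Pod
metadata:
  name: quota-friendly-pod   # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:              # what the scheduler reserves for the container
        cpu: 250m
        memory: 256Mi
      limits:                # hard ceiling enforced at runtime
        cpu: 500m
        memory: 512Mi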
Cluster Isolation - Summary

• Consider the sensitivity of the workload, cost, organizational culture, operations model, and blast radius when choosing an isolation pattern; a mixture is fine too.

• Always use namespaces, even with physical isolation; never use the default namespace for production workloads.

• Apply resource quotas.


AKS Basic Networking

• Uses the kubenet network plugin, with the following characteristics:
  • Nodes and pods are placed on different IP subnets
  • User-defined routing and IP forwarding provide connectivity between pods across nodes

• Drawbacks:
  • Two different IP CIDRs to manage
  • Performance impact
  • Peering or on-premises connectivity is hard to achieve
AKS Advanced Networking

• Uses the Azure CNI (Container Networking Interface) plugin
• CNI is a vendor-neutral protocol used by container runtimes to make requests to networking providers
• Azure CNI is an implementation that lets you integrate Kubernetes with your VNET

• Advantages:
  • Single IP CIDR to manage
  • Better performance
  • Peering and on-premises connectivity work out of the box
AKS with Advanced Networking

[Diagram: Azure VNet A containing an AKS subnet (AKS cluster) and a backend services subnet (SQL Server), connected to on-premises infrastructure (enterprise system) over Azure ExpressRoute, and to VNet B and other peered VNets via VNet peering]


Service Type LoadBalancer

• Basic Layer 4 load balancing (TCP/UDP)
• Each service is assigned an IP on the Azure AKS VNet

apiVersion: v1
kind: Service
metadata:
  name: frontendservice
spec:
  loadBalancerIP: X.X.X.X
  type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: frontend

[Diagram: an Azure public load balancer (ALB) with a public IP in front of the AKS subnet; FrontEndService routes to Pod1, Pod2, and Pod3, each labeled Frontend]
Internal Load Balancer

• Used for internal services that should be accessed only from other VNETs or on-premises

apiVersion: v1
kind: Service
metadata:
  name: internalservice
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  loadBalancerIP: 10.240.0.25
  ports:
  - port: 80
  selector:
    app: internal

[Diagram: an internal load balancer with an internal IP in front of the AKS subnet; InternalService routes to Pod1, Pod2, and Pod3 (labeled Internal); reachable from VNet B and other peered VNets via VNet peering, and from on-premises infrastructure (enterprise system) over Azure ExpressRoute]
Ingress and Ingress Controllers

• Ingress is a Kubernetes API that manages external access to the services in the cluster
  • Supports HTTP and HTTPS
  • Path- and subdomain-based routing
  • SSL termination
  • Saves on public IPs

• An ingress controller is a daemon, deployed as a Kubernetes pod, that watches the ingress endpoint for updates; its job is to satisfy requests for ingresses. The most popular one is NGINX.
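As a sketch of getting a controller running (this assumes the Helm 2-era stable chart repository, which the deck does not name), NGINX ingress could be installed with:

$ helm install stable/nginx-ingress --namespace kube-system \
    --set controller.replicaCount=2

Once the controller pod is running, Ingress resources like the one on the next slide are picked up and turned into NGINX configuration.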
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: contoso-ingress
  annotations:
    kubernetes.io/ingress.class: "PublicIngress"
spec:
  tls:
  - hosts:
    - contoso.com
    secretName: contoso-secret
  rules:
  - host: contoso.com
    http:
      paths:
      - path: /a
        backend:
          serviceName: servicea
          servicePort: 80
      - path: /b
        backend:
          serviceName: serviceb
          servicePort: 80

[Diagram: public and internal load balancers in the Azure AKS VNet front PublicIngress and PrivateIngress controllers in the AKS cluster, routing contoso.com/A → ServiceA, contoso.com/B → ServiceB, and serviceC.contoso.com → ServiceC; on-premises infrastructure (enterprise system) connects over Azure ExpressRoute]
[Diagram: Azure Application Gateway in an APP GW subnet of a management VNET, peered with the Azure AKS VNet; the Application Gateway fronts an internal load balancer (private IP) in the AKS subnet, behind which an ingress routes contoso.com/A → ServiceA, contoso.com/B → ServiceB, and service.contoso.com → ServiceC]


[Diagram: the same topology with a bastion subnet added to the management VNET; an admin on the on-premises enterprise system reaches the nodes via SSH through the bastion host, over Azure ExpressRoute and VNet peering]


Summary

➢ Use AKS advanced networking for seamless integration with your VNET
➢ Use ingress and ingress controllers for HTTP and HTTPS services
➢ Use Azure Application Gateway or an alternative from the Azure Marketplace to secure your services with a WAF
➢ Use bastion hosts to access your nodes when needed
Cluster Level Security

• Securing endpoints for the API server and cluster nodes
  o Ensuring authentication and authorization (AAD + RBAC)
  o Setting up and keeping least-privileged access for common tasks
Cluster Level - Identity and Access Management through AAD and RBAC

1. The Kubernetes developer authenticates with AAD.
2. The AAD token issuance endpoint issues the access token.
3. The developer performs an action with the AAD token, e.g. kubectl create pod.
4. Kubernetes validates the token with AAD and fetches the developer's AAD groups, e.g. Dev Team A, App Group B.
5. Kubernetes RBAC and cluster policies are applied.
6. The request succeeds or fails based on the previous validation.

[Diagram: Developer ↔ Azure Active Directory (token) ↔ AKS]
Provisioning AD-enabled AKS

$ az aks create --resource-group myAKSCluster --name myAKSCluster --generate-ssh-keys \
    --aad-server-app-id <Azure AD Server App ID> \
    --aad-server-app-secret <Azure AD Server App Secret> \
    --aad-client-app-id <Azure AD Client App ID> \
    --aad-tenant-id <Azure AD Tenant>

$ az aks get-credentials --resource-group myAKSCluster --name myAKSCluster --admin
Merged "myCluster" as current context ..

$ kubectl get nodes
NAME                       STATUS    ROLES     AGE       VERSION
aks-nodepool1-42032720-0   Ready     agent     1h        v1.9.6
aks-nodepool1-42032720-1   Ready     agent     1h        v1.9.6
aks-nodepool1-42032720-2   Ready     agent     1h        v1.9.6
Provisioning AD-enabled AKS

Setting up a Cluster Role:

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  labels:
    kubernetes.io/cluster-service: "true"
  name: cluster-admin
rules:
- apiGroups:
  - extensions
  - apps
  resources:
  - deployments
  verbs:
  - get
  - list
  - watch
  - update
  - patch
- apiGroups:
  - ""
  resources:
  - events
  - namespaces
  - nodes
  - pods
  verbs:
  - get
  - list
  - watch

Bind the Cluster Role to a user:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: contoso-cluster-admins
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: "[email protected]"

Bind the Cluster Role to a group:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: contoso-cluster-admins
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: "894656e1-39f8-4bfe-b16a-510f61af6f41"
Azure Level - Identity and Access Management through AAD and RBAC

1. The Kubernetes administrator authenticates with AAD.
2. The AAD token issuance endpoint issues the access token.
3. The administrator fetches the admin kubeconfig and configures RBAC roles and bindings.
4. The Kubernetes developer fetches the user kubeconfig.

[Diagram: Administrator and Developer ↔ Azure Active Directory (tokens) ↔ AKS]
Provisioning AD-enabled AKS

$ az aks get-credentials --resource-group myAKSCluster --name myAKSCluster

$ kubectl get nodes
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code BUJHWDGNL to authenticate.

NAME                       STATUS    ROLES     AGE       VERSION
aks-nodepool1-42032720-0   Ready     agent     1h        v1.9.6
aks-nodepool1-42032720-1   Ready     agent     1h        v1.9.6
aks-nodepool1-42032720-2   Ready     agent     1h        v1.9.6

Or

Error from server (Forbidden): nodes is forbidden: User [email protected] cannot list nodes at the cluster scope
Cluster Level Security

• Securing endpoints for the API server and cluster nodes
  o Ensuring authentication and authorization (AAD + RBAC)
  o Setting up and keeping least-privileged access for common tasks
o Admission Controllers
▪ DenyEscalatingExec
▪ ValidatingAdmissionWebhooks
▪ MutatingAdmissionWebhooks
▪ ServiceAccount
▪ Coming soon:
➢ NodeRestriction
➢ PodSecurityPolicy
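To make the webhook admission controllers concrete, here is a minimal sketch of a ValidatingWebhookConfiguration (the configuration name, webhook name, service, and path are all hypothetical): the API server calls the referenced in-cluster service on every pod CREATE and rejects the request if the webhook denies it.

apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: pod-policy-webhook           # hypothetical
webhooks:
- name: pod-policy.example.com       # hypothetical
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
  failurePolicy: Fail                # reject the request if the webhook is unreachable
  clientConfig:
    service:
      namespace: default
      name: pod-policy-svc           # hypothetical service serving the webhook
      path: /validate
    caBundle: <base64-encoded CA certificate>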
Cluster Level – Nodes, Upgrades and Patches

• Regular maintenance, security and cleanup tasks
  o Maintain, update and upgrade hosts and Kubernetes
  o Monthly is ideal; every 3 months at minimum
  o Security patches
    ▪ AKS automatically applies security patches to the nodes on a nightly schedule
    ▪ You're responsible for rebooting as required
    ▪ Kured DaemonSet: https://github.com/weaveworks/kured

Upgrade to version 1.10.6:

$ az aks upgrade --name myAKSCluster \
    --resource-group myResourceGroup \
    --kubernetes-version 1.10.6

• SSH access
  o DenyEscalatingExec

• Running benchmarks and tests to validate cluster setup
  o kube-bench
  o Aqua Hunter
  o Others
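Before running az aks upgrade, you can list the versions available to the cluster with a standard az command:

$ az aks get-upgrades --resource-group myResourceGroup \
    --name myAKSCluster --output table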
Container Level Security and Isolation

Container Level – The Images
• Use a trusted registry
• Regularly apply security updates to the container images

Container Level – Images and Runtime
• Scan your images, scan your containers
• Runtime enforcement and remediation (Aqua and Twistlock container security Helm charts)
[Diagram: CI/CD pipeline — inner loop in VS Code (code, test, debug) → source control → auto-build via Azure Pipelines / DevOps Project → Azure Container Registry → AKS dev cluster and AKS production cluster (web, business, and database tiers), with Aqua/Twistlock/NeuVector container security and Azure Monitor]
Container Level – The Access
• Avoid access to the host IPC namespace; use it only if absolutely necessary
• Avoid access to the host PID namespace; use it only if absolutely necessary
• Avoid root / privileged access
  o Consider Linux capabilities instead
Container Level – AppArmor

Securing a pod with a deny-write profile:

apiVersion: v1
kind: Pod
metadata:
  name: hello-apparmor
  annotations:
    container.apparmor.security.beta.kubernetes.io/hello: localhost/k8s-apparmor-example-deny-write
spec:
  containers:
  - name: hello
    image: busybox
    command: [ "sh", "-c", "echo 'Hello AppArmor!' && sleep 1h" ]

deny-write.profile:

#include <tunables/global>

profile k8s-apparmor-example-deny-write flags=(attach_disconnected) {
  #include <abstractions/base>

  file,

  # Deny all file writes.
  deny /** w,
}

$ kubectl exec hello-apparmor touch /tmp/test
touch: /tmp/test: Permission denied
error executing remote command: command terminated with non-zero exit code: Error executing in Docker Container: 1
Container Level - Seccomp

Securing a pod with a prevent-chmod profile:

apiVersion: v1
kind: Pod
metadata:
  name: chmod-prevented
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: localhost/prevent-chmod
spec:
  containers:
  - name: chmod
    image: busybox
    command:
    - "chmod"
    args:
    - "777"
    - /etc/hostname
  restartPolicy: Never

Seccomp profile (/var/lib/kubelet/seccomp/prevent-chmod):

{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "name": "chmod",
      "action": "SCMP_ACT_ERRNO"
    }
  ]
}
Container Level

$ kubectl create -f seccomp-pod.yaml
pod "chmod-prevented" created

$ kubectl get pods
NAME              READY     STATUS    RESTARTS   AGE
chmod-prevented   0/1       Error     0          8s
Pod Level Security
Pod Level – Pod Security Context

apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 2000
  volumes:
  - name: sec-ctx-vol
    emptyDir: {}
  containers:
  - name: sec-ctx-demo
    image: ignite.azurecr.io/nginx-demo
    volumeMounts:
    - name: sec-ctx-vol
      mountPath: /data/demo
    securityContext:
      runAsUser: 2000
      allowPrivilegeEscalation: false
      capabilities:
        add: ["NET_ADMIN", "SYS_TIME"]
      seLinuxOptions:
        level: "s0:c123,c456"
Pod Level – Pod Security Policies

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'docker/default'
    apparmor.security.beta.kubernetes.io/allowedProfileNames: 'runtime/default'
    seccomp.security.alpha.kubernetes.io/defaultProfileName: 'docker/default'
    apparmor.security.beta.kubernetes.io/defaultProfileName: 'runtime/default'
spec:
  privileged: false
  allowPrivilegeEscalation: false  # Required to prevent escalations to root.
  requiredDropCapabilities:        # Redundant with non-root + disallow privilege escalation, but provided for defense in depth.
  - ALL
  volumes:                         # Allow core volume types.
  - 'configMap'
  - 'emptyDir'
  - 'projected'
  - 'secret'
  - 'downwardAPI'
  - 'persistentVolumeClaim'        # Assume that persistentVolumes set up by the cluster admin are safe to use.
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: 'MustRunAsNonRoot'       # Require the container to run without root privileges.
  seLinux:
    rule: 'RunAsAny'               # This policy assumes the nodes are using AppArmor rather than SELinux.
  supplementalGroups:
    rule: 'MustRunAs'
    ranges:
    - min: 1                       # Forbid adding the root group.
      max: 65535
  fsGroup:
    rule: 'MustRunAs'
    ranges:
    - min: 1                       # Forbid adding the root group.
      max: 65535
  readOnlyRootFilesystem: false
Pod Level
• Pod Security Context
• Pod Security Policies
• AlwaysPullImages admission controller
Securing Workloads

Pod Identity

1. A Kubernetes operator defines an identity map for K8s service accounts.
2. The Node Managed Identity (NMI) component watches for mapping creation and syncs to the Managed Service Identity (MSI).
3. The developer creates a pod with a service account. The pod uses the standard Azure SDK to fetch a token bound to the MSI.
4. The pod uses the access token to consume other Azure services; the services validate the token.

[Diagram: a Kubernetes controller reconciles the Azure Identity Binding for the pod; NMI + MSI obtain a token from Azure Active Directory, which the pod presents to Azure SQL Server]
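A sketch of what step 1's identity map looks like, based on the Azure/aad-pod-identity project's CRDs (all names and IDs are placeholders, and exact field casing may vary by project version):

apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentity
metadata:
  name: demo-identity               # hypothetical
spec:
  type: 0                           # 0 = user-assigned MSI
  resourceID: /subscriptions/<sub>/resourcegroups/<rg>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<name>
  clientID: <client-id>
---
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentityBinding
metadata:
  name: demo-identity-binding       # hypothetical
spec:
  azureIdentity: demo-identity
  selector: demo-app                # pods labeled aadpodidbinding: demo-app receive this identity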
Securing Workloads

• Managing secrets and privileged information
  o Azure Key Vault

[Diagram: AKS with Key Vault and RBAC accessing Storage, Azure SQL Database, and Cosmos DB]
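As a hedged example of the Key Vault flow (vault and secret names are illustrative), secrets are created and read with the standard az CLI:

$ az keyvault create --name myKeyVault --resource-group myResourceGroup
$ az keyvault secret set --vault-name myKeyVault --name sqlPassword --value <secret-value>
$ az keyvault secret show --vault-name myKeyVault --name sqlPassword

Workloads then fetch secrets at runtime via the Key Vault SDK or REST API instead of baking them into images or Kubernetes manifests.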
Securing workloads

• Service Endpoints
• Filter secrets from the logs
• Encrypted Service to Service Communication
o mTLS between services
o Service Meshes
Compliance

• AKS is SOC 1/2, PCI, HIPAA and ISO certified
• All the details are listed in the Azure Trust Center
Manual scaling is tedious and ineffective

• Horizontal Pod Autoscaling (HPA) -> scaling pods/containers

• Cluster Autoscaler -> scaling infrastructure/VMs

• AKS + ACI + Virtual Kubelet for burst scenarios -> scaling pods/containers


[Diagram: Horizontal Pod Autoscaler — cAdvisor in the kubelet on each node collects metrics from all containers on that node; the Metrics Server collects metrics from all nodes; the HPA reads those metrics and increments or decrements replicas on the Deployment/ReplicaSet]
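A minimal HPA sketch (the deployment name and thresholds are illustrative, not from the deck): hold roughly 50% average CPU across 3 to 10 replicas of a frontend deployment.

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-hpa                  # hypothetical
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend                    # hypothetical deployment to scale
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50  # add/remove replicas to hold ~50% average CPU

The same policy can be created imperatively with kubectl autoscale deployment frontend --cpu-percent=50 --min=3 --max=10.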
Cluster Autoscaler

• Scales nodes based on pending pods
• Scales up and scales down
• Reduces dependency on monitoring
• Removes the need for users to manage nodes and monitor service usage manually

[Diagram: 1. Pods are in pending state → 2. Additional node(s) needed → 3. Node is granted → 4. Pending pods are scheduled onto the new node]

[Diagram: the Kubernetes control plane schedules application pods onto the AKS cluster's VMs or, via the ACI Connector (Virtual Kubelet), onto Azure Container Instances — ACI provides fast container autoscaling for burst scenarios, while the cluster autoscaler scales the VM pool; application architects deploy pods, infrastructure architects manage deployments and tasks]
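With the 2018-era CLI, the ACI connector (Virtual Kubelet) could be added to a cluster like this (the connector name is illustrative):

$ az aks install-connector --resource-group myResourceGroup \
    --name myAKSCluster --connector-name myaciconnector --os-type Linux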
• Minimize downtime risk with Azure Traffic Manager
  • One live region and another as backup
  • Or weighted traffic
  • A/B testing

[Diagram: Azure Traffic Manager routing to AKS Cluster 1 in Region 1 and AKS Cluster 2 in Region 2, deployed in Azure paired regions]
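A sketch of the Traffic Manager setup with the az CLI (profile name, DNS name, and endpoint targets are illustrative; Priority routing implements the live/backup pattern):

$ az network traffic-manager profile create --name myAksTm \
    --resource-group myResourceGroup --routing-method Priority \
    --unique-dns-name myaksapp
$ az network traffic-manager endpoint create --name region1 \
    --profile-name myAksTm --resource-group myResourceGroup \
    --type externalEndpoints --target <cluster1-ingress-dns-or-ip> --priority 1
$ az network traffic-manager endpoint create --name region2 \
    --profile-name myAksTm --resource-group myResourceGroup \
    --type externalEndpoints --target <cluster2-ingress-dns-or-ip> --priority 2

Switching --routing-method to Weighted spreads traffic across both clusters, e.g. for A/B testing.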


Monitoring/Logging your cluster

• Log everything to stdout / stderr

• Key metrics:
  o Node metrics (CPU usage, memory usage, disk usage, network usage)
  o kube_node_status_condition
  o Pod memory usage / limit; memory_failures_total
    ▪ container_memory_working_set_bytes
  o Pod CPU usage average / limit
  o Filesystem usage / limit
  o Network receive / transmit errors
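With the Metrics Server installed, the quickest way to eyeball the node and pod metrics above is kubectl top:

$ kubectl top nodes
$ kubectl top pods --all-namespaces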
• Azure Monitor for Containers

In the roadmap:
  o Overview health of the AKS cluster
  o Node event logs
  o Pod usage and details
  o Customer control plane logs

• Use the Azure portal to enable diagnostics logs
• Pipe logs to Log Analytics, Event Hubs, or a storage account
• Metrics available today:
  • kube-controller-manager
  • kube-apiserver
  • kube-scheduler
• Audit logs are on the roadmap

Example control plane logs
Multi cluster monitoring

Demo notes — Monitoring and logging (Saurya):
1) Node/pod usage, kube events
2) Pod hogging resources; show resource request limits
3) Talk about percentiles for capacity planning
4) Show the -kube-system filter
5) https://aka.ms/multiaksinsights
Resources

• AKS Best Practices GitHub: https://github.com/Azure/k8s-best-practices
• AKS Hackfest: aka.ms/k8s-hackfest & https://github.com/Azure/kubernetes-hackfest
• Distributed systems labs by Brendan Burns
• Kube Advisor: https://github.com/Azure/kube-advisor
• VS Code Kubernetes extension
• Documentation resources
• E-book on distributed systems
• AKS hands-on labs
• Connect with us on Twitter:
  • Jorge Palma - @jorgefpalma
  • Andrew Randall - @andrew_randall
  • Mohammad Nofal - @mohmd_nofal
  • Saurya Das - @sauryadas_
© 2018 Microsoft Corporation. All rights reserved.
The copyrights in this content, and any trademarks, organization names, logos, products, and services appearing in it, belong to their respective rights holders.
