mirror of
https://github.com/affaan-m/everything-claude-code.git
synced 2026-06-13 03:33:15 +08:00
* feat(skills): add kubernetes-patterns skill * fix(skills): address CodeRabbit review on kubernetes-patterns - Add When to Use alias section (repo skill-format requirement) - Add How It Works overview section (required schema) - Add Examples quick-reference table (required schema) - Fix RBAC: split into Pattern A (no API, token disabled) and Pattern B (needs API, token enabled) to resolve contradiction between automountServiceAccountToken: false and Role/RoleBinding - Fix missing -n my-namespace flag on OOMKilled kubectl describe command
756 lines
20 KiB
Markdown
---
name: kubernetes-patterns
description: Kubernetes workload patterns, resource management, RBAC, probes, autoscaling, ConfigMap/Secret handling, and kubectl debugging for production-grade deployments.
origin: ECC
---

# Kubernetes Patterns

Production-grade Kubernetes patterns for deploying, managing, and debugging workloads reliably.

## When to Activate

- Writing Kubernetes manifests (Deployments, Services, Ingress, Jobs)
- Configuring resource requests/limits, liveness/readiness probes
- Setting up RBAC, namespaces, or ServiceAccounts
- Managing configuration and secrets in K8s
- Debugging CrashLoopBackOff, OOMKilled, pending pods, or image pull errors
- Configuring HPA (Horizontal Pod Autoscaler) or PodDisruptionBudgets
- Reviewing K8s YAML for security or correctness

## When to Use

> Same as **When to Activate** above. This alias satisfies repo skill-format conventions. Use this skill any time you are writing, reviewing, or debugging Kubernetes YAML and workloads.

## How It Works

This skill provides **copy-pasteable, production-grade YAML patterns** and **kubectl debugging commands** organized by task:

1. **Deployment template** — A fully configured production `Deployment` with security context, rolling update strategy, all three probe types, resource limits, and environment injection from ConfigMap/Secret.
2. **Probes** — Decision table for startup vs liveness vs readiness, with correct `failureThreshold × periodSeconds` math.
3. **Services & Ingress** — ClusterIP, LoadBalancer, and TLS Ingress patterns with cert-manager annotations.
4. **ConfigMaps & Secrets** — `envFrom`, file-mount, and external secrets guidance.
5. **Resource management** — Requests vs limits rules of thumb by workload type (web API, JVM, worker, sidecar).
6. **RBAC** — Least-privilege ServiceAccount → Role → RoleBinding chain.
7. **HPA & PDB** — Autoscaling and node-drain safety configurations.
8. **Jobs & CronJobs** — One-off and scheduled workload patterns with correct `restartPolicy`.
9. **kubectl cheatsheet** — Logs, exec, rollback, port-forward, dry-run, and common error diagnosis commands.
10. **Anti-patterns & checklist** — What NOT to do, and a security/reliability/observability checklist.

## Examples

See the sections below for complete, runnable examples. Quick references:

| Task | Jump to |
|------|---------|
| Full production Deployment YAML | [Core Workload Patterns](#core-workload-patterns) |
| Probe configuration | [Probes](#probes--liveness-readiness-startup) |
| RBAC least-privilege setup | [RBAC](#rbac--roles-and-serviceaccounts) |
| Debug a CrashLoopBackOff | [kubectl Debugging Cheatsheet](#kubectl-debugging-cheatsheet) |
| Autoscaling | [HPA](#horizontal-pod-autoscaler-hpa) |

---

## Core Workload Patterns

### Deployment — Production Template

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    app: my-app
    version: "1.0.0"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Allow 1 extra pod during update
      maxUnavailable: 0  # Never reduce below desired count
  template:
    metadata:
      labels:
        app: my-app
        version: "1.0.0"
    spec:
      # Security context at pod level
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        fsGroup: 1001

      # Graceful shutdown
      terminationGracePeriodSeconds: 30

      containers:
        - name: my-app
          image: ghcr.io/org/my-app:1.0.0  # Never use :latest
          imagePullPolicy: IfNotPresent

          ports:
            - containerPort: 8080
              protocol: TCP

          # Resource requests AND limits are both required
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"

          # Container security context
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL

          # Probes (see Probes section below)
          startupProbe:
            httpGet:
              path: /health
              port: 8080
            failureThreshold: 30
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 0
            periodSeconds: 30
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 2

          # Environment from ConfigMap and Secret
          envFrom:
            - configMapRef:
                name: my-app-config
          env:
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: my-app-secrets
                  key: db-password

          # Writable tmp directory when readOnlyRootFilesystem: true
          volumeMounts:
            - name: tmp
              mountPath: /tmp

      volumes:
        - name: tmp
          emptyDir: {}
```

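
The `maxSurge` and `maxUnavailable` values in the template bound how many pods exist during a rollout. The sketch below works through that arithmetic for percentage values, which Kubernetes resolves by rounding `maxSurge` up and `maxUnavailable` down. `rollout_bounds` is a hypothetical helper name used only for illustration; it is not a kubectl feature.

```bash
# rollout_bounds REPLICAS MAX_SURGE_PCT MAX_UNAVAILABLE_PCT
# Prints "<max total pods> <min available pods>" during a RollingUpdate.
# Hypothetical helper for illustration only.
rollout_bounds() {
  local replicas=$1 surge_pct=$2 unavailable_pct=$3
  # Percentages: maxSurge rounds up, maxUnavailable rounds down
  local surge=$(( (replicas * surge_pct + 99) / 100 ))
  local unavailable=$(( replicas * unavailable_pct / 100 ))
  echo "$(( replicas + surge )) $(( replicas - unavailable ))"
}

# 3 replicas at 25%/25% resolve to maxSurge 1 and maxUnavailable 0,
# the same absolute values the template sets explicitly.
rollout_bounds 3 25 25     # prints: 4 3
rollout_bounds 10 25 25    # prints: 13 8
```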
---

## Probes — Liveness, Readiness, Startup

Understanding when to use each probe is critical:

| Probe | Failure Action | Use For |
|-------|---------------|---------|
| `startupProbe` | Restarts the container if it has not passed within `failureThreshold × periodSeconds`; liveness and readiness checks wait until it succeeds | Slow-starting apps (JVM, Python) |
| `livenessProbe` | Restarts container | Deadlock / hung process detection |
| `readinessProbe` | Removes from Service endpoints | Temporary unavailability (DB reconnect) |

```yaml
# Correct pattern: startupProbe covers slow startup,
# then liveness/readiness take over
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30   # 30 * 5s = 150s max startup time
  periodSeconds: 5

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 30
  failureThreshold: 3    # 3 * 30s = 90s before restart

readinessProbe:
  httpGet:
    path: /ready         # Separate endpoint: checks DB, cache, etc.
    port: 8080
  periodSeconds: 10
  failureThreshold: 2    # 2 * 10s = 20s before removal from endpoints
```

```yaml
# WRONG: initialDelaySeconds without startupProbe
# If the app takes 60s to start, set a startupProbe instead
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60  # BAD: fixed wait; too short restarts slow starts, too long delays hang detection
```

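
The `failureThreshold × periodSeconds` comments above can be checked with a few lines of shell. This is an approximation: it ignores `timeoutSeconds` and `initialDelaySeconds`, which add to the real worst case. `probe_budget` is a hypothetical helper name.

```bash
# probe_budget FAILURE_THRESHOLD PERIOD_SECONDS
# Prints the approximate number of seconds of consecutive failures
# the kubelet tolerates before acting on a probe.
# Hypothetical helper for illustration only.
probe_budget() {
  local failure_threshold=$1 period_seconds=$2
  echo $(( failure_threshold * period_seconds ))
}

probe_budget 30 5    # startupProbe:   150
probe_budget 3 30    # livenessProbe:  90
probe_budget 2 10    # readinessProbe: 20
```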
---

## Services and Ingress

### Service Types

```yaml
# ClusterIP (default) — internal-only
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-namespace
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  type: ClusterIP
```

```yaml
# LoadBalancer — external traffic (cloud providers)
# Fragment: only `spec` differs from the ClusterIP Service above;
# keep the same apiVersion, kind, metadata, and selector.
spec:
  type: LoadBalancer
  ports:
    - port: 443
      targetPort: 8080
```

### Ingress with TLS

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: my-namespace
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - myapp.example.com
      secretName: my-app-tls
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
```

---

## ConfigMaps and Secrets

### ConfigMap — Non-sensitive configuration

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-config
  namespace: my-namespace
data:
  LOG_LEVEL: "info"
  APP_ENV: "production"
  MAX_CONNECTIONS: "100"
  # Mount as a file for complex config
  app.yaml: |
    server:
      port: 8080
      timeout: 30s
```

```yaml
# Mount ConfigMap as a file
# Pod level (spec.template.spec):
volumes:
  - name: config
    configMap:
      name: my-app-config
      items:
        - key: app.yaml
          path: app.yaml
# Container level (containers[].volumeMounts):
volumeMounts:
  - name: config
    mountPath: /etc/app
    readOnly: true
```

### Secrets — Sensitive data

```bash
# Create or update a Secret from a literal (keep the source of truth in Vault/SOPS)
kubectl create secret generic my-app-secrets \
  --from-literal=db-password='s3cr3t' \
  --namespace=my-namespace \
  --dry-run=client -o yaml | kubectl apply -f -
```

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-app-secrets
  namespace: my-namespace
type: Opaque
# Values are base64-encoded (NOT encrypted — use Sealed Secrets or ESO for real encryption)
data:
  db-password: czNjcjN0  # base64 of 's3cr3t'
```

> **Important:** Raw Kubernetes Secrets are only base64-encoded and are not encrypted at rest unless the cluster has encryption at rest configured. Use [Sealed Secrets](https://github.com/bitnami-labs/sealed-secrets) or [External Secrets Operator](https://external-secrets.io) for production.

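
To see why base64 offers no protection, the value in the manifest above can be round-tripped locally with no key at all. A minimal sketch using the standard `base64` utility; the variable names are illustrative.

```bash
# Encode without a trailing newline; `echo` would append one and
# produce a different base64 string than the manifest expects.
encoded=$(printf '%s' 's3cr3t' | base64)
echo "$encoded"    # czNjcjN0

# Decoding needs no key, so anyone who can read the Secret can read the value.
decoded=$(printf '%s' "$encoded" | base64 --decode)
echo "$decoded"    # s3cr3t
```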
---

## Resource Requests and Limits

```yaml
resources:
  requests:          # Scheduler uses this to place the pod
    cpu: "100m"      # 100 millicores = 0.1 CPU
    memory: "128Mi"
  limits:            # CPU is throttled at its limit; exceeding the memory limit gets the container OOM-killed
    cpu: "500m"
    memory: "256Mi"
```

**Rules of thumb:**

| Workload Type | CPU Request | Memory Request | Notes |
|---------------|-------------|----------------|-------|
| Web API | 100–250m | 128–256Mi | Set limits 2-4x requests |
| Worker/consumer | 250–500m | 256–512Mi | Memory limit = request for predictability |
| JVM app | 500m–1 | 512Mi–2Gi | Allow headroom above `-Xmx` for JVM overhead |
| Sidecar | 10–50m | 32–64Mi | Keep minimal |

```yaml
# WRONG: No requests or limits — unpredictable scheduling, OOM evictions
containers:
  - name: app
    image: myapp:latest
    # No resources block at all: dangerous in production

# WRONG: Limits without requests — requests default to limits, over-reserves capacity
resources:
  limits:
    cpu: "2"
    memory: "1Gi"
  # requests missing — will default to limits values
```

---

## RBAC — Roles and ServiceAccounts

### Principle of Least Privilege

**Two patterns depending on whether the app calls the Kubernetes API:**

#### Pattern A — App does NOT need the Kubernetes API (most apps)

Disable token automounting on the ServiceAccount. The Role/RoleBinding are not needed.

```yaml
# ServiceAccount with token disabled — safest default
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: my-namespace
automountServiceAccountToken: false  # No K8s API token injected into pods
```

```yaml
# Reference in Deployment — no token, no API access
spec:
  template:
    spec:
      serviceAccountName: my-app-sa
      automountServiceAccountToken: false  # Belt-and-suspenders: also set at pod level
```

#### Pattern B — App DOES need the Kubernetes API (operators, controllers, config watchers)

Enable the token and grant only the permissions actually required.

```yaml
# 1. ServiceAccount — enable token for this SA
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: my-namespace
automountServiceAccountToken: true  # Token required: app calls K8s API
```

```yaml
# 2. Role — grant only what the app needs (namespace-scoped)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: my-app-role
  namespace: my-namespace
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]     # Read-only, specific resource
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["my-app-secrets"]   # Restrict to specific secret by name
    verbs: ["get"]
```

```yaml
# 3. Bind Role to ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-rolebinding
  namespace: my-namespace
subjects:
  - kind: ServiceAccount
    name: my-app-sa
    namespace: my-namespace
roleRef:
  kind: Role
  apiGroup: rbac.authorization.k8s.io
  name: my-app-role
```

```yaml
# 4. Reference SA in Deployment
spec:
  template:
    spec:
      serviceAccountName: my-app-sa
      # automountServiceAccountToken defaults to true from SA — token is injected
```

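
One way to confirm that the Pattern B Role grants only what was intended is `kubectl auth can-i` with ServiceAccount impersonation. The sketch below assumes the names used in the manifests above and that your own user is allowed to impersonate ServiceAccounts. `sa_subject` and `check_sa_access` are hypothetical wrapper names, and the block only defines them; nothing contacts a cluster until you call `check_sa_access`.

```bash
# Build the user name Kubernetes assigns to a ServiceAccount.
sa_subject() {
  local namespace=$1 name=$2
  echo "system:serviceaccount:${namespace}:${name}"
}

# check_sa_access VERB RESOURCE
# Asks the API server whether my-app-sa may perform VERB on RESOURCE.
# Hypothetical wrapper around `kubectl auth can-i`.
check_sa_access() {
  kubectl auth can-i "$1" "$2" \
    --as="$(sa_subject my-namespace my-app-sa)" \
    -n my-namespace
}

# Usage against a live cluster (not run here):
#   check_sa_access get configmaps              # should print "yes"
#   check_sa_access get secrets/my-app-secrets  # should print "yes"
#   check_sa_access delete pods                 # should print "no"
```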
---

## Horizontal Pod Autoscaler (HPA)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
  namespace: my-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2   # Always at least 2 for HA
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # Scale up when avg CPU exceeds 70% of requests
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

> HPA requires `resources.requests` to be set on all containers — it calculates utilization as `current / request`.

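
The scaling decision itself follows the formula in the Kubernetes HPA documentation, `desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)`. The sketch below applies it to utilization percentages and clamps the result to the `minReplicas` and `maxReplicas` from the manifest above. It is a simplification: the real controller also applies a tolerance band and stabilization windows, and with several metrics it uses the largest result. `hpa_desired_replicas` is a hypothetical helper name.

```bash
# hpa_desired_replicas CURRENT_REPLICAS CURRENT_UTIL TARGET_UTIL [MIN] [MAX]
# Utilization values are integer percentages of the pods' requests.
# Hypothetical helper for illustration only.
hpa_desired_replicas() {
  local current_replicas=$1 current_util=$2 target_util=$3
  local min_replicas=${4:-2} max_replicas=${5:-10}
  # Integer ceiling of current_replicas * current_util / target_util
  local desired=$(( (current_replicas * current_util + target_util - 1) / target_util ))
  if [ "$desired" -lt "$min_replicas" ]; then desired=$min_replicas; fi
  if [ "$desired" -gt "$max_replicas" ]; then desired=$max_replicas; fi
  echo "$desired"
}

hpa_desired_replicas 3 90 70     # ceil(3.86) = 4
hpa_desired_replicas 4 30 70     # ceil(1.71) = 2
hpa_desired_replicas 8 140 70    # 16, clamped to maxReplicas = 10
```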
---

## PodDisruptionBudget (PDB)

Limit how many pods can be taken down at once by voluntary disruptions such as node drains. A PDB does not constrain a Deployment's own rolling updates; those follow the `maxUnavailable` setting in its rollout strategy:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
  namespace: my-namespace
spec:
  minAvailable: 2  # OR use maxUnavailable: 1; keep this below the replica count or drains will block
  selector:
    matchLabels:
      app: my-app
```

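
How many pods a drain may evict at once falls out of simple subtraction: healthy pods minus `minAvailable`, never below zero. A minimal sketch of that arithmetic follows; `pdb_allowed_disruptions` is a hypothetical helper name, and the live value is reported by `kubectl get pdb` in the ALLOWED DISRUPTIONS column.

```bash
# pdb_allowed_disruptions HEALTHY_PODS MIN_AVAILABLE
# Prints how many pods may be evicted right now.
# Hypothetical helper for illustration only.
pdb_allowed_disruptions() {
  local healthy=$1 min_available=$2
  local allowed=$(( healthy - min_available ))
  if [ "$allowed" -lt 0 ]; then allowed=0; fi
  echo "$allowed"
}

pdb_allowed_disruptions 3 2    # 1: one pod can be evicted at a time
pdb_allowed_disruptions 2 2    # 0: evictions are refused and the drain waits
```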
---

## Namespaces and Multi-Tenancy

```bash
# Create the namespace
kubectl create namespace my-namespace

# Apply ResourceQuota to limit namespace consumption
kubectl apply -f - <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: my-namespace-quota
  namespace: my-namespace
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 4Gi
    limits.cpu: "8"
    limits.memory: 8Gi
    pods: "20"
EOF
```

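
A quota is easier to size when you work out which dimension runs out first. The sketch below divides each quota value above by the per-pod requests and limits from the Deployment template (100m/128Mi requests, 500m/256Mi limits) and takes the smallest result. It assumes single-container pods of that one shape; the variable names are illustrative.

```bash
# Quota values from the ResourceQuota above, in millicores and MiB
quota_req_cpu=4000
quota_req_mem=4096
quota_lim_cpu=8000
quota_lim_mem=8192
quota_pods=20

# Per-pod values from the Deployment template
pod_req_cpu=100
pod_req_mem=128
pod_lim_cpu=500
pod_lim_mem=256

# The pod-count cap is the starting point; each resource may lower it.
max_pods=$quota_pods
for fit in \
  $(( quota_req_cpu / pod_req_cpu )) \
  $(( quota_req_mem / pod_req_mem )) \
  $(( quota_lim_cpu / pod_lim_cpu )) \
  $(( quota_lim_mem / pod_lim_mem )); do
  if [ "$fit" -lt "$max_pods" ]; then max_pods=$fit; fi
done

echo "$max_pods"    # 16: limits.cpu (8000m / 500m) runs out first
```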
---

## Jobs and CronJobs

```yaml
# One-off Job (DB migration, data processing)
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
  namespace: my-namespace
spec:
  backoffLimit: 3                 # Retry up to 3 times on failure
  ttlSecondsAfterFinished: 3600   # Auto-delete after 1h
  template:
    spec:
      restartPolicy: OnFailure    # Jobs accept OnFailure or Never (Always is not allowed)
      containers:
        - name: migrate
          image: ghcr.io/org/my-app:1.0.0
          command: ["python", "manage.py", "migrate"]
          resources:
            requests:
              cpu: "100m"
              memory: "256Mi"
```

```yaml
# CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup-job
  namespace: my-namespace
spec:
  schedule: "0 2 * * *"       # 2am daily
  concurrencyPolicy: Forbid   # Don't run if previous still running
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: cleanup
              image: ghcr.io/org/cleanup:1.0.0
              resources:
                requests:
                  cpu: "50m"
                  memory: "64Mi"
```

---

## kubectl Debugging Cheatsheet

```bash
# --- Pod status and logs ---
kubectl get pods -n my-namespace
kubectl get pods -n my-namespace -o wide                  # Show node assignment
kubectl describe pod <pod-name> -n my-namespace           # Events and state details
kubectl logs <pod-name> -n my-namespace                   # Current logs
kubectl logs <pod-name> -n my-namespace --previous        # Logs from crashed container
kubectl logs <pod-name> -n my-namespace -c <container>    # Multi-container pod

# --- Execute into a running container ---
kubectl exec -it <pod-name> -n my-namespace -- sh
kubectl exec -it <pod-name> -n my-namespace -- bash       # Only if the image ships bash

# --- Check resource usage ---
kubectl top pods -n my-namespace
kubectl top nodes

# --- Deployment operations ---
kubectl rollout status deployment/my-app -n my-namespace
kubectl rollout history deployment/my-app -n my-namespace
kubectl rollout undo deployment/my-app -n my-namespace    # Rollback
kubectl rollout undo deployment/my-app --to-revision=2 -n my-namespace

# --- Scale manually ---
kubectl scale deployment my-app --replicas=5 -n my-namespace

# --- Inspect events in the namespace (use -A for all namespaces) ---
kubectl get events -n my-namespace --sort-by='.lastTimestamp'

# --- Port-forward for local debugging ---
kubectl port-forward pod/<pod-name> 8080:8080 -n my-namespace
kubectl port-forward svc/my-app 8080:80 -n my-namespace

# --- Dry-run to validate YAML ---
kubectl apply -f deployment.yaml --dry-run=client
kubectl apply -f deployment.yaml --dry-run=server         # Validates against live cluster
```

### Diagnosing Common Errors

```bash
# CrashLoopBackOff: container keeps crashing
kubectl logs <pod-name> --previous -n my-namespace    # Check crash logs
kubectl describe pod <pod-name> -n my-namespace       # Check exit code & OOMKilled

# ImagePullBackOff: can't pull image
kubectl describe pod <pod-name> -n my-namespace       # Check Events section
# Causes: wrong image tag, missing imagePullSecret, private registry

# Pending pod: not scheduled
kubectl describe pod <pod-name> -n my-namespace
# Causes: insufficient resources, no matching node selector, taint/toleration mismatch

# OOMKilled: out of memory
# Increase memory limits, check for memory leaks
kubectl describe pod <pod-name> -n my-namespace | grep -A5 "Last State"
```

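
The exit code shown under `Last State` follows the usual shell convention: a value above 128 means the process was terminated by signal `code - 128`. OOM kills typically appear as 137 (128 + 9, SIGKILL), and a container stopped by SIGTERM as 143. The sketch below decodes a code on that basis; `describe_exit_code` is a hypothetical helper name.

```bash
# describe_exit_code CODE
# Prints a short interpretation of a container exit code.
# Hypothetical helper for illustration only.
describe_exit_code() {
  local code=$1
  if [ "$code" -eq 0 ]; then
    echo "exited cleanly"
  elif [ "$code" -gt 128 ]; then
    echo "killed by signal $(( code - 128 ))"
  else
    echo "application error (exit $code)"
  fi
}

describe_exit_code 137    # killed by signal 9
describe_exit_code 143    # killed by signal 15
describe_exit_code 1      # application error (exit 1)
```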
---

## Anti-Patterns

```yaml
# BAD: Using :latest tag — non-deterministic deployments
image: myapp:latest

# GOOD: Pin to a specific version tag, or better, an immutable digest
image: ghcr.io/org/myapp:1.4.2
# or
image: ghcr.io/org/myapp@sha256:abc123...

# ---

# BAD: Running as root
securityContext: {}  # Runs as the image's default user, which is often root

# GOOD: Non-root with explicit UID
securityContext:
  runAsNonRoot: true
  runAsUser: 1001

# ---

# BAD: No resource limits — one pod can starve the entire node
containers:
  - name: app
    image: myapp:1.0.0
    # No resources defined

# GOOD: Always set requests and limits
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"

# ---

# BAD: Storing plaintext secrets in ConfigMaps
apiVersion: v1
kind: ConfigMap
data:
  DB_PASSWORD: "mysecretpassword"  # NEVER — use Secret or external secrets manager

# ---

# BAD: ClusterAdmin for application service accounts
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
roleRef:
  kind: ClusterRole
  name: cluster-admin  # Grants god-mode to your app

# ---

# BAD: minAvailable: 0 in PDB — defeats the purpose
spec:
  minAvailable: 0

# ---

# BAD: restartPolicy: Always in a Job (rejected by API validation)
spec:
  restartPolicy: Always  # Use OnFailure or Never for Jobs
```

---

## Best Practices Checklist

### Security
- [ ] Container runs as non-root (`runAsNonRoot: true`, `runAsUser` set)
- [ ] `readOnlyRootFilesystem: true` with `emptyDir` for writable paths
- [ ] `allowPrivilegeEscalation: false`
- [ ] All capabilities dropped (`capabilities.drop: [ALL]`)
- [ ] Dedicated ServiceAccount per app, not `default`
- [ ] `automountServiceAccountToken: false` unless needed
- [ ] RBAC follows least privilege (use `Role`, not `ClusterRole` unless needed)
- [ ] Secrets managed via Sealed Secrets or External Secrets Operator

### Reliability
- [ ] All 3 probe types configured (startup + liveness + readiness)
- [ ] Resource requests AND limits set on every container
- [ ] `minReplicas: 2+` for any production workload
- [ ] PodDisruptionBudget defined for stateful or critical services
- [ ] `RollingUpdate` strategy with `maxUnavailable: 0`
- [ ] HPA configured for variable-load services

### Observability
- [ ] App exposes `/health` (liveness) and `/ready` (readiness) endpoints
- [ ] Structured JSON logging (no PII in logs)
- [ ] Resource labels: `app`, `version`, `environment`

---

## Related Skills

- `docker-patterns` — Multi-stage Dockerfiles and image security
- `deployment-patterns` — CI/CD pipelines, rollback strategy, health check endpoints
- `security-review` — Broader security hardening context
- `git-workflow` — GitOps integration with K8s (ArgoCD / Flux patterns)