The Problem with "Just Tell Them Not To"
Every Kubernetes cluster I have managed eventually hits the same issue: someone deploys a container running as root, someone else creates a service without resource limits, and a third person pushes an image from Docker Hub straight into production with no vulnerability scan.
You can write a wiki page with best practices. You can send Slack reminders. None of it works at scale. People forget, people are busy, and people onboard without reading the wiki.
The fix is to encode your rules directly into the cluster. Policy-as-Code means the cluster rejects bad configurations at admission time, before anything gets created. The developer gets immediate feedback, the security team does not need to manually review every deployment, and your compliance posture is consistent across every namespace.
Why Kyverno
There are a few tools in this space. OPA Gatekeeper is the original, and it works, but it requires learning Rego, a policy language most teams do not want to invest in. Kyverno takes a different approach: policies are native Kubernetes resources written in YAML. If your team can write a Kubernetes manifest, they can write a Kyverno policy.
Kyverno does three things:
- Validate: reject resources that break rules (e.g. no :latest tags, must have resource limits)
- Mutate: automatically fix resources on admission (e.g. inject labels, add default resource requests)
- Generate: create companion resources when something is deployed (e.g. auto-create NetworkPolicies)
Installing Kyverno
Helm is the fastest path:
helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
helm install kyverno kyverno/kyverno \
--namespace kyverno \
--create-namespace \
--version 3.3.4 \
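--set admissionController.replicas=3 \
--set backgroundController.enabled=true
Three admission controller replicas give HA for the webhook that every request passes through. The background controller is enabled by default; the flag only makes that explicit. It handles generate and mutate-existing rules. The reports controller runs the background scans that check existing resources against policies, not just new admissions.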
Verify it is running:
kubectl get pods -n kyverno
Validation Policies
Block Latest Tag
The most common starting point. This stops anyone from deploying an image with :latest or no tag at all:
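apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
  annotations:
    policies.kyverno.io/title: Disallow Latest Tag
    policies.kyverno.io/category: Best Practices
    policies.kyverno.io/severity: medium
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: require-image-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "An image tag is required. Pin a specific version."
        pattern:
          spec:
            containers:
              - image: "*:*"
            =(initContainers):
              - image: "*:*"
            =(ephemeralContainers):
              - image: "*:*"
    - name: validate-image-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Using ':latest' is not allowed. Pin a specific version."
        pattern:
          spec:
            containers:
              - image: "!*:latest"
            =(initContainers):
              - image: "!*:latest"
            =(ephemeralContainers):
              - image: "!*:latest"
The first rule requires a tag. The second rejects :latest, which the *:* pattern alone would still accept. The =() anchors apply the check to initContainers and ephemeralContainers only when a Pod defines them. Without the anchors, Pods that have no init containers would not match the pattern and would be rejected.
When validationFailureAction is set to Enforce, Kyverno rejects the resource. Set it to Audit first to see what would fail without breaking anything.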
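A quick way to confirm the policy is active is to submit a Pod that should fail. This is a minimal sketch. The Pod name and the file name untagged-pod.yaml are hypothetical, and the image deliberately has no tag so that it fails the tag check:

```yaml
# untagged-pod.yaml (hypothetical test manifest)
apiVersion: v1
kind: Pod
metadata:
  name: untagged-test   # hypothetical name
spec:
  containers:
    - name: web
      image: nginx      # no tag, so the policy should reject this Pod
```

Running kubectl apply --dry-run=server -f untagged-pod.yaml sends the request through the admission webhooks without persisting anything. In Enforce mode you should see the policy's message instead of a created Pod.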
Require Resource Limits
No deployment should run without CPU and memory limits. Without them, a single pod can consume all node resources and starve everything else:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: check-limits
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU and memory limits are required for all containers."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"
            =(initContainers):
              - resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"
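For contrast, a container that satisfies the policy sets both limits explicitly. The "?*" pattern only requires a non-empty value, so any valid quantity passes. A minimal sketch; the names and values are placeholders, not sizing advice:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: limits-example     # hypothetical name
spec:
  containers:
    - name: web
      image: nginx:1.27    # pinned tag (placeholder version)
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
        limits:
          cpu: "500m"      # required by check-limits
          memory: "256Mi"  # required by check-limits
```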
Require Labels
Enforce standard labels for cost tracking, ownership, and observability:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: check-required-labels
      match:
        any:
          - resources:
              kinds:
                - Deployment
                - StatefulSet
                - DaemonSet
      validate:
        message: "Labels 'app.kubernetes.io/name', 'app.kubernetes.io/team', and 'app.kubernetes.io/env' are required."
        pattern:
          metadata:
            labels:
              app.kubernetes.io/name: "?*"
              app.kubernetes.io/team: "?*"
              app.kubernetes.io/env: "?*"
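A Deployment that passes needs the three labels on its own metadata. The pattern checks the Deployment's metadata.labels, not the Pod template's labels, so copying them into the template is a separate convention. The names, label values, and image below are hypothetical:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout
  labels:
    app.kubernetes.io/name: checkout   # hypothetical values
    app.kubernetes.io/team: payments
    app.kubernetes.io/env: prod
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: checkout
  template:
    metadata:
      labels:
        app.kubernetes.io/name: checkout
    spec:
      containers:
        - name: checkout
          image: registry.example.com/checkout:1.4.2   # hypothetical image
          resources:
            limits:
              cpu: "500m"
              memory: "256Mi"
```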
Mutation Policies
Mutation policies modify resources on the way in. This is powerful for enforcing defaults without making developers change their manifests.
Auto-inject Team Labels
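If a namespace carries a team label, automatically add its value as a label on every pod in that namespace. The rule looks up the namespace with an API call, skips Pods whose namespace has no team label, and copies the value into the Pod:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-team-label
spec:
  background: false
  rules:
    - name: inject-team-from-namespace
      match:
        any:
          - resources:
              kinds:
                - Pod
      context:
        - name: nsTeam
          apiCall:
            urlPath: "/api/v1/namespaces/{{ request.namespace }}"
            jmesPath: "metadata.labels.team || ''"
      preconditions:
        all:
          - key: "{{ nsTeam }}"
            operator: NotEquals
            value: ""
      mutate:
        patchStrategicMerge:
          metadata:
            labels:
              app.kubernetes.io/team: "{{ nsTeam }}"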
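The mutation only has something to copy when the namespace actually carries a team label. A minimal sketch, assuming the label key is team as the policy reads it. The namespace name and value are hypothetical:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments-prod   # hypothetical namespace
  labels:
    team: payments      # value the policy copies onto Pods as app.kubernetes.io/team
```

Pods created in payments-prod should then receive app.kubernetes.io/team: payments at admission.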
Set Default Resource Requests
If a container has no resource requests, inject sensible defaults so the scheduler can make good placement decisions:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: default-resource-requests
spec:
  background: false
  rules:
    - name: set-default-requests
      match:
        any:
          - resources:
              kinds:
                - Pod
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              - (name): "*"
                resources:
                  requests:
                    +(memory): "128Mi"
                    +(cpu): "50m"
The +() syntax means "add only if not already set." It will not override anything the developer has explicitly defined.
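To make the anchor behavior concrete, here is a hypothetical container that sets only a CPU request, followed by what the policy should produce. The memory default is added and the developer's CPU value is left alone. The container name and image are placeholders:

```yaml
# Submitted by the developer (hypothetical container)
containers:
  - name: api
    image: registry.example.com/api:2.0.1
    resources:
      requests:
        cpu: "250m"
---
# After the default-resource-requests mutation
containers:
  - name: api
    image: registry.example.com/api:2.0.1
    resources:
      requests:
        cpu: "250m"       # kept: already set, so +(cpu) does nothing
        memory: "128Mi"   # added by +(memory)
```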
Generation Policies
Generation policies create new resources automatically when certain conditions are met.
Auto-create NetworkPolicy
When a new namespace is created, automatically create a default-deny NetworkPolicy:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-default-networkpolicy
spec:
  background: false
  rules:
    - name: default-deny-ingress
      match:
        any:
          - resources:
              kinds:
                - Namespace
      generate:
        synchronize: true
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny-ingress
        namespace: "{{ request.object.metadata.name }}"
        data:
          spec:
            podSelector: {}
            policyTypes:
              - Ingress
With synchronize: true, Kyverno keeps the generated NetworkPolicy in sync with the policy. If someone deletes or edits it, Kyverno recreates or reverts it, so the default-deny rule stays in place.
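For a hypothetical namespace named team-a, the generated resource should be equivalent to the manifest below. Kyverno also adds its own bookkeeping labels to track generated resources, and these are not shown:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-a      # hypothetical namespace
spec:
  podSelector: {}        # selects every Pod in the namespace
  policyTypes:
    - Ingress            # no ingress rules listed, so all inbound traffic is denied
```

NetworkPolicies are additive, so teams open the specific traffic their workloads need by adding narrower allow policies next to this one. Enforcement also requires a CNI plugin that implements NetworkPolicy.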
Rollout Strategy
Do not turn on Enforce mode across the cluster on day one. Here is the rollout I use:
Phase 1: Audit Mode (Week 1-2)
Set all policies to Audit mode. This logs violations without blocking anything:
spec:
  validationFailureAction: Audit
Check what would fail:
kubectl get policyreport -A -o json | jq -r '.items[].results[]? | select(.result=="fail") | "\(.policy)/\(.rule): \(.message)"'
Phase 2: Enforce on Non-Prod (Week 3-4)
Add a namespaceSelector to the rule's match block so the rule applies only in dev and staging namespaces. This assumes your namespaces carry an env label:
rules:
  - name: validate-image-tag
    match:
      any:
        - resources:
            kinds:
              - Pod
            namespaceSelector:
              matchExpressions:
                - key: env
                  operator: In
                  values:
                    - dev
                    - staging
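Scoping the match block means the rule does not evaluate production at all during this phase, so production violations stop appearing in PolicyReports. If you would rather keep auditing production while enforcing elsewhere, ClusterPolicies also accept a validationFailureActionOverrides list. A sketch, assuming namespaces literally named dev and staging. Newer Kyverno releases deprecate this spec-level field in favor of per-rule settings, so check the docs for the version you run:

```yaml
spec:
  validationFailureAction: Audit       # default for every namespace
  validationFailureActionOverrides:
    - action: Enforce                  # block violations only in these namespaces
      namespaces:
        - dev                          # hypothetical namespace names
        - staging
```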
Phase 3: Enforce Everywhere (Week 5+)
Remove the namespace selector. Full enforcement. By this point teams have had weeks to fix their configurations.
Exceptions
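Some workloads genuinely need to break rules (for example, init containers running as root for filesystem setup). Use exclude blocks to keep a rule away from namespaces or resources it should not apply to, starting with the system namespaces:
rules:
  - name: validate-image-tag
    exclude:
      any:
        - resources:
            namespaces:
              - kube-system
              - kyverno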
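For workload-specific cases, Kyverno also supports PolicyException resources, which exempt matching resources from named rules without editing the policy itself. This is a sketch under several assumptions:

- PolicyExceptions are enabled in your install and accepted in the namespace used here.
- The apiVersion varies by release; kyverno.io/v2beta1 is used below.
- The exception name, namespace, and workload names are hypothetical.

```yaml
apiVersion: kyverno.io/v2beta1
kind: PolicyException
metadata:
  name: db-migrate-no-limits        # hypothetical name
  namespace: kyverno                # must be a namespace your install accepts exceptions from
spec:
  exceptions:
    - policyName: require-resource-limits
      ruleNames:
        - check-limits
        - autogen-check-limits      # variant Kyverno auto-generates for Pod controllers such as Jobs
  match:
    any:
      - resources:
          kinds:
            - Pod
            - Job
          namespaces:
            - batch                 # hypothetical namespace
          names:
            - db-migrate-*          # hypothetical name prefix
```

Keeping exceptions as separate resources also gives you a reviewable list of what is exempt from which rule.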
CI Pipeline Integration
Shift left by validating manifests in CI before they reach the cluster. The Kyverno CLI does this:
# Install the CLI
brew install kyverno
# Test manifests against policies locally
kyverno apply ./policies/ --resource ./manifests/deployment.yaml
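kyverno apply checks manifests against policies. For regression tests on the policies themselves, the CLI also has kyverno test, which reads a kyverno-test.yaml file declaring expected outcomes. This sketch assumes the CLI's cli.kyverno.io/v1alpha1 Test format (check the schema for your CLI version). The file names are hypothetical, and pod-without-limits.yaml is assumed to hold a Pod named no-limits with no resources block:

```yaml
# kyverno-test.yaml (hypothetical layout: policy, test Pod, and this file in one directory)
apiVersion: cli.kyverno.io/v1alpha1
kind: Test
metadata:
  name: require-resource-limits-test
policies:
  - require-resource-limits.yaml
resources:
  - pod-without-limits.yaml
results:
  - policy: require-resource-limits
    rule: check-limits
    kind: Pod
    resources:
      - no-limits
    result: fail        # the Pod has no limits, so the rule is expected to fail
```

Running kyverno test . from that directory compares actual results against the declared ones, which catches policy edits that silently stop rejecting bad manifests.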
Add this to your GitHub Actions pipeline:
# .github/workflows/policy-check.yml
name: Policy Check
on: [pull_request]
jobs:
  kyverno-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Kyverno CLI
        run: |
          # Keep the CLI version in step with the Kyverno version running in your cluster
          KYVERNO_VERSION=v1.12.0
          curl -LO "https://github.com/kyverno/kyverno/releases/download/${KYVERNO_VERSION}/kyverno-cli_${KYVERNO_VERSION}_linux_x86_64.tar.gz"
          tar -xzf "kyverno-cli_${KYVERNO_VERSION}_linux_x86_64.tar.gz"
          sudo mv kyverno /usr/local/bin/
      - name: Validate manifests
        run: |
          # kyverno apply exits non-zero when any resource fails a policy,
          # which fails this step and the PR check
          kyverno apply ./policies/ --resource ./k8s/
Now policy violations fail the PR before anyone needs to review them.
Monitoring
Kyverno generates PolicyReport resources that work with standard Kubernetes tooling:
# View all violations across the cluster
kubectl get polr -A -o wide
# Count violations by policy
kubectl get polr -A -o json | \
jq -r '.items[].results[]? | select(.result=="fail") | .policy' | \
sort | uniq -c | sort -rn
For dashboards, Kyverno exposes Prometheus metrics. Add these to your Grafana:
- kyverno_admission_review_duration_seconds: latency of admission reviews
- kyverno_policy_results_total: count of pass/fail/warn/error results by policy
- kyverno_admission_requests_total: total admission requests
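If you run the Prometheus Operator, an alert on new failures can sit alongside the dashboards. This sketch assumes the operator's PrometheusRule CRD is installed and that kyverno_policy_results_total carries policy_name and rule_result labels, as in recent Kyverno releases. The rule name, namespace, and thresholds are placeholders:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kyverno-policy-failures   # hypothetical name
  namespace: monitoring           # hypothetical namespace
spec:
  groups:
    - name: kyverno
      rules:
        - alert: KyvernoPolicyFailures
          # Any failed policy evaluation in the last 15 minutes, grouped by policy
          expr: sum by (policy_name) (increase(kyverno_policy_results_total{rule_result="fail"}[15m])) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Kyverno policy {{ $labels.policy_name }} is reporting failures"
```

Depending on how your Prometheus instance selects rules, the PrometheusRule may need extra labels before it is picked up.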
Common Pitfalls
1. Enforcing before auditing. You will break existing workloads. Always start in Audit mode and review PolicyReports before switching to Enforce.
2. Not excluding system namespaces. Kyverno should not validate kube-system, kyverno, or other infrastructure namespaces. Their workloads have different requirements.
3. Forgetting init containers. A policy that validates containers but ignores initContainers and ephemeralContainers has a gap. Cover all three.
4. No exception process. Some workloads legitimately need to break rules. Build an exception mechanism using exclude blocks or policy exceptions from day one. Without it, teams will push back on the entire system.
5. Too many policies at once. Start with 3-5 high-impact policies (latest tag, resource limits, required labels). Add more as the team builds confidence.
Summary
Kyverno gives you governance without Rego, enforcement without manual reviews, and compliance evidence via PolicyReports. Start with Audit mode, graduate to Enforce, and integrate into CI so violations never reach the cluster. Expect about five weeks from install to full enforcement if you follow the phased rollout.
