CI vs CD
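CI (continuous integration): build and test automatically on every change. CD (continuous delivery/deployment): automate releases to environments such as staging and prod; delivery keeps a manual approval before prod, deployment does not. Tools: GitHub Actions, Jenkins, GitLab CI.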
Containers vs VMs
Containers share the host OS kernel, so they are lightweight and fast to start. VMs virtualize hardware and run a full guest OS; stronger isolation, but heavier. Use Docker for packaging apps; Kubernetes for orchestration.
Kubernetes basics
Key objects: Pod, Deployment, Service, Ingress, ConfigMap, Secret. Controllers continuously reconcile actual state toward the declared desired state.
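A minimal sketch of one reconciliation step, to make the desired-state idea concrete. The function name `reconcile` and the pod naming scheme are hypothetical; real controllers watch the API server and act on live objects rather than returning strings.

```python
def reconcile(desired_replicas: int, running: list[str]) -> list[str]:
    """Return the actions that move the running pods toward the desired count."""
    diff = desired_replicas - len(running)
    if diff > 0:
        # Too few pods: create the missing ones (hypothetical naming scheme).
        return [f"create pod-{len(running) + i}" for i in range(diff)]
    if diff < 0:
        # Too many pods: delete the surplus from the end of the list.
        return [f"delete {name}" for name in running[diff:]]
    # Actual state already matches desired state: nothing to do.
    return []


print(reconcile(3, ["pod-0"]))  # ['create pod-1', 'create pod-2']
```

A controller runs a step like this in a loop, so drift (a crashed pod, a manual delete) is corrected on the next pass.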
Infrastructure as Code
Manage infra with code (Terraform, CloudFormation). Enables versioning, code review, and repeatable environments. Keep variables and state separate from reusable modules.
Observability: logs, metrics, traces
Centralized logs, time-series metrics (Prometheus), distributed tracing (OpenTelemetry). Define SLIs, set SLOs on them, and treat SLAs as the external contract; alert on user-facing symptoms rather than internal causes.
Blue/Green and Canary deployments
Blue/Green: run two identical production environments and switch all traffic from one to the other. Canary: shift traffic to the new version gradually while watching its metrics. Use feature flags to decouple deploy from release.
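A rough sketch of canary traffic splitting, assuming routing is decided per user by a hash bucket. The names `pick_version` and `canary_percent` are hypothetical; in practice the split usually lives in a load balancer or service mesh, not in application code.

```python
import hashlib


def pick_version(user_id: str, canary_percent: int) -> str:
    """Send a stable slice of users to the canary, based on a hash bucket."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"


# Raising canary_percent step by step (for example 1, 10, 50, 100) widens the slice.
# A user already on the canary stays there, because their bucket never changes.
for percent in (0, 10, 50, 100):
    print(percent, pick_version("alice", percent))
```

Hashing the user ID instead of picking at random keeps each user on one version for the whole rollout, which makes errors easier to attribute.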
Secrets management
Store secrets in a dedicated secrets manager (HashiCorp Vault, or a cloud secrets manager backed by KMS). Never hardcode them in code or images; rotate regularly; grant access on a least-privilege basis.
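One way to avoid hardcoding, sketched under the assumption that the platform injects secrets as environment variables at runtime. The helper `get_secret` and the variable name `DEMO_DB_PASSWORD` are hypothetical; a real setup might call the vault's client library instead.

```python
import os


def get_secret(name: str) -> str:
    """Read a secret injected at runtime; fail loudly if it is missing."""
    value = os.environ.get(name)
    if not value:
        # Failing fast beats silently falling back to a default credential.
        raise RuntimeError(f"missing required secret: {name}")
    return value


# Simulate the platform injecting the secret (demo value only).
os.environ["DEMO_DB_PASSWORD"] = "example-only"
password = get_secret("DEMO_DB_PASSWORD")
print(len(password) > 0)  # avoid printing the secret itself
```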
Horizontal vs Vertical scaling
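Vertical: a bigger machine. Horizontal: more machines behind a load balancer. Prefer horizontal for resilience (no single machine is a point of failure) and cost (capacity grows in small increments); it works best with stateless services.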
GitOps
Use Git as the source of truth for deployments. Changes land through pull requests, and a reconciler (Argo CD/Flux) continuously applies the desired state to the cluster.
Terraform state and modules
Store state remotely with locking (e.g. S3 + DynamoDB lock); compose reusable modules; pin provider and module versions; run plan/apply via CI and review the plan before apply.
Container security
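Scan images (Trivy, Clair), minimize base images, run as non-root, sign images (cosign), drop unneeded Linux capabilities, and enforce pod security policy (Pod Security Admission; PodSecurityPolicy was removed in Kubernetes 1.25).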
SRE: Error budgets and SLOs
Define SLIs and SLOs, then track the error budget (1 minus the SLO target) to balance reliability and velocity; use burn-rate alerts to catch budget exhaustion before the SLO is breached.
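The burn-rate arithmetic can be sketched as below. The function names and the two-window check are illustrative assumptions; the threshold is a parameter, and 14.4 is used only as an example value (it corresponds to spending 2% of a 30-day budget in one hour).

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_page(short_ratio: float, long_ratio: float,
                slo_target: float, threshold: float) -> bool:
    """Page only when both a short and a long window burn above the threshold."""
    return (burn_rate(short_ratio, slo_target) >= threshold
            and burn_rate(long_ratio, slo_target) >= threshold)


# 2% errors in both windows against a 99.9% SLO is a burn rate of about 20.
print(should_page(0.02, 0.02, slo_target=0.999, threshold=14.4))   # True
# A short spike with a healthy long window does not page.
print(should_page(0.02, 0.001, slo_target=0.999, threshold=14.4))  # False
```

With a 99.9% SLO, a 1% error ratio gives a burn rate of about 10, so a 30-day budget would be gone in roughly 3 days.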
Zero-downtime migrations
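Use the expand/contract pattern: add nullable columns, start dual-writing, backfill existing rows, switch reads, then stop writing and remove the old fields; gate each step with feature flags. Enabling dual-write before the backfill ensures rows written during the backfill are not missed.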
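A compact sketch of expand/contract on an in-memory SQLite database. The table and column names (`users`, `fullname`, `display_name`) are hypothetical, and the steps run back to back here, whereas in production each would be a separate deploy. Dual-write is turned on before the backfill in this sketch so that rows created mid-migration already carry the new column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.execute("INSERT INTO users (fullname) VALUES ('Ada Lovelace')")

# 1. Expand: add the new column as nullable so existing writers keep working.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


# 2. Dual-write: new rows populate both the old and the new column.
def create_user(db: sqlite3.Connection, name: str) -> None:
    db.execute(
        "INSERT INTO users (fullname, display_name) VALUES (?, ?)", (name, name)
    )


create_user(conn, "Grace Hopper")

# 3. Backfill: copy old data into the new column for rows that lack it.
conn.execute("UPDATE users SET display_name = fullname WHERE display_name IS NULL")

# 4. Switch reads: readers now use only the new column.
names = [row[0] for row in conn.execute("SELECT display_name FROM users ORDER BY id")]
print(names)  # ['Ada Lovelace', 'Grace Hopper']

# 5. Contract: drop the old column once nothing reads or writes it.
#    DROP COLUMN needs SQLite 3.35 or newer, so guard on the library version.
if sqlite3.sqlite_version_info >= (3, 35, 0):
    conn.execute("ALTER TABLE users DROP COLUMN fullname")
```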
Incident response and postmortems
Run blameless postmortems: document the timeline, impact, and root cause; track action items to completion; feed the lessons back into on-call practice and runbooks. Rehearse with game days.