
DevOps Project Example: From Code Push to Production with GitOps, FluxCD, and Kubernetes

Most DevOps tutorials show you a pipeline diagram. This one shows you a real pipeline, built on a real application, running on real Kubernetes clusters — with every tool, every workflow, and every design decision explained.

This post walks through the complete CI/CD system behind Slotmachine, a real-time multiplayer tournament app: from the moment a developer pushes code to GitHub, through a six-job pipeline of security and quality gates, all the way to automated deployment on both Nutanix on-premise clusters and AWS EKS. No hand-waving. No "and then magic happens."

The full source code is available in two repositories: slotmachine (the application) and slotmachine-deployment (the Kubernetes manifests).


The Big Picture

Here is the complete flow before we drill into any single piece:

Developer
  │ git push
┌─────────────────────────────────────────────────────────────────┐
│  GitHub Actions — 6-Job CI Pipeline (slotmachine repo)          │
│                                                                 │
│  [1] CodeQL SAST → [2] Build+Test → [3] Locust Load Test       │
│  → [4] OWASP ZAP DAST → [5] Docker Build + Trivy Scan + Push  │
│  → [6] Update Deployment Repo (Kustomize image tag update)      │
└─────────────────────────────────────────────────────────────────┘
  │                         │
  │ push images              │ push to branch: development
  ▼                         ▼
Docker Hub              slotmachine-deployment repo
(pkhamdee/slotmachine)    ├── branch: development  ──► K8s DEV  (Nutanix)
  client-{sha}            ├── branch: qa           ──► K8s QA   (Nutanix)
  server-{sha}            └── branch: main         ──► K8s PROD (AWS EKS)
                               ▲              ▲
                          PR + DevOps    PR + QA Sign Off
                          Approval       + SDM Approval

All three clusters watched by FluxCD (CD Agent):
  FluxCD polls git branch → detects change → kustomize build + apply → K8s reconciled

NKP Management Cluster (Nutanix On-Premise):
  Manages DEV + QA clusters with SSO/IdP, Governance, Platform Services

Two repositories. Three environments. One fully automated path from commit to production — with human approval gates exactly where they belong.


Part 1: The Philosophy — Why Separate Code and Deployment?

The single most important design decision in this architecture is something many teams overlook: the application code and the deployment configuration live in different Git repositories.

The Problem with Mixing Them

When you put your Kubernetes YAML files alongside your application code:

my-app/
├── src/          ← application code
├── tests/
└── k8s/          ← Kubernetes manifests  ← ⚠️ problematic
    ├── deployment.yaml
    └── service.yaml

You create tight coupling that causes real operational pain:

  • A hotfix to a CSS file triggers a full deployment pipeline
  • You can't audit "what's running in production" without reading application commit history
  • Rolling back a bad deployment means reverting application code, not just config
  • Multiple environments (dev/qa/prod) live in branches or subdirectories — a mess

The GitOps Solution: Two Repos, Two Concerns

slotmachine/              ← Source of truth for the APPLICATION
  .github/workflows/ci.yml
  client/src/             ← React frontend
  server/src/             ← Node.js backend
  tests/locustfile.py

slotmachine-deployment/   ← Source of truth for WHAT RUNS IN THE CLUSTER
  base/                   ← Shared Kubernetes manifests
  overlays/development/   ← Dev environment configuration
  overlays/production/    ← Prod environment configuration

Browse the code: slotmachine app repo · slotmachine-deployment repo

The rule: the application repo never contains Kubernetes manifests. The deployment repo never contains application source code. The only connection is an image tag built from the Git commit SHA, which the CI pipeline writes into the deployment repo automatically.

This is the GitOps model: Git is the single source of truth for the desired state of the cluster. No kubectl apply from a CI runner. The cluster pulls its desired state from Git, rather than having it pushed from a pipeline.
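The image tag is the whole contract between the two repositories, so it helps to see how it is formed. The following is a minimal Python sketch, not code from either repository: the image_refs helper and its SHA validation are assumptions made for illustration, while the pkhamdee/slotmachine repository name and the client-/server- tag prefixes come from the pipeline described in this post.

```python
import re

REPO = "pkhamdee/slotmachine"  # Docker Hub repository named in this post


def image_refs(commit_sha: str) -> dict:
    """Return the immutable image reference for each component of one commit.

    Hypothetical helper: it only illustrates the naming scheme
    <repo>:<component>-<sha> used by the CI pipeline.
    """
    # Accept abbreviated or full hexadecimal Git SHAs, reject anything else
    # (for example a mutable tag such as "latest").
    if not re.fullmatch(r"[0-9a-f]{7,40}", commit_sha):
        raise ValueError(f"not a Git SHA: {commit_sha!r}")
    return {
        component: f"{REPO}:{component}-{commit_sha}"
        for component in ("client", "server")
    }


if __name__ == "__main__":
    for component, ref in image_refs("abc1234").items():
        print(component, "->", ref)
```

Because the tag is derived from the commit, anyone reading the deployment repo can map a running image back to the exact application commit that produced it.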

Benefits You Feel Immediately

Problem                   │ Without Separation         │ With GitOps Separation
──────────────────────────┼────────────────────────────┼──────────────────────────────────────
"What's running in prod?" │ Dig through CI logs        │ Read overlays/production/kustomization.yaml
Rolling back prod         │ Revert app code + redeploy │ Revert deployment repo PR
Audit trail               │ Mixed with code changes    │ Clean, deployment-only commit history
Environment drift         │ Configuration copy-paste   │ Kustomize base + overlays
Access control            │ Devs touch prod infra      │ Devs only push to app repo; infra team owns deployment repo

Part 2: The Application — Slotmachine

Before the pipeline, let's understand what we're deploying.

Slotmachine is a real-time multiplayer slot tournament application. Players compete in timed sessions, spinning reels to accumulate the highest balance. A live scoreboard updates every second. An admin controls session flow.

Tech Stack

Frontend:  React 18 + Vite (SPA) served by Nginx
Backend:   Node.js + Express + Socket.io
Database:  MongoDB 7 + Mongoose
Cache:     Redis 7 (Socket.io multi-pod fan-out via ioredis pub/sub)
Container: Docker (client image: nginx:alpine, server image: node:20-alpine)

Architecture Inside Kubernetes

Internet
AWS NLB (Production) / LoadBalancer Service (Dev/QA)
┌─────────────────────────────────────────────┐
│  client Pod × 4 (prod) / × 1 (dev)         │
│  nginx:alpine                               │
│  ├── serves /  → React SPA (static files)  │
│  ├── proxies /api/*  → server:3001         │
│  └── proxies /socket.io/* → server:3001    │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│  server Pod × 6 (prod) / × 1 (dev)         │
│  node:20-alpine                             │
│  ├── Express REST API (port 3001)           │
│  ├── Socket.io server (real-time events)    │
│  └── Mongoose (MongoDB) + ioredis (Redis)  │
└─────────────────────────────────────────────┘
  │               │
  ▼               ▼
MongoDB         Redis
StatefulSet     Deployment
(20Gi PVC prod) (pub/sub adapter)

The multi-pod server architecture relies on Redis as a Socket.io adapter: when one server pod broadcasts a scoreboard update, Redis fans out the event to all other server pods, which forward it to their connected clients. Without Redis, clients connected to different pods would see inconsistent scoreboards.
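The fan-out behaviour is easier to see in a toy model. The sketch below is an in-memory simulation in Python, not the Socket.io Redis adapter itself: the Broker and ServerPod classes are hypothetical stand-ins, and the real adapter handles local delivery and publishing differently in detail. It only shows why a shared pub/sub channel is needed once clients are spread over several pods.

```python
class Broker:
    """Stands in for Redis pub/sub: delivers each message to every subscriber."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        for callback in self.subscribers:
            callback(message)


class ServerPod:
    """Hypothetical model of one server pod and its connected clients."""

    def __init__(self, name, broker=None):
        self.name = name
        self.broker = broker
        self.clients = {}  # client id -> list of messages received
        if broker is not None:
            broker.subscribe(self._deliver_locally)

    def connect(self, client_id):
        self.clients[client_id] = []

    def _deliver_locally(self, message):
        for inbox in self.clients.values():
            inbox.append(message)

    def broadcast(self, message):
        if self.broker is not None:
            self.broker.publish(message)    # reaches every pod, this one included
        else:
            self._deliver_locally(message)  # only this pod's clients see it


def demo(with_broker):
    """Broadcast from pod A and return what Alice (pod A) and Bob (pod B) saw."""
    broker = Broker() if with_broker else None
    pod_a, pod_b = ServerPod("a", broker), ServerPod("b", broker)
    pod_a.connect("alice")
    pod_b.connect("bob")
    pod_a.broadcast("scoreboard v2")
    return pod_a.clients["alice"], pod_b.clients["bob"]


if __name__ == "__main__":
    print("with broker:   ", demo(True))
    print("without broker:", demo(False))
```

With the broker in place both players receive the update; without it, Bob's inbox stays empty because he is connected to a different pod.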

Repository Structure

slotmachine/
├── .github/workflows/ci.yml      ← 6-job CI pipeline
├── .zap/rules.tsv                ← OWASP ZAP alert filter rules
├── docker-compose.yml            ← Local development stack
├── tests/locustfile.py           ← Load test scenarios
├── client/
│   ├── Dockerfile
│   ├── nginx.conf                ← SPA routing + API proxy config
│   ├── vite.config.js
│   └── src/
│       ├── components/           ← 12 React components
│       ├── hooks/                ← useGame, useSession
│       ├── api/gameApi.js
│       └── __tests__/            ← 97 Vitest tests
└── server/
    ├── Dockerfile
    └── src/
        ├── config/gameConfig.js
        ├── controllers/
        ├── middleware/adminAuth.js
        ├── models/               ← 5 MongoDB schemas
        ├── routes/
        └── services/
            ├── SessionManager.js ← State machine + timers
            └── slotEngine.js     ← Spin logic + payout table

Part 3: The CI Pipeline — 6 Jobs, Every Gate Explained

The CI pipeline is defined in .github/workflows/ci.yml. Every push and pull request runs the four test and scan jobs, followed by the container build and Trivy scan. Only the Docker Hub push (the last step of job 5) and the GitOps update (job 6) are restricted to the main branch.

push / PR
  ├─── [1] code-scan         (CodeQL SAST)
  ├─── [2] build-and-test    (npm + Vitest + node:test)
  ├─── [3] performance-test  (Locust load test)
  ├─── [4] dast-zap          (OWASP ZAP DAST)
  └── all 4 pass?
        ├─── [5] container-build-scan-push  (Docker + Trivy + DockerHub)
        └─── [6] update-gitops              (Kustomize image tag → deployment repo)
                  (main branch only for the Docker Hub push in job 5 and for job 6)
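The gating rules in the diagram above can be written down as a small decision function. This is a Python sketch of the logic only, under the assumption that the workflow behaves as the YAML excerpts in this post suggest; the planned_work function and the step labels container-build-scan and docker-hub-push are names invented for illustration.

```python
MAIN = "refs/heads/main"
GATE_JOBS = ["code-scan", "build-and-test", "performance-test", "dast-zap"]


def planned_work(ref: str, gates_passed: bool = True) -> list:
    """List the work a pipeline run performs for a given Git ref.

    ref mirrors the value of github.ref, for example "refs/heads/main"
    for a push to main or "refs/pull/12/merge" for a pull request.
    """
    work = list(GATE_JOBS)            # the four gate jobs always run
    if not gates_passed:
        return work                   # job 5 "needs" all four gates to pass
    work.append("container-build-scan")  # build + Trivy run on every ref
    if ref == MAIN:
        # Only main publishes images and updates the deployment repo.
        work += ["docker-hub-push", "update-gitops"]
    return work


if __name__ == "__main__":
    print(planned_work("refs/pull/12/merge"))
    print(planned_work(MAIN))
```

The useful property is that a pull request exercises everything except the two steps with side effects outside the repository.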

Job 1: CodeQL SAST — Finding Vulnerabilities Before They Ship

SAST (Static Application Security Testing) analyzes source code without running it.

# .github/workflows/ci.yml (simplified)
code-scan:
  runs-on: ubuntu-latest
  permissions:
    security-events: write
  steps:
    - uses: actions/checkout@v4
    - uses: github/codeql-action/init@v3
      with:
        languages: javascript
        queries: security-extended   # Includes OWASP Top 10 patterns
    - uses: github/codeql-action/autobuild@v3
    - uses: github/codeql-action/analyze@v3

CodeQL builds a semantic model of the code: it understands data flow, not just text patterns. It can detect:

  • SQL/NoSQL injection where user input flows to a query
  • XSS where untrusted data reaches the DOM
  • Prototype pollution in JavaScript
  • Path traversal vulnerabilities

Results appear directly in the GitHub Security tab. A Critical finding blocks the PR.

Job 2: Build and Test — 118 Tests, Zero Compromises

build-and-test:
  runs-on: ubuntu-latest
  services:
    mongodb:
      image: mongo:7
      ports: ['27017:27017']
    redis:
      image: redis:7-alpine
      ports: ['6379:6379']
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with: { node-version: '20' }
    - run: npm ci
    - run: npm run test --workspace=client   # 97 Vitest tests
    - run: npm run test --workspace=server   # 21 node:test tests
    - run: npm run build --workspace=client  # Vite production build

The 118 tests run with real MongoDB and Redis service containers available: no mocks, no stubs at the database layer. The production client build is produced and verified before any container is built.

Job 3: Performance Test — Catching Regressions Before Users Do

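performance-test:
  runs-on: ubuntu-latest
  services:
    mongodb:
      image: mongo:7
      ports: ['27017:27017']     # expose to the runner so the server can reach it
    redis:
      image: redis:7-alpine
      ports: ['6379:6379']
  steps:
    - uses: actions/checkout@v4
    - run: npm ci
    - run: pip install locust
    - run: |
        node server/src/server.js &   # entry point under server/src/
        sleep 5
        locust -f tests/locustfile.py \
          --headless -u 2 -r 1 -t 10s \
          --host http://localhost:3001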

The locustfile.py simulates real user behavior: create a session, spin the reels, check the scoreboard. With 2 users for 10 seconds, the run is a smoke-level load test rather than a stress test, but running it on every PR still surfaces gross performance regressions (a slow database query or a blocking event loop operation) before they reach any environment.

Job 4: DAST — Attacking the Running Application

DAST (Dynamic Application Security Testing) probes the running application from the outside, like a real attacker would. OWASP ZAP is the industry-standard open-source tool for this.

dast-zap:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Start application stack
      run: docker compose up -d && sleep 15

    # On PRs: baseline scan (passive, fast, ~2 min)
    - name: ZAP Baseline Scan
      if: github.event_name == 'pull_request'
      uses: zaproxy/action-baseline@v0.13.0
      with:
        target: 'http://localhost:8080'
        rules_file_name: '.zap/rules.tsv'

    # On main push: full active scan (active attacks, ~15 min)
    - name: ZAP Full Scan
      if: github.ref == 'refs/heads/main'
      uses: zaproxy/action-full-scan@v0.11.0
      with:
        target: 'http://localhost:8080'
        rules_file_name: '.zap/rules.tsv'

ZAP tests for:

  • SQL/NoSQL injection (sending ' OR 1=1 -- style payloads)
  • XSS (injecting <script>alert(1)</script> into every form field)
  • CSRF, clickjacking, missing security headers
  • Authentication bypass attempts

The .zap/rules.tsv file suppresses known false positives so the signal stays clean.

Job 5: Container Build, Scan, and Push

container-build-scan-push:
  needs: [code-scan, build-and-test, performance-test, dast-zap]
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4      # the Docker builds need ./client and ./server
    - name: Build client image
      run: |
        docker build -t pkhamdee/slotmachine:client-${{ github.sha }} \
                     -t pkhamdee/slotmachine:client \
                     ./client

    - name: Build server image
      run: |
        docker build -t pkhamdee/slotmachine:server-${{ github.sha }} \
                     -t pkhamdee/slotmachine:server \
                     ./server

    - name: Trivy scan — client image
      uses: aquasecurity/trivy-action@master
      with:
        image-ref: pkhamdee/slotmachine:client-${{ github.sha }}
        severity: HIGH,CRITICAL
        exit-code: 1          # Fail the pipeline on HIGH or CRITICAL CVEs

    - name: Trivy scan — server image
      uses: aquasecurity/trivy-action@master
      with:
        image-ref: pkhamdee/slotmachine:server-${{ github.sha }}
        severity: HIGH,CRITICAL
        exit-code: 1

    - name: Push to Docker Hub       # Only on main branch
      if: github.ref == 'refs/heads/main'
      run: |
        docker push pkhamdee/slotmachine:client-${{ github.sha }}
        docker push pkhamdee/slotmachine:client
        docker push pkhamdee/slotmachine:server-${{ github.sha }}
        docker push pkhamdee/slotmachine:server

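Two image tags are pushed for every release:

  • :client-<sha> is the immutable, commit-specific tag used by Kustomize in the deployment manifests. The tag carries the full 40-character ${{ github.sha }}; examples in this post abbreviate it (client-abc1234).
  • :client is the mutable, latest-style tag kept for convenience.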

Trivy scans the container's OS packages and language dependencies against CVE databases. A HIGH or CRITICAL finding fails the build — the image never reaches Docker Hub.
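The severity gate amounts to a simple rule: any HIGH or CRITICAL finding means a non-zero exit code. The Python sketch below applies that rule to a Trivy-style JSON report. It assumes the report has a top-level "Results" list whose entries may carry a "Vulnerabilities" list with "VulnerabilityID" and "Severity" fields; the sample report and its CVE identifiers are made up for illustration, and in the real pipeline the trivy-action does this work itself.

```python
import json

BLOCKING = {"HIGH", "CRITICAL"}  # matches "severity: HIGH,CRITICAL" in the job


def blocking_findings(report_json: str) -> list:
    """Return the IDs of findings whose severity should fail the build."""
    report = json.loads(report_json)
    found = []
    # "or []" tolerates targets that report no vulnerabilities at all.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in BLOCKING:
                found.append(vuln.get("VulnerabilityID", "unknown"))
    return found


def exit_code(report_json: str) -> int:
    """Mirror "exit-code: 1": non-zero as soon as one blocking finding exists."""
    return 1 if blocking_findings(report_json) else 0


# Made-up sample report; the identifiers are placeholders, not real CVEs.
SAMPLE = json.dumps({
    "Results": [
        {"Target": "os-packages", "Vulnerabilities": [
            {"VulnerabilityID": "CVE-0000-0001", "Severity": "LOW"},
            {"VulnerabilityID": "CVE-0000-0002", "Severity": "CRITICAL"},
        ]},
        {"Target": "node-packages"},  # no findings for this target
    ]
})

if __name__ == "__main__":
    print(blocking_findings(SAMPLE), exit_code(SAMPLE))
```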

Job 6: Update the Deployment Repository

This is where CI hands off to CD. The last job:

  1. Checks out slotmachine-deployment
  2. Updates the image tags to the new SHA using Kustomize
  3. Commits and pushes to the development branch

update-gitops:
  needs: [container-build-scan-push]
  if: github.ref == 'refs/heads/main'
  runs-on: ubuntu-latest
  steps:
    - name: Checkout deployment repo
      uses: actions/checkout@v4
      with:
        repository: pkhamdee/slotmachine-deployment
        token: ${{ secrets.DEPLOYMENT_REPO_TOKEN }}
        ref: development

    - name: Update image tags
      run: |
        cd overlays/development
        kustomize edit set image \
          slotmachine-client=pkhamdee/slotmachine:client-${{ github.sha }}
        kustomize edit set image \
          slotmachine-server=pkhamdee/slotmachine:server-${{ github.sha }}

    - name: Commit and push
      run: |
        git config user.name  "github-actions[bot]"
        git config user.email "github-actions[bot]@users.noreply.github.com"
        git add .
        git commit -m "ci: update images to ${{ github.sha }}"
        git push origin development

After this push, FluxCD takes over. CI's job is done.
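To make the tag update less of a black box, here is a Python sketch of the effect that kustomize edit set image has on the images list of a kustomization. It is a simplified model operating on a plain dictionary, not the Kustomize implementation: set_image is a hypothetical function, and it assumes the name=newName:newTag form used in the job above (digests and registries with a port but no tag are not handled).

```python
def set_image(kustomization: dict, spec: str) -> dict:
    """Apply one "name=newName:newTag" spec to the kustomization's images list."""
    name, _, target = spec.partition("=")
    # Split on the last colon so the repository path stays intact.
    new_name, _, new_tag = target.rpartition(":")
    if not (name and new_name and new_tag):
        raise ValueError(f"expected name=newName:newTag, got {spec!r}")

    images = kustomization.setdefault("images", [])
    for entry in images:
        if entry["name"] == name:
            # Existing entry: overwrite in place, so repeated runs never duplicate.
            entry.update(newName=new_name, newTag=new_tag)
            break
    else:
        images.append({"name": name, "newName": new_name, "newTag": new_tag})
    return kustomization


if __name__ == "__main__":
    overlay = {"resources": ["../../base"], "images": [
        {"name": "slotmachine-server",
         "newName": "pkhamdee/slotmachine", "newTag": "server-0000000"},
    ]}
    set_image(overlay, "slotmachine-server=pkhamdee/slotmachine:server-abc1234")
    set_image(overlay, "slotmachine-client=pkhamdee/slotmachine:client-abc1234")
    print(overlay["images"])
```

The update is idempotent: running the job twice for the same commit leaves the file unchanged, which is why the resulting Git history contains one commit per released image.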


Part 4: The Deployment Repository — Kustomize Base and Overlays

The slotmachine-deployment repository contains all Kubernetes manifests. It uses Kustomize — the Kubernetes-native configuration tool that lets you write base manifests once and patch them per environment.

Repository Layout

slotmachine-deployment/
├── base/                          ← Single source of truth for Kubernetes objects
│   ├── kustomization.yaml
│   ├── namespace.yaml
│   ├── client/
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   └── kustomization.yaml
│   ├── server/
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   ├── configmap.yaml
│   │   └── kustomization.yaml
│   ├── mongodb/
│   │   ├── statefulset.yaml       ← StatefulSet for stable Pod identity
│   │   ├── service.yaml
│   │   └── kustomization.yaml
│   └── redis/
│       ├── deployment.yaml
│       └── kustomization.yaml
└── overlays/
    ├── development/               ← Dev-specific patches
    │   ├── kustomization.yaml
    │   └── namespace.yaml
    └── production/                ← Prod-specific patches
        ├── kustomization.yaml
        └── namespace.yaml

The Base — Write Once, Run Everywhere

The base defines the structure with sensible defaults. Here's the server deployment base:

# base/server/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: server
spec:
  replicas: 1                     # Overridden by overlays
  selector:
    matchLabels:
      app: slotmachine-server
  template:
    metadata:
      labels:
        app: slotmachine-server   # Must match spec.selector.matchLabels
    spec:
      initContainers:
        - name: wait-for-mongodb  # Don't start until DB is ready
          image: busybox
          command: ['sh', '-c',
            'until nc -z mongodb 27017; do sleep 2; done']
        - name: wait-for-redis
          image: busybox
          command: ['sh', '-c',
            'until nc -z redis 6379; do sleep 2; done']
      containers:
        - name: server
          image: slotmachine-server:latest   # Replaced by Kustomize
          ports:
            - containerPort: 3001
          envFrom:
            - configMapRef:
                name: server-config
            - secretRef:
                name: server-secret
          resources:
            requests:
              cpu: "200m"
              memory: "256Mi"
          readinessProbe:
            tcpSocket:
              port: 3001
      topologySpreadConstraints:  # Pod-level field, a sibling of containers
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: slotmachine-server

The initContainers block is a critical detail: without it, the server Pod starts before MongoDB is ready, crashes, and Kubernetes restarts it in a loop. Init containers enforce dependency ordering cleanly.
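The nc -z loop in the init containers is a plain TCP reachability check. For readers less familiar with netcat, the following Python sketch does the same thing with the standard library. The wait_for_port function is illustrative only; unlike the shell loop it also takes a timeout, which is an addition made here so that the example terminates.

```python
import socket
import time


def wait_for_port(host: str, port: int,
                  timeout: float = 30.0, interval: float = 2.0) -> bool:
    """Retry a TCP connect until it succeeds or the timeout elapses.

    Returns True once the port accepts a connection, False on timeout.
    Like "nc -z", it sends no data: it only checks that a listener exists.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or name not resolvable yet: retry later.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical usage mirroring the two init containers.
    for host, port in (("mongodb", 27017), ("redis", 6379)):
        print(host, wait_for_port(host, port, timeout=4.0))
```

A successful connect only proves the port is open, not that the database is ready to serve queries, which is the same limitation the shell version has.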

The Overlays — Environment-Specific Patches

The production overlay patches the base without duplicating it:

# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../base

namespace: production

images:
  - name: slotmachine-client
    newName: pkhamdee/slotmachine
    newTag: client-abc1234def5    # ← auto-updated by CI
  - name: slotmachine-server
    newName: pkhamdee/slotmachine
    newTag: server-abc1234def5    # ← auto-updated by CI

patches:
  - target: { kind: Deployment, name: client }
    patch: |
      - op: replace
        path: /spec/replicas
        value: 4

  - target: { kind: Deployment, name: server }
    patch: |
      - op: replace
        path: /spec/replicas
        value: 6

  - target: { kind: StatefulSet, name: mongodb }
    patch: |
      - op: replace
        path: /spec/volumeClaimTemplates/0/spec/resources/requests/storage
        value: 20Gi

  - target: { kind: Service, name: client }
    patch: |
      - op: add
        path: /metadata/annotations
        value:
          service.beta.kubernetes.io/aws-load-balancer-type: "external"
          service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"

The development overlay is identical in structure but keeps replicas at 1, storage at 5Gi, and uses no NLB annotations.

What Kustomize enables:

                Base manifest
         ┌───────────┴───────────┐
         │                       │
    dev overlay             prod overlay
  replicas: 1              replicas: 4/6
  storage: 5Gi             storage: 20Gi
  namespace: development   namespace: production
  no NLB annotations       NLB + internet-facing

Zero copy-paste. A change to the base manifest (like adding a new environment variable) automatically flows to both environments.
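The overlay patches are JSON Patch style operations, and a small model makes the "patch, don't copy" idea concrete. The Python sketch below applies replace and add operations to a manifest held as a dictionary. It is a deliberately reduced model, not Kustomize: apply_patch is a hypothetical helper, it supports only add and replace on mapping keys, it ignores JSON Pointer escaping, and the 5Gi starting value in the example is an assumption.

```python
import copy


def apply_patch(manifest: dict, ops: list) -> dict:
    """Return a patched copy of manifest; the base itself is never modified."""
    result = copy.deepcopy(manifest)
    for op in ops:
        if op["op"] not in ("add", "replace"):
            raise ValueError(f"unsupported op: {op['op']}")
        # "/spec/replicas" -> parents ["spec"], leaf "replicas"
        *parents, leaf = [part for part in op["path"].split("/") if part]
        node = result
        for key in parents:
            # Numeric path segments index into lists (volumeClaimTemplates/0).
            node = node[int(key)] if isinstance(node, list) else node[key]
        if op["op"] == "replace" and leaf not in node:
            raise KeyError(f"cannot replace missing path {op['path']}")
        node[leaf] = op["value"]
    return result


if __name__ == "__main__":
    base = {"kind": "Deployment", "metadata": {"name": "server"},
            "spec": {"replicas": 1}}
    prod = apply_patch(base, [
        {"op": "replace", "path": "/spec/replicas", "value": 6},
    ])
    print(base["spec"]["replicas"], prod["spec"]["replicas"])
```

Each overlay is just a list of such operations against the shared base, so the environments differ only where a patch says they do.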


Part 5: FluxCD — The CD Agent That Closes the Loop

FluxCD runs inside each Kubernetes cluster as a set of controllers. It watches a Git repository branch and continuously reconciles the cluster state with whatever is committed there.

How FluxCD Works

FluxCD Poll Loop (every 1 minute, the interval set on the GitRepository):
  1. Connect to Git → fetch branch HEAD SHA
  2. Compare with last-applied SHA
  3. If different:
     a. git clone the new state
     b. kustomize build overlays/<env>/
     c. kubectl apply the rendered manifests
     d. Update last-applied SHA
  4. If same: no new revision to apply. The Kustomization still re-checks
     the cluster against Git on its own interval and corrects any drift.
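The revision comparison at the heart of the loop above fits in a few lines. This Python sketch models it with injected render and apply callables; the Reconciler class is hypothetical and leaves out everything FluxCD does around it (authentication, artifact storage, health checks, periodic drift correction).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Reconciler:
    """Toy model of one cluster's poll loop for a single Git branch."""
    render: Callable[[str], str]    # stands in for "kustomize build"
    apply: Callable[[str], None]    # stands in for applying to the cluster
    last_applied: Optional[str] = None
    history: List[str] = field(default_factory=list)

    def tick(self, head_sha: str) -> bool:
        """Run one poll. Returns True when a new revision was applied."""
        if head_sha == self.last_applied:
            return False            # same revision: nothing new to apply
        self.apply(self.render(head_sha))
        # Recorded only after a successful apply, so a failed apply
        # is retried on the next tick.
        self.last_applied = head_sha
        self.history.append(head_sha)
        return True


if __name__ == "__main__":
    applied = []
    loop = Reconciler(render=lambda sha: f"manifests@{sha}",
                      apply=applied.append)
    for sha in ("abc1234", "abc1234", "def5678"):
        print(sha, loop.tick(sha))
    print(applied)
```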

This is pull-based CD — the cluster reaches out to Git, rather than having a pipeline push into the cluster. The advantages:

Push-based CD (traditional):
  CI runner → kubectl apply → cluster
  Problems:
  - CI runner needs cluster credentials
  - Cluster access from the internet
  - Hard to audit who changed what

Pull-based CD (GitOps with FluxCD):
  FluxCD (inside cluster) → polls Git → applies locally
  Advantages:
  - No external access to cluster needed
  - Cluster credentials never leave the cluster
  - Full audit trail in Git history
  - Self-healing: manual kubectl edits are reverted

FluxCD Configuration for Each Environment

# flux-system/sources/slotmachine-deployment.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: slotmachine-deployment
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/pkhamdee/slotmachine-deployment
  ref:
    branch: development    # ← development cluster watches 'development' branch
                           #   QA cluster watches 'qa' branch
                           #   Production cluster watches 'main' branch
  secretRef:
    name: flux-git-credentials
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: slotmachine
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: slotmachine-deployment
  path: ./overlays/development   # ← path matches the cluster's environment
  prune: true                    # Delete resources removed from Git
  wait: true                     # Wait for rollout to complete
  healthChecks:                  # Note: Flux ignores healthChecks when wait is true
    - apiVersion: apps/v1
      kind: Deployment
      name: client
      namespace: development
    - apiVersion: apps/v1
      kind: Deployment
      name: server
      namespace: development

The prune: true setting is important: if a manifest is deleted from Git, FluxCD deletes the corresponding Kubernetes resource. The cluster state is always what Git says it should be — nothing more, nothing less.
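Pruning is a set difference between what was applied last time and what Git renders now. The Python sketch below shows that calculation; plan_sync and the kind/namespace/name identifiers are illustrative, and the legacy-api Service is a made-up resource used to show a deletion.

```python
def plan_sync(inventory: set, rendered: set, prune: bool = True) -> dict:
    """Decide what to apply and what to delete for one reconciliation.

    inventory: resources applied by the previous reconciliation
    rendered:  resources produced from the current Git revision
    """
    # Anything that was applied before but is no longer in Git is an orphan.
    orphans = inventory - rendered
    return {
        "apply": sorted(rendered),
        "delete": sorted(orphans) if prune else [],
    }


if __name__ == "__main__":
    previously_applied = {
        "Deployment/development/client",
        "Deployment/development/server",
        "Service/development/legacy-api",   # hypothetical, since removed from Git
    }
    in_git_now = {
        "Deployment/development/client",
        "Deployment/development/server",
    }
    print(plan_sync(previously_applied, in_git_now))
    print(plan_sync(previously_applied, in_git_now, prune=False))
```

Only resources that the reconciler itself applied earlier are candidates for deletion, so objects created by other tools are left alone in this model.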

Self-Healing in Practice

Scenario: An engineer manually scales the server Deployment to 10 replicas
          with: kubectl scale deployment/server --replicas=10

FluxCD behavior (within 5 minutes):
  1. Detects drift: cluster has 10 replicas, Git says 6
  2. Re-applies the manifest from Git
  3. Cluster returns to 6 replicas
  4. Logs: "Kustomization/slotmachine: detected drift, reconciling"

Result: Manual changes to production are impossible to accidentally
        leave in place. Everything flows through Git.
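Drift detection in this scenario means comparing the fields Git declares with the live object. The Python sketch below does that comparison on dictionaries. It is a simplified illustration, not how FluxCD computes drift internally: drifted_fields is a hypothetical helper that checks only fields present in the desired manifest, so server-populated fields such as status are ignored.

```python
def drifted_fields(desired: dict, live: dict, path: str = "") -> list:
    """Return the paths where the live object differs from what Git declares."""
    diffs = []
    for key, want in desired.items():
        here = f"{path}/{key}"
        have = live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            diffs += drifted_fields(want, have, here)   # compare nested fields
        elif have != want:
            diffs.append(here)
    return diffs


if __name__ == "__main__":
    git_says = {"spec": {"replicas": 6}}
    # State after: kubectl scale deployment/server --replicas=10
    cluster_has = {"spec": {"replicas": 10}, "status": {"readyReplicas": 10}}
    print(drifted_fields(git_says, cluster_has))

    # Re-applying the manifest from Git puts the declared value back.
    cluster_has["spec"]["replicas"] = git_says["spec"]["replicas"]
    print(drifted_fields(git_says, cluster_has))
```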

Part 6: Multi-Environment Promotion — The Human Gates

The Git branch structure maps directly to environments:

Git branches in slotmachine-deployment:

  development  ──────────────────────────► K8s DEV  (Nutanix On-Premise)
      │  Pull Request
      │  Reviewed by: DevOps/Platform Engineer
     qa  ─────────────────────────────────► K8s QA   (Nutanix On-Premise)
      │  Pull Request
      │  Reviewed by: QA Sign-Off process + Software Delivery Manager
    main  ────────────────────────────────► K8s PROD (AWS EKS)

How a Release Flows Through the Pipeline

Step 1 — Developer pushes to main in the app repo:

git push origin main
# → GitHub Actions CI starts (6 jobs, ~20 min)
# → On success: image tags written to development branch of deployment repo
# → FluxCD on DEV cluster detects change, deploys automatically

Step 2 — DevOps/Platform Engineer promotes to QA:

# In slotmachine-deployment repo:
# Create PR: development → qa
gh pr create \
  --base qa \
  --head development \
  --title "Release: app commit abc1234" \
  --body "Promoting $(git log development -1 --format='%s') to QA"
# DevOps Engineer reviews and approves
# PR merged → FluxCD on QA cluster detects change, deploys

Step 3 — QA Sign-Off and Production Release:

QA team runs acceptance tests against QA cluster
QA Sign-Off: test results recorded, sign-off documented
Software Delivery Manager approves
PR merged: qa → main
FluxCD on Production cluster detects main branch change
kustomize build overlays/production/
kubectl apply → rolling update on AWS EKS
  (4 client pods × rolling update = zero downtime)
  (6 server pods × rolling update = zero downtime)
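The promotion rules described in these steps can be summarised as a lookup table: which branch may be merged into which, and who has to approve. The Python sketch below encodes that table. The role labels (devops, qa-signoff, sdm) and the can_merge function are invented for illustration; in practice GitHub branch protection rules enforce the reviews, not a script.

```python
# (head branch, base branch) -> approvals required before merging
PROMOTIONS = {
    ("development", "qa"): {"devops"},
    ("qa", "main"): {"qa-signoff", "sdm"},
}


def can_merge(head: str, base: str, approvals: set) -> tuple:
    """Return (allowed, missing_approvals) for a promotion pull request."""
    required = PROMOTIONS.get((head, base))
    if required is None:
        # Not a defined promotion path, e.g. development straight to main.
        return False, set()
    missing = required - approvals
    return not missing, missing


if __name__ == "__main__":
    print(can_merge("development", "qa", {"devops"}))
    print(can_merge("qa", "main", {"qa-signoff"}))
    print(can_merge("development", "main", {"devops", "qa-signoff", "sdm"}))
```

Skipping an environment is rejected regardless of approvals, which matches the one-way flow development → qa → main.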

Why PRs for Promotion (Not Another Pipeline)

Some teams automate promotion with scripts that merge branches. This architecture uses PRs deliberately:

  • Visibility: Every promotion is a visible event in GitHub — searchable, commentable, linked to issues
  • Required reviews: Branch protection rules enforce that a human approves before merging
  • Rollback: Rolling back production is a git revert of the promotion merge, reviewed and merged like any other PR, not an incident procedure
  • Audit: Every promotion has an author, timestamp, reviewer, and message — satisfying compliance requirements

Part 7: Infrastructure — Nutanix On-Premise and AWS EKS

The three Kubernetes clusters run on two different infrastructure platforms, managed by a single control plane.

NKP Management Cluster (Nutanix Kubernetes Platform)

The NKP Management Cluster runs on Nutanix on-premise hardware and manages both the DEV and QA clusters:

NKP Management Cluster (Nutanix On-Premise)
  ├── SSO / IdP Integration
  │     → Single sign-on for cluster access
  │     → RBAC mapped to Active Directory groups
  ├── Governance
  │     → Policy enforcement across managed clusters
  │     → Resource quotas, network policies, OPA Gatekeeper
  ├── Platform Services
  │     → Centralized monitoring (Prometheus + Grafana)
  │     → Centralized logging (Loki / Elasticsearch)
  │     → Certificate management (cert-manager)
  └── Application Services
        → Shared ingress controllers
        → Shared storage classes
        → Backup policies

Manages:
  ├── Kubernetes DEV  (Nutanix On-Premise)
  └── Kubernetes QA   (Nutanix On-Premise)

NKP provides a Cluster API-based management layer — the DEV and QA clusters are defined declaratively and the management cluster ensures they match specification. Fleet-wide policy enforcement means no cluster can be misconfigured without the management cluster detecting and alerting.

Production on AWS EKS

Production runs on AWS EKS in ap-southeast-7:

# Production cluster specifications
Nodes:      4× m6i.xlarge (4 vCPU, 16 GB RAM)
Zones:      ap-southeast-7a, ap-southeast-7b, ap-southeast-7c
CNI:        Cilium (SNAT mode)
LB:         AWS Network Load Balancer (cross-zone enabled)
Scheduling: topologySpreadConstraints (maxSkew: 1)

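The topologySpreadConstraints in the base manifest ask the scheduler to spread server pods evenly across nodes. The constraint shown uses kubernetes.io/hostname only, so spreading across availability zones would need a second constraint keyed on topology.kubernetes.io/zone. With 6 server pods across 4 nodes and maxSkew: 1, an even spread puts at most 2 server pods on any node, so a single node failure takes down at most 2 of them. Because whenUnsatisfiable is ScheduleAnyway, this is a scheduling preference rather than a hard guarantee.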

# From base/server/deployment.yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: slotmachine-server
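The "at most 2 pods per node" figure follows from simple arithmetic. The Python sketch below works it out for the production numbers in this post (6 server pods, 4 nodes); even_spread and skew are illustrative helpers, with skew defined as the difference between the fullest and the emptiest node.

```python
def even_spread(pods: int, nodes: int) -> list:
    """Pods per node when the scheduler achieves the most even placement."""
    base, extra = divmod(pods, nodes)
    # "extra" nodes take one additional pod each.
    return [base + 1] * extra + [base] * (nodes - extra)


def skew(placement: list) -> int:
    """Difference between the most and least loaded node."""
    return max(placement) - min(placement)


if __name__ == "__main__":
    placement = even_spread(6, 4)
    print(placement, "skew:", skew(placement), "worst node loss:", max(placement))

    # A lopsided placement violates maxSkew: 1. With ScheduleAnyway the
    # scheduler may still allow it, but it ranks such nodes lower.
    lopsided = [3, 1, 1, 1]
    print(lopsided, "skew:", skew(lopsided))
```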

Why On-Premise for Dev/QA, Cloud for Production?

Cost optimization:
  DEV/QA:  On-premise Nutanix hardware (already owned)
           → near-zero variable cost for non-production workloads
           → spin up/down as needed

  PROD:    AWS EKS
           → global availability
           → AWS NLB for internet-facing traffic
           → m6i.xlarge (cost-effective for the workload size)
           → auto-scaling when traffic spikes

Part 8: Security — Defense in Depth Across Every Stage

The pipeline builds in security at every phase rather than bolting it on at the end:

Stage         │ Tool           │ What It Catches
──────────────┼────────────────┼──────────────────────────────────────
Source code   │ CodeQL         │ SAST: injection, XSS, data flow vulns
Running app   │ OWASP ZAP      │ DAST: active attack simulation
Container OS  │ Trivy          │ CVEs in OS packages + npm dependencies
Runtime       │ K8s policies   │ Privilege escalation, host path mounts
Network       │ Cilium         │ Network policy enforcement between pods
Secrets       │ K8s Secrets    │ ADMIN_PASSWORD never in plaintext YAML
Headers       │ Express config │ CSP, X-Frame-Options, Permissions-Policy

The Security Headers in Code

// server/src/server.js — security headers applied at Express level
app.use((req, res, next) => {
  res.removeHeader('X-Powered-By');                    // Don't reveal Express
  res.setHeader('X-Frame-Options', 'DENY');            // Clickjacking
  res.setHeader('Content-Security-Policy',
    "default-src 'self'; script-src 'self'");          // XSS mitigation
  res.setHeader('Permissions-Policy',
    'camera=(), microphone=(), geolocation=()');       // Feature restriction
  next();
});

Container Hardening via Multi-Stage Builds

# client/Dockerfile — two-stage: build then serve
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build             # Produces /app/dist

FROM nginx:alpine             # Final image has NO Node.js, NO source code
COPY --from=builder /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf
# Result: ~25MB image with zero Node.js attack surface

The final client image contains only Nginx and the built static files — no Node.js runtime, no npm, no source code.


Part 9: Local Development — From Zero to Running in One Command

# docker-compose.yml
services:
  mongodb:
    image: mongo:7
    ports: ["27017:27017"]
    volumes:
      - mongo_data:/data/db

  redis:
    image: redis:7-alpine
    ports: ["6379:6379"]

  server:
    build: ./server
    ports: ["3001:3001"]
    environment:
      MONGO_URI: mongodb://mongodb:27017/slotmachine
      REDIS_URL: redis://redis:6379
      ADMIN_PASSWORD: localdev
    depends_on: [mongodb, redis]

  client:
    build: ./client
    ports: ["8080:80"]
    depends_on: [server]

volumes:
  mongo_data:

# Start the full stack locally
docker compose up --build

# Run the full test suite
npm ci && npm test --workspaces

# Run load test against local stack
locust -f tests/locustfile.py --headless -u 5 -r 1 -t 30s \
  --host http://localhost:8080

The local environment mirrors the production topology closely: the same Nginx config, the same Redis adapter, the same MongoDB schema, only with a single instance of each service. Most of what works locally behaves the same way in the cluster.


Summary

This pipeline puts the core practices of modern DevOps (automated testing, security scanning, GitOps delivery, and auditable promotion) to work in a real, running system.

The separation of concerns between slotmachine (application code) and slotmachine-deployment (Kubernetes manifests) is the architectural foundation everything else builds on. It enables independent evolution of the app and its infrastructure, clean audit trails, and role-based access — developers push to the app repo; the platform team owns the deployment repo.

The six-job CI pipeline enforces quality and security at every stage before a single byte reaches a container registry. CodeQL finds vulnerabilities in source code. Vitest and node:test verify correctness with 118 tests. Locust load-tests the running server. OWASP ZAP attacks it from the outside. Trivy scans the container image for CVEs. Only after all five gates pass does the image push to Docker Hub and the deployment repo receive its new image tag.

FluxCD closes the loop by running inside each cluster and continuously reconciling cluster state against the Git branch it watches. No pipeline needs credentials to reach the cluster. No manual kubectl apply is needed. The cluster is self-healing — any drift from the desired state in Git is automatically corrected within minutes.

The three-branch promotion model (development → qa → main) maps directly to three Kubernetes clusters (Dev on Nutanix, QA on Nutanix, Production on AWS EKS), with human approval gates exactly where they provide value — not automated away, but not blocking automated delivery either.

The NKP management cluster provides fleet governance across the on-premise clusters: SSO, centralized policy enforcement, observability, and platform services — all without each cluster needing to reinvent its own security and monitoring stack.

The result is a system where a developer can push a feature, see it live in Dev within minutes, promoted to QA with a PR, and shipped to production on AWS after sign-off, with a complete audit trail, no manual deployment steps in the critical path (only the approvals), and security baked in at every layer.


The full source code is on GitHub: slotmachine app · slotmachine-deployment.