Why Most Teams Get CI/CD Wrong

The first time most teams set up a CI/CD pipeline, they copy a YAML file from Stack Overflow, add a few steps, and call it done. The pipeline "works" until it doesn't. Tests that only run on push. Deploys that skip staging. Secrets hardcoded in plain text in the workflow file.

The problem isn't laziness. It's that CI/CD feels like plumbing, invisible until it breaks.

This post is the guide we wish we had when we started: a full walkthrough of a production-grade pipeline from an empty repository to a system you can actually trust.

What We're Building

By the end of this post, you'll have a pipeline that:

  • Runs unit and integration tests on every pull request
  • Enforces code quality with linting and type checks
  • Builds and tags a Docker image on merge to main
  • Deploys automatically to a staging environment
  • Requires a manual approval step before production
  • Sends a Slack notification on deploy success or failure

We'll use GitHub Actions for orchestration, but the concepts apply to any CI system.

Step 1: The Repository Structure

Before touching YAML, structure your repository so the pipeline has something sensible to work with.

my-app/
├── src/
├── tests/
│   ├── unit/
│   └── integration/
├── Dockerfile
├── docker-compose.test.yml
└── .github/
    └── workflows/
        ├── ci.yml
        └── deploy.yml

Split CI and deploy into separate workflow files. CI runs on every push and pull request. Deploy only runs on merge to main. This keeps concerns separated and makes debugging far easier.
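
One detail to watch with the split: the job snippets later in this post use needs: test, and needs can only reference jobs in the same workflow file. If the build and deploy jobs live in deploy.yml, one option is to trigger that file from the CI workflow's completion instead. The sketch below is an assumption about how that could be wired, not the only way to do it; the checkout ref and the if condition are illustrative choices.

```yaml
# .github/workflows/deploy.yml (sketch; adapt before relying on it)
name: Deploy

on:
  workflow_run:
    workflows: ["CI"]     # must match the name: of the CI workflow
    types: [completed]
    branches: [main]      # only react to CI runs whose head branch is main

jobs:
  build:
    # workflow_run also fires when CI fails, so check the result explicitly.
    # The event check skips pull request runs whose head branch happens to be named main.
    if: github.event.workflow_run.conclusion == 'success' && github.event.workflow_run.event == 'push'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Check out the commit CI validated rather than whatever main points at now
          ref: ${{ github.event.workflow_run.head_sha }}
      # ...build, push, and deploy steps go here...
```

With this trigger, the needs: test line and the github.ref check in the build job are no longer needed, because the workflow_run filter does that gating. The simpler alternative is to keep every job in one workflow file and leave needs: test as written.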

Step 2: The CI Workflow

Here's a minimal but solid CI workflow:

name: CI

on:
  push:
    branches: ["**"]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Node
        uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: "npm"

      - name: Install dependencies
        run: npm ci

      - name: Lint
        run: npm run lint

      - name: Type check
        run: npm run typecheck

      - name: Unit tests
        run: npm test -- --coverage

      - name: Integration tests
        run: docker compose -f docker-compose.test.yml up --abort-on-container-exit

A few things worth calling out:

  • npm ci instead of npm install: reproducible installs, no accidental lockfile mutations
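  • --abort-on-container-exit: stops every container as soon as one exits, so the run ends when the test container finishes instead of waiting on the database forever. Add --exit-code-from with your test service's name if you want the step's exit status taken from that container specifically
  • Integration tests run in Docker so your database and service dependencies are real, not mocked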
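
The post doesn't show docker-compose.test.yml, so here is a minimal sketch of what it might contain. Everything in it is an assumption made for illustration: the service names db and tests, the Postgres image, the test:integration npm script, and the DATABASE_URL variable. It also assumes the image built from the Dockerfile includes dev dependencies.

```yaml
# docker-compose.test.yml (hypothetical layout; names and images are assumptions)
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: test        # throwaway credential, used only inside the test network
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 2s
      timeout: 2s
      retries: 15

  tests:
    build: .                         # build the app image from the repository's Dockerfile
    command: npm run test:integration   # assumed script name
    environment:
      DATABASE_URL: postgres://postgres:test@db:5432/postgres
    depends_on:
      db:
        condition: service_healthy   # wait until Postgres accepts connections
```

With a layout like this, the CI step could become docker compose -f docker-compose.test.yml up --build --exit-code-from tests, which implies --abort-on-container-exit and makes the step fail exactly when the tests container exits non-zero.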

Step 3: The Build and Tag Job

When CI passes on main, build and push a tagged Docker image:

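build:
  needs: test
  runs-on: ubuntu-latest
  if: github.ref == 'refs/heads/main'
  permissions:
    contents: read
    packages: write   # lets GITHUB_TOKEN push to ghcr.io
  outputs:
    image_tag: ${{ steps.meta.outputs.version }}
  steps:
    - uses: actions/checkout@v4

    # Without a registry login, the push step below is rejected
    - name: Log in to GHCR
      uses: docker/login-action@v3
      with:
        registry: ghcr.io
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}

    - name: Docker meta
      id: meta
      uses: docker/metadata-action@v5
      with:
        images: ghcr.io/${{ github.repository }}
        tags: |
          type=sha,format=short

    - name: Build and push
      uses: docker/build-push-action@v5
      with:
        context: .   # build from the checked-out tree
        push: true
        tags: ${{ steps.meta.outputs.tags }}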

Using the short commit SHA as the image tag (the metadata action emits it as sha-<short SHA>) rather than latest means every deploy is traceable back to an exact commit. This is non-negotiable for production pipelines.

Step 4: Staging Deploy

Automatically deploy to staging after a successful build:

deploy-staging:
  needs: build
  runs-on: ubuntu-latest
  environment: staging
  steps:
    - name: Deploy to staging
      run: |
        # --fail makes curl exit non-zero on an HTTP error, so a rejected deploy fails the job
        curl --fail -X POST "${{ secrets.DEPLOY_WEBHOOK_URL }}" \
          -H "Authorization: Bearer ${{ secrets.DEPLOY_TOKEN }}" \
          -H "Content-Type: application/json" \
          -d '{"image": "ghcr.io/${{ github.repository }}:${{ needs.build.outputs.image_tag }}"}'

The environment: staging line does a lot of work here. GitHub Environments let you scope secrets to specific deployment targets and require reviewers before a job runs, which we'll use for production.

Step 5: Production Gate

This is where most pipelines cut corners. Add a manual approval step before production:

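deploy-production:
  needs: [build, deploy-staging]
  runs-on: ubuntu-latest
  environment:
    name: production
    url: https://app.yourcompany.com
  steps:
    - name: Deploy to production
      run: |
        curl --fail -X POST "${{ secrets.DEPLOY_WEBHOOK_URL }}" \
          -H "Authorization: Bearer ${{ secrets.DEPLOY_TOKEN }}" -H "Content-Type: application/json" \
          -d '{"image": "ghcr.io/${{ github.repository }}:${{ needs.build.outputs.image_tag }}"}'

In your GitHub repository settings, configure the production environment with required reviewers, for example a specific team. The job will pause until a reviewer approves it, which gives a human the chance to check staging before the deploy proceeds.

Two notes on the job itself. build is listed under needs because a job can only read outputs from jobs it directly depends on. The deploy step assumes the production environment defines its own DEPLOY_WEBHOOK_URL and DEPLOY_TOKEN secrets under the same names staging uses.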

Simple. No external tools required.
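
The Slack notification from the list at the top can be added as one more job. This is a sketch under assumptions: SLACK_WEBHOOK_URL is a hypothetical secret holding a Slack incoming webhook URL, and the job is assumed to sit in the same workflow as the two deploy jobs so that needs can see their results.

```yaml
notify:
  needs: [deploy-staging, deploy-production]
  # always() runs the job even if a deploy failed; the ref check keeps branch and PR runs quiet
  if: always() && github.ref == 'refs/heads/main'
  runs-on: ubuntu-latest
  steps:
    - name: Post deploy result to Slack
      env:
        SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}   # assumed secret name
        STAGING: ${{ needs.deploy-staging.result }}           # success, failure, cancelled, or skipped
        PRODUCTION: ${{ needs.deploy-production.result }}
      run: |
        # Incoming webhooks accept a JSON body with a "text" field
        curl --fail -X POST "$SLACK_WEBHOOK_URL" \
          -H "Content-Type: application/json" \
          -d "{\"text\": \"Deploy of ${GITHUB_SHA::7}: staging=${STAGING}, production=${PRODUCTION}\"}"
```

Because the job depends on deploy-production, the message is only sent once the approval gate has been resolved. If tests fail on main, both results read skipped, which is still a useful signal.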

The Operational Stuff That Actually Matters

Once the pipeline is running, the real work begins:

Keep pipelines fast. If CI takes more than 8 minutes, developers start skipping it mentally. Cache aggressively. Parallelize test suites.
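
As one way to apply this, the lint, type check, and unit test steps can run as parallel matrix legs instead of in sequence. The sketch below assumes package.json defines lint, typecheck, and test scripts, as the CI workflow above implies. The concurrency group name and the 10-minute timeout are illustrative choices.

```yaml
# Fragment of ci.yml: top-level concurrency plus a matrix version of the test job
concurrency:
  group: ci-${{ github.workflow }}-${{ github.ref }}
  # Cancel superseded runs on feature branches, but never interrupt a run on main
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10        # fail loudly instead of hanging
    strategy:
      fail-fast: false         # let every leg report, even if one fails
      matrix:
        task: [lint, typecheck, test]
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: "npm"         # reuse the npm download cache between runs

      - run: npm ci

      - name: Run ${{ matrix.task }}
        run: npm run ${{ matrix.task }}
```

Keeping the job id as test means a downstream needs: test still works; it waits for all three legs to pass. The trade-off is that each leg repeats checkout and install, so this pays off mainly when the individual checks are slow.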

Protect secrets properly. Use repository secrets for CI credentials. Use environment-scoped secrets for deploy tokens. Never pass secrets between jobs via outputs; have each job read the secret it needs directly from its own scope.

Monitor your pipeline health. A pipeline with a 30% flake rate is worse than no pipeline because it teaches teams to ignore failures.

Make rollbacks trivial. Since every image is tagged with a commit SHA, rolling back to a previous version is a one-liner. Document it. Test it.
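
One way to make the rollback documented and testable is a manually triggered workflow that redeploys a previous tag. This is a sketch, not part of the pipeline above: the file name rollback.yml and the image_tag input are hypothetical, and it assumes the production environment exposes the same DEPLOY_WEBHOOK_URL and DEPLOY_TOKEN secrets used by the deploy jobs.

```yaml
# .github/workflows/rollback.yml (hypothetical file)
name: Rollback

on:
  workflow_dispatch:
    inputs:
      image_tag:
        description: "Image tag to redeploy, e.g. sha-1a2b3c4"
        required: true
        type: string

jobs:
  rollback:
    runs-on: ubuntu-latest
    environment: production    # reuses the production approval gate and secrets
    steps:
      - name: Redeploy a previous image
        env:
          # Pass the user-supplied input through env so it is never spliced into the script text
          IMAGE_TAG: ${{ inputs.image_tag }}
        run: |
          curl --fail -X POST "${{ secrets.DEPLOY_WEBHOOK_URL }}" \
            -H "Authorization: Bearer ${{ secrets.DEPLOY_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d "{\"image\": \"ghcr.io/${{ github.repository }}:${IMAGE_TAG}\"}"
```

With the GitHub CLI installed, the one-liner is then gh workflow run rollback.yml -f image_tag=sha-1a2b3c4, where the tag shown is a placeholder for a real one from your registry.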

What's Next

This pipeline covers the fundamentals, but there's plenty of room to grow: infrastructure-as-code validation, security scanning with tools like Trivy, canary deploys, and feature flag integration.

Want to practice building this yourself? We have a CI/CD Fundamentals lab on Stepwik that walks through every step in a live environment.