Why Most Teams Get CI/CD Wrong
The first time most teams set up a CI/CD pipeline, they copy a YAML file from Stack Overflow, add a few steps, and call it done. The pipeline "works" until it doesn't. Tests that only run on push. Deploys that skip staging. Secrets hardcoded in environment variables.
The problem isn't laziness. It's that CI/CD feels like plumbing, invisible until it breaks.
This post is the guide we wish we had when we started: a full walkthrough of a production-grade pipeline from an empty repository to a system you can actually trust.
What We're Building
By the end of this post, you'll have a pipeline that:
- Runs unit and integration tests on every pull request
- Enforces code quality with linting and type checks
- Builds and tags a Docker image on merge to main
- Deploys automatically to a staging environment
- Requires a manual approval step before production
- Sends a Slack notification on deploy success or failure
We'll use GitHub Actions for orchestration, but the concepts apply to any CI system.
Step 1: The Repository Structure
Before touching YAML, structure your repository so the pipeline has something sensible to work with.
my-app/
├── src/
├── tests/
│   ├── unit/
│   └── integration/
├── Dockerfile
├── docker-compose.test.yml
└── .github/
    └── workflows/
        ├── ci.yml
        └── deploy.yml
Split CI and deploy into separate workflow files. CI runs on every push and pull request. Deploy only runs on merge to main. This keeps concerns separated and makes debugging far easier.
Step 2: The CI Workflow
Here's a minimal but solid CI workflow:
name: CI
on:
  push:
    branches: ["**"]
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Node
        uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: "npm"
      - name: Install dependencies
        run: npm ci
      - name: Lint
        run: npm run lint
      - name: Type check
        run: npm run typecheck
      - name: Unit tests
        run: npm test -- --coverage
      - name: Integration tests
        run: docker compose -f docker-compose.test.yml up --abort-on-container-exit
A few things worth calling out:
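- npm ci instead of npm install: reproducible installs, no accidental lockfile mutations.
- --abort-on-container-exit: integration tests run in Docker, so your database and service dependencies are real, not mocked. The flag stops every container as soon as one exits, so the step ends when the test container finishes instead of hanging on a still-running database.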
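The post doesn't show docker-compose.test.yml itself. Here is a minimal sketch of what it might contain, assuming a Postgres dependency; the service names db and tests, the Postgres image, and the test:integration npm script are all hypothetical choices for this example, not part of the original setup.

```yaml
# docker-compose.test.yml (sketch). The db/tests service names, the Postgres image
# and the test:integration npm script are assumptions for illustration.
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: test
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 2s
      timeout: 5s
      retries: 15
  tests:
    image: node:20
    working_dir: /app
    volumes:
      - .:/app # run against the checked-out source
    environment:
      DATABASE_URL: postgres://postgres:test@db:5432/postgres
    command: sh -c "npm ci && npm run test:integration"
    depends_on:
      db:
        condition: service_healthy # wait until Postgres accepts connections
```

With a dedicated test service like this, the CI step can run docker compose -f docker-compose.test.yml up --exit-code-from tests instead. That flag implies --abort-on-container-exit and makes the step's exit status match the test container's, so a failing suite fails the job.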
Step 3: The Build and Tag Job
When CI passes on main, build and push a tagged Docker image:
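  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    permissions:
      contents: read
      packages: write # required to push to ghcr.io with GITHUB_TOKEN
    outputs:
      image_tag: ${{ steps.meta.outputs.version }}
    steps:
      - uses: actions/checkout@v4
      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}
          tags: |
            type=sha,format=short
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}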
Using the commit SHA as the image tag (rather than latest) means every deploy is traceable back to an exact commit. This is non-negotiable for production pipelines.
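One wiring detail: needs only resolves jobs within the same workflow file. If build and the deploy jobs live in deploy.yml, as the repository layout suggests, needs: test cannot reach the test job in ci.yml. One way to keep the two-file split, sketched below under that assumption, is to turn ci.yml into a reusable workflow by adding workflow_call to its on: block, then call it from deploy.yml as a job named test. The trade-off is that a push to main runs the test suite twice, once per workflow.

```yaml
# .github/workflows/deploy.yml (sketch)
name: Deploy
on:
  push:
    branches: [main]

jobs:
  test:
    # Runs every job in ci.yml as part of this workflow. This requires ci.yml to
    # declare `workflow_call:` under `on:`, next to push and pull_request.
    uses: ./.github/workflows/ci.yml

  # The build, deploy-staging and deploy-production jobs from this post follow here;
  # `needs: test` in build now refers to the job above.
```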
Step 4: Staging Deploy
Automatically deploy to staging after a successful build:
  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - name: Deploy to staging
        run: |
          curl --fail -X POST "${{ secrets.DEPLOY_WEBHOOK_URL }}" \
            -H "Authorization: Bearer ${{ secrets.DEPLOY_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d '{"image": "ghcr.io/${{ github.repository }}:${{ needs.build.outputs.image_tag }}"}'
The environment: staging line does a lot of work here. GitHub Environments let you scope secrets to specific deployment targets and require reviewers before a job runs, which we'll use for production. The --fail flag matters too: without it, curl exits successfully even when the webhook answers with an HTTP error, and a failed deploy would show up as a green job.
Step 5: Production Gate
This is where most pipelines cut corners. Add a manual approval step before production:
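  deploy-production:
    needs: [build, deploy-staging]
    runs-on: ubuntu-latest
    environment:
      name: production
      url: https://app.yourcompany.com
    steps:
      - name: Deploy to production
        run: |
          curl --fail -X POST "${{ secrets.DEPLOY_WEBHOOK_URL }}" \
            -H "Authorization: Bearer ${{ secrets.DEPLOY_TOKEN }}" -H "Content-Type: application/json" \
            -d '{"image": "ghcr.io/${{ github.repository }}:${{ needs.build.outputs.image_tag }}"}'
In your GitHub repository settings, configure the production environment to require approval from a specific team. The deploy will pause and wait for a human to review the staging output before proceeding. The job lists build in needs alongside deploy-staging because the needs context only exposes outputs from a job's direct dependencies. And if DEPLOY_WEBHOOK_URL and DEPLOY_TOKEN are defined as environment secrets, the same names resolve to production's values here rather than staging's.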
Simple. No external tools required.
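The goal list at the top also promised a Slack notification, which none of the jobs above sends. One way to add it is a final job in the same workflow that always runs and reports each deploy job's result. This sketch assumes a Slack incoming webhook stored in a secret named SLACK_WEBHOOK_URL (a name chosen for this example) and relies on jq, which GitHub's Ubuntu runners ship with, to build valid JSON.

```yaml
  notify:
    needs: [deploy-staging, deploy-production]
    if: always() # run even when a deploy job failed or was skipped
    runs-on: ubuntu-latest
    steps:
      - name: Post deploy result to Slack
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }} # hypothetical secret name
          STAGING: ${{ needs.deploy-staging.result }} # success, failure, cancelled or skipped
          PRODUCTION: ${{ needs.deploy-production.result }}
          RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
        run: |
          # GITHUB_SHA is set by the runner; ${VAR::7} is bash substring syntax
          text="Deploy of ${GITHUB_SHA::7}: staging ${STAGING}, production ${PRODUCTION}. ${RUN_URL}"
          curl --fail -X POST "$SLACK_WEBHOOK_URL" \
            -H "Content-Type: application/json" \
            -d "$(jq -n --arg text "$text" '{text: $text}')"
```

Because production waits for approval, this message is sent once the whole pipeline settles, so a single notification covers both success and failure.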
The Operational Stuff That Actually Matters
Once the pipeline is running, the real work begins:
Keep pipelines fast. If CI takes more than 8 minutes, developers start tuning it out. Cache aggressively. Parallelize test suites.
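A sketch of two speed levers in ci.yml: a concurrency group that cancels runs superseded by a newer push, and a matrix that splits the unit tests across three parallel jobs. It assumes npm test runs Jest 28 or newer, which supports the --shard option; the shard count of 3 is arbitrary.

```yaml
# Fragment of ci.yml (sketch). Assumes `npm test` runs Jest >= 28.
concurrency:
  # One run per workflow and branch/PR; a newer push cancels the run it supersedes
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false # let every shard report its failures
      matrix:
        shard: [1, 2, 3]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: "npm" # restores the npm download cache keyed on package-lock.json
      - run: npm ci
      - name: Unit tests (shard ${{ matrix.shard }}/3)
        run: npm test -- --shard=${{ matrix.shard }}/3
```

Each shard only sees part of the suite, so if you keep --coverage, the per-shard reports need to be merged before you can read overall coverage.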
Protect secrets properly. Use repository secrets for CI credentials. Use environment-scoped secrets for deploy tokens. Never pass secrets between jobs via outputs.
Monitor your pipeline health. A pipeline with a 30% flake rate is worse than no pipeline because it teaches teams to ignore failures.
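A rough way to keep an eye on this is a scheduled workflow that counts recent failures with the gh CLI, which is preinstalled on GitHub-hosted runners. Treat it as a proxy only: it counts every failure, including genuine ones, whereas a true flake is a failure that passes on rerun. The file name, schedule, and 100-run window are arbitrary choices for this sketch.

```yaml
# .github/workflows/pipeline-health.yml (sketch)
name: Pipeline health
on:
  schedule:
    - cron: "0 9 * * 1" # Mondays 09:00 UTC
  workflow_dispatch:
jobs:
  report:
    runs-on: ubuntu-latest
    permissions:
      actions: read # needed to list workflow runs
    steps:
      - name: Failures among the last 100 completed CI runs on main
        env:
          GH_TOKEN: ${{ github.token }} # gh reads its token from GH_TOKEN
        run: |
          gh run list --repo "${{ github.repository }}" --workflow ci.yml --branch main \
            --status completed --limit 100 --json conclusion \
            --jq 'if length == 0 then "no completed runs" else "\(map(select(.conclusion == "failure")) | length) of \(length) completed runs failed" end'
```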
Make rollbacks trivial. Since every image is tagged with a commit SHA, rolling back to a previous version is a one-liner. Document it. Test it.
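One way to make that one-liner real is a manually triggered workflow that redeploys an existing tag through the same webhook. This sketch assumes the deploy endpoint accepts any image previously pushed by the build job, whose tags look like sha- followed by the short commit hash.

```yaml
# .github/workflows/rollback.yml (sketch). Assumes the deploy webhook accepts any pushed tag.
name: Rollback
on:
  workflow_dispatch:
    inputs:
      image_tag:
        description: "Existing image tag to redeploy, e.g. sha-1a2b3c4"
        required: true
        type: string
jobs:
  rollback:
    runs-on: ubuntu-latest
    environment: production # same secrets and same approval rule as a normal production deploy
    steps:
      - name: Redeploy a previous image
        env:
          # Passing the input through env keeps it out of the script text (avoids shell injection)
          IMAGE: ghcr.io/${{ github.repository }}:${{ inputs.image_tag }}
        run: |
          curl --fail -X POST "${{ secrets.DEPLOY_WEBHOOK_URL }}" \
            -H "Authorization: Bearer ${{ secrets.DEPLOY_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d "$(jq -n --arg image "$IMAGE" '{image: $image}')"
```

The rollback itself is then gh workflow run rollback.yml -f image_tag=sha-1a2b3c4. Because the job targets the production environment, it waits for the same approval as a regular deploy. Whether you want that gate on an emergency rollback is a judgment call.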
What's Next
This pipeline covers the fundamentals, but there's plenty of room to grow: infrastructure-as-code validation, security scanning with tools like Trivy, canary deploys, and feature flag integration.
Want to practice building this yourself? We have a CI/CD Fundamentals lab on Stepwik that walks through every step in a live environment.