I’ve been hand-deploying this blog with jekyll build && aws s3 sync && aws cloudfront create-invalidation since the Terraform stack went up. Three other projects sit alongside it on my Gitea – a Python crypto-trading agent, a Claude Code usage monitor, and the PageIndex fork – all of them with their own test commands, none of them being run on anything except my laptop. That’s fine for a weekend, embarrassing as a permanent state. This weekend I built JeakylJenkins: a self-hosted Jenkins controller, an ephemeral Docker build cloud, and a small Ollama container for AI-review pipelines, all wired into the existing Gitea via the Jenkins Gitea plugin.

The whole thing runs on a second-hand Intel NUC sat next to the Gitea host. No cloud bill, no SaaS CI quotas, no monthly runner-minute creep. The cost was about five hours of evening work and one boot loop that took the better part of an hour to figure out. This post is mostly the design decisions and the parts where the documentation lied to me.

What had to be true

Before I wrote any compose file I wrote down the constraints, because the temptation when standing up a Jenkins box is to add things speculatively and end up with a giant init.groovy.d/ folder nobody can reason about a year later.

There’s an existing Nginx Proxy Manager on the LAN that already handles TLS via Route53 DNS-01 for *.jeakyl.com. I am not adding a second reverse proxy. NPM and the Jenkins/Ollama containers share a pre-existing proxy-net Docker network; NPM proxies ci.jeakyl.com to jj-jenkins:8080 by container name, not by host LAN IP. Cross-bridge host-IP routing from inside the NPM container is unreliable and gives intermittent 502s; I had learned that the hard way on a previous unrelated project.

Gitea is canonical and stays on its own host. Jenkins integrates with it via the Gitea plugin: webhook in, API token out, a jenkins-bot user with read/write repository, read user, write issue, read organization scopes. Nothing about this is novel; I’m only flagging it because every Jenkins-on-a-NUC tutorial I read seemed to want me to also self-host the Git server, the secrets manager, and an LDAP server. I have one Gitea, it works, leave it alone.

The third constraint mattered most: build agents must be ephemeral. Docker-in-Docker is a security and cache-thrashing mess; static long-lived agents accumulate state and rot. The pattern I wanted is what the Jenkins docs call the Docker cloud: the controller asks the Docker daemon to spawn a container with the inbound-agent image, that container runs one job, and Jenkins kills it when the job finishes. To avoid handing the controller the host’s Docker socket, a tecnativa/docker-socket-proxy sits in front of /var/run/docker.sock and exposes only the endpoints needed for spawning agents.

Why JCasC, and the boot failure

Jenkins is famously a configuration nightmare: a click-driven setup that is impossible to reproduce. Configuration as Code (JCasC) fixes that by letting you declare the controller’s entire state – security realm, credentials, Gitea servers, clouds, libraries, jobs – as a single YAML file. I committed jenkins/casc/jenkins.yaml and made it the source of truth on day one.

The first boot failed. The container crash-looped, and docker compose logs jenkins showed a JCasC schema error from crumbIssuer.standard having an excludeClientIPFromCrumb field. I’d copy-pasted that field from a tutorial written against lts-jdk17; on lts-jdk21 it had been removed. The fix was a one-line delete, but the path to noticing it was longer than it should have been because Jenkins’ own boot output buries the JCasC error six screens above the eventual BOOT_FAILED line.
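A small filter would have shortened that hunt. The following is a minimal Python sketch, not something from the repo: it assumes the relevant lines mention "casc" somewhere (the JCasC loggers live under io.jenkins.plugins.casc) and that the failure marker is the literal BOOT_FAILED string. The function name and defaults are made up for illustration.

```python
from typing import Iterable, List, Tuple


def casc_lines_before_failure(
    lines: Iterable[str],
    needle: str = "casc",          # assumption: JCasC errors mention this substring
    marker: str = "BOOT_FAILED",   # stop scanning once the failure marker appears
) -> List[Tuple[int, str]]:
    """Return (line_number, text) for lines mentioning `needle` before `marker`.

    The needle match is case-insensitive. If the marker never appears,
    every matching line in the input is returned.
    """
    hits: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        if marker in line:
            break
        if needle.lower() in line.lower():
            hits.append((number, line.rstrip("\n")))
    return hits
```

Fed the saved output of docker compose logs jenkins line by line, it returns only the candidates worth reading instead of six screens of boot chatter.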

What I did then was deliberate: I stripped the YAML back to a minimum-viable baseline that I knew would boot, committed that, and added each capability back as a separate commit, restarting after each one. The git log from Saturday evening reads like a checklist:

e08287e Strip JCasC to minimum-viable baseline; defer cloud/library/jobs
8adeda9 Add _smoke pipeline; bump numExecutors to 1 for controller self-test
8aceb43 Drop dead init Groovy script; switch matrix-auth to entries: schema
5a2b2f6 Re-add globalLibraries (jeakyl shared library) via HTTPS + gitea-token
52e9adc Re-add Docker cloud + add 'make redeploy' one-liner
04151dd Suppress two known JCasC-plugin boot warnings via init Groovy
c85e115 Add _cloud_smoke pipeline to exercise clouds.docker + globalLibraries
d770fcb Re-add seed-job: organizationFolder per Gitea org

Each of those is a thing I can roll back to in seconds if the next one breaks. If you’re starting a JCasC config from scratch and you find yourself with a 400-line YAML that won’t boot, this is the move; everything else I tried (binary-searching the file, commenting out blocks) cost me more than just rewinding to a known-good baseline.

The smoke pipelines

Two small pipelines live in JCasC itself, declared via job-dsl. They don’t live in a project repo because they aren’t about a project; they’re about whether the controller can do its job at all.

_smoke runs on the controller’s built-in executor and curls three things: the docker-socket-proxy, the Gitea API (with the jenkins-bot token), and Ollama. If any of those fail, the controller’s plumbing is broken and there is no point looking at project builds yet.
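The shape of that check is simple enough to sketch outside Jenkins. The Python below is an illustration under assumptions, not the real job: the Gitea hostname is a placeholder, the paths (/version, /api/v1/version, /api/tags) are my guess at reasonable probe targets, and the bot-token header used by the real Gitea check is left out for brevity.

```python
from typing import Callable, Dict, List
from urllib.error import URLError
from urllib.request import Request, urlopen

# Hypothetical endpoints for illustration only; the real _smoke job may
# probe different paths and authenticates against Gitea with a token.
CHECKS: Dict[str, str] = {
    "socket-proxy": "http://socket-proxy:2375/version",
    "gitea": "https://gitea.example.invalid/api/v1/version",
    "ollama": "http://ollama:11434/api/tags",
}


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status."""
    try:
        with urlopen(Request(url), timeout=timeout) as response:
            return 200 <= response.status < 300
    except (URLError, OSError):
        # Covers HTTP errors, refused connections, DNS failures and timeouts.
        return False


def run_smoke(
    checks: Dict[str, str], probe: Callable[[str], bool] = http_ok
) -> List[str]:
    """Probe every endpoint and return the names of the ones that failed."""
    return [name for name, url in checks.items() if not probe(url)]
```

An empty list from run_smoke(CHECKS) means the plumbing is intact; any name in the list says which dependency to look at first. The probe is injectable so the logic can be exercised without a network.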

_cloud_smoke is the one that actually tells you the Docker cloud is wired up. It declares agent { label 'docker-agent' }, which forces the controller to spawn an ephemeral jenkins/inbound-agent:latest-jdk21 container via the socket proxy, and it loads the shared library at the top with @Library('jeakyl') _. Reaching the first stage proves both: the cloud spawned an agent, and the library resolved over HTTPS via the Gitea token credential. If _cloud_smoke is green, you can wire up a project’s Jenkinsfile and expect it to work.

I ran both of them after every JCasC change in that incremental sequence. It’s the cheapest possible regression suite for a Jenkins controller; ten seconds of curl beats five minutes of clicking around the UI to find the broken bit.

The Docker cloud, and the nested-docker problem

The cloud config is small:

clouds:
  - docker:
      name: "docker"
      dockerApi:
        dockerHost:
          uri: "tcp://socket-proxy:2375"
      templates:
        - labelString: "docker-agent"
          dockerTemplateBase:
            image: "jenkins/inbound-agent:latest-jdk21"
            network: "jeakyljenkins_ci"
          connector:
            attach:
              user: "root"
          instanceCapStr: "4"
          remoteFs: "/home/jenkins/agent"

The network: "jeakyljenkins_ci" line matters. By default, clouds.docker will spawn agents on Docker’s default bridge, where they cannot resolve jj-jenkins or jj-ollama by name. Pinning the agent to the same ci network the controller and Ollama sit on means JNLP back to jj-jenkins:50000 works, and pipeline steps that hit http://ollama:11434 also work without needing the public NPM hostname.

I did try the more elegant pattern first: leave the cloud agent generic, and wrap individual stages in agent { docker { image 'ruby:3.3-slim' } } so each pipeline picks the toolchain it needs. That requires the cloud agent to itself have a Docker CLI and a route to a daemon. Setting that up turned into a small Russian-doll problem: the inbound-agent container would need its own access to the socket proxy, with its own ACL, and the credentials had to be threaded through. After about twenty minutes I gave up and changed jekyll-aws.Jenkinsfile to apt-install ruby and awscli into the running inbound-agent at the start of the build instead. It adds roughly thirty seconds of setup overhead per build, and means I do not need a custom agent image. For a personal blog with a handful of builds a day that is plainly the right trade-off.

The shared library

Four projects, four pipelines, four places a bug can live. The shared library at shared-library/vars/*.groovy is loaded into the controller via unclassified.globalLibraries and called from project Jenkinsfiles with @Library('jeakyl') _. The point is that any change to pipeline behaviour happens in one place.

pythonUv.groovy runs uv sync && ruff && mypy && pytest for the Claude Code usage monitor and PageIndex. pythonVenv.groovy does the equivalent with a plain .venv for CryptoTrader, and – this is the bit I care about – it fails the build outright if TRADER_MODE is unset or set to anything other than BACKTEST or SIMULATION. CryptoTrader’s own CLAUDE.md already enforces this at the application layer; doing it again in CI is belt-and-braces, because a project drift could in principle let a real-trade code path run during a pytest run, and that’s the sort of mistake I do not want to make from a Jenkins agent. jekyllAws.groovy does bundle exec jekyll build, syncs to S3, invalidates CloudFront, and only deploys when the branch is main and the build is not a pull-request build (env.CHANGE_ID == null). aiReview.groovy posts the PR diff to Ollama and writes the response back as a Gitea comment; that one is more of a toy than the others, but it’s a useful excuse to have Ollama on the box.
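The two guards in that paragraph are small enough to state precisely. This is a Python sketch of the rules only; the real checks live in the Groovy shared library, and the function names here are invented for illustration.

```python
from typing import Mapping, Optional

# The only modes the CI guard accepts, as described above.
SAFE_TRADER_MODES = frozenset({"BACKTEST", "SIMULATION"})


def require_safe_trader_mode(env: Mapping[str, str]) -> str:
    """Return TRADER_MODE if it is a safe value, otherwise raise to fail the build."""
    mode = env.get("TRADER_MODE")
    if mode not in SAFE_TRADER_MODES:
        # Unset, empty, or any other value all fail the same way.
        raise RuntimeError(
            f"TRADER_MODE must be BACKTEST or SIMULATION, got {mode!r}"
        )
    return mode


def should_deploy(branch: Optional[str], change_id: Optional[str]) -> bool:
    """Deploy only from main, and never from a pull-request build.

    `change_id` mirrors Jenkins' CHANGE_ID, which is only set on PR builds.
    """
    return branch == "main" and change_id is None
```

Both functions fail closed: a missing TRADER_MODE is treated exactly like a dangerous one, and a missing branch name never deploys.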

Project Jenkinsfiles end up being four lines of declarative wrapper around the right shared-library function, which is exactly what you want.

Bind mounts, not Compose volumes

I started with named Compose volumes for Jenkins state. About an hour in I switched to host bind mounts at /opt/jenkins and /opt/ollama. The reason is make redeploy:

redeploy:
	git pull --rebase
	$(COMPOSE) down
	# Wipe Jenkins state so the rebuilt image's /usr/share/jenkins/ref/casc/* is re-copied
	# into JENKINS_HOME on first boot.
	find /opt/jenkins -mindepth 1 -delete
	$(COMPOSE) build jenkins
	$(COMPOSE) up -d

When you change casc/jenkins.yaml, the JCasC reload-from-UI option works most of the time, but it cannot recover from boot-time schema errors and it sometimes silently leaves stale derived state behind (locked-in plugin defaults, cached credentials, stale init.groovy.d outputs). For development on a config that is actively being rewritten, the cleanest answer is to wipe JENKINS_HOME and let the rebuilt image re-seed it from /usr/share/jenkins/ref/casc/. That means I want a path I can find -delete quickly without fighting Docker’s volume permissions, and host bind mounts make that trivial. Ollama’s model weights stay in /opt/ollama and survive a Jenkins wipe.
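One detail of that wipe is easy to miss: -mindepth 1 deletes everything inside the directory but keeps the directory itself, so the bind-mount target still exists when the container comes back up. A rough Python equivalent, with a made-up function name and written only to show the semantics:

```python
import shutil
from pathlib import Path


def wipe_contents(root: Path) -> int:
    """Delete everything inside `root` but keep `root` itself.

    Returns the number of top-level entries removed.
    """
    removed = 0
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # real directories are removed recursively
        else:
            entry.unlink()         # files and symlinks are removed directly
        removed += 1
    return removed
```

Pointing it at the wrong path is as destructive as the find command, so it belongs behind the same deliberate make target rather than anywhere automatic.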

A named volume would have turned the find -delete step into an awkward docker volume rm plus recreate, and the next compose up would have raced with the JCasC reseed. The bind mount is uglier and exactly right.

Two warnings that cannot be fixed, only suppressed

After the controller was finally booting cleanly, the log still had two warnings on every start. The first was JCasC’s BaseConfigurator complaining that it can’t classify jenkins.plugins.git.GitSCMSource#owner because the field is abstract but not Describable. The second was AdminWhitelistRule saying it no longer has any effect, with a stack trace appended for context.

Both are emitted by JCasC plugin internals, not by anything in my YAML. The first fires twice every boot regardless of whether you use the git plugin in your config (and you probably do, because most non-trivial Jenkins setups use it transitively). The second is informational; the underlying setting was demoted to a no-op in Jenkins core, but the plugin still calls it and logs the call. Neither has any functional effect.

I lived with them for a few hours and then wrote a short init.groovy.d script:

import java.util.logging.Level
import java.util.logging.Logger

Logger.getLogger("io.jenkins.plugins.casc.BaseConfigurator").setLevel(Level.SEVERE)
Logger.getLogger("jenkins.security.s2m.AdminWhitelistRule").setLevel(Level.SEVERE)

println "[init.groovy.d] suppressed BaseConfigurator + AdminWhitelistRule WARNINGs (known JCasC noise)"

I’m narrowing the suppression to those two specific loggers. The parent io.jenkins.plugins.casc logger still surfaces real UnknownAttributesException schema errors at SEVERE, which is what I actually want to see. There’s a temptation to silence everything; resist it. A boot log full of warnings teaches you to ignore the boot log, and the next time the schema breaks you’ll spend another hour finding it.
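The scoping is the whole point, and it can be demonstrated with any hierarchical logging framework. The sketch below uses Python's logging module as a stand-in for java.util.logging, which is an analogy rather than a reproduction. The first two logger names come from the script above; the sibling name is only an example of another logger in the same package.

```python
import logging

# Analogy only: Python's logging hierarchy standing in for java.util.logging.
noisy = logging.getLogger("io.jenkins.plugins.casc.BaseConfigurator")
parent = logging.getLogger("io.jenkins.plugins.casc")
sibling = logging.getLogger("io.jenkins.plugins.casc.ConfigurationAsCode")

parent.setLevel(logging.INFO)   # the package-level logger keeps its normal level
noisy.setLevel(logging.ERROR)   # only this one logger is raised above WARNING


def emits(logger: logging.Logger, level: int) -> bool:
    """Return True if `logger` would process a record at `level`."""
    return logger.isEnabledFor(level)
```

With that setup, emits(noisy, logging.WARNING) is False while the parent and sibling still report warnings, and errors from the noisy logger still get through. Silencing the parent instead would have hidden all three.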

Secrets: env file, or compose secret, or both

Most secrets live in .env: the admin password, the Gitea API token, the AWS access keys for the JeakylBlog deploy IAM user. JCasC reads these via ${VAR} expansion at boot, and each credentials: entry references a secret by ID rather than inlining anything.

The exception is the jenkins-bot SSH private key. PEM-shaped multi-line values in a .env file are a foot-gun – Compose’s env parser is fragile around multi-line quoted strings, and it’s the kind of thing that works once and then mysteriously breaks on a re-up because someone (you, in three months) added a stray space. So the SSH key lives in ./secrets/jenkins-bot-ssh-key, mounted into the container as a Compose secret at /run/secrets/jenkins-bot-ssh-key, and JCasC reads it with ${readFile:/run/secrets/jenkins-bot-ssh-key}.

Same idea, two transports, picked by shape. Anything that fits cleanly on one line goes in .env; anything that doesn’t goes in ./secrets/.

Watchtower opt-out

The NUC already runs Watchtower against everything else on the host, automatically pulling and restarting containers when a new image is published. That is fine for stateless apps. It is emphatically not fine for a Jenkins controller mid-build, an Ollama container with a 4GB model loaded into memory, or a socket proxy that can be swapped out from under a running agent.

Each of the three services has com.centurylinklabs.watchtower.enable: "false" set as a label. Watchtower respects the opt-out, so it leaves the JeakylJenkins stack alone and continues managing everything else on the host. Upgrades happen when I run them, with a known-good plugin set in plugins.txt and a new image build I can roll back from.

Webhooks, four assumptions wrong in a row

A week later I sat down to add JeakylBlog’s Jenkinsfile, expecting it to be five minutes of cp and git push. The seed job had already created the multibranch pipeline, and the Jenkins Gitea plugin is supposed to manage repo webhooks for you when manageHooks: true is set on the server config. I’d read that line in the plugin docs and taken it at face value. The Jenkinsfile landed in the repo, I pushed main, I expected a build to fire. None did.

The first wrong assumption was about what permission jenkins-bot needed. I had given the bot Write on each project repo, which is enough to clone, push, and post commit statuses. It is not enough to create webhooks; Gitea requires repo Admin for that, and the plugin’s attempt to create the webhook on each scan was getting a 403 from Gitea’s hook API and logging a single-line warning. Sixteen of those warnings were already in the boot log; I’d skimmed past them on the assumption that the plugin would surface anything important at SEVERE. Bumping jenkins-bot from Write to Administrator on each of the four repos took two minutes via the Gitea UI. For the JeaKylConsulting org I added the bot to a ci-bot team with Administrator permission and let it inherit across every org repo.

The second wrong assumption was that fixing the permission would automatically fix the missing webhook. The Gitea plugin caches its prior failure in JVM memory and won’t retry within the same controller process; the boot warning was the one and only attempt. I created the webhook directly against Gitea’s API to unblock the immediate case, on the bet that the plugin would adopt the existing hook on its next post-restart scan rather than try to recreate it. A docker compose restart jenkins clears the cache so future repos go through the plugin’s normal path.
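For reference, creating a repo webhook by hand is a single POST to Gitea's /api/v1/repos/{owner}/{repo}/hooks endpoint. The Python sketch below only builds the request so it can be inspected before sending. The event list and the function name are assumptions on my part, and the base URL, owner, repo, and token are whatever your instance uses.

```python
import json
from urllib.request import Request


def build_create_hook_request(
    gitea_base: str, owner: str, repo: str, token: str, target_url: str
) -> Request:
    """Build (but do not send) the POST that creates a Gitea repo webhook."""
    payload = {
        "type": "gitea",
        "active": True,
        # Assumption: push and pull_request are the events worth forwarding.
        "events": ["push", "pull_request"],
        "config": {"url": target_url, "content_type": "json"},
    }
    return Request(
        url=f"{gitea_base.rstrip('/')}/api/v1/repos/{owner}/{repo}/hooks",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to urllib.request.urlopen sends it. The token needs the same repo Admin permission the plugin was missing, otherwise this gets the same 403.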

The third wrong assumption was that https://ci.jeakyl.com/gitea-webhook/post would work as the webhook URL because that hostname works for everything else. Gitea fired the webhook and got back context deadline exceeded (Client.Timeout exceeded while awaiting headers). From inside the Gitea container, ci.jeakyl.com resolves via PiHole to NPM’s host LAN IP, and connecting to that IP from a container hits Docker’s hairpin-NAT wart. The cleanest fix is to short-circuit DNS for that one hostname inside the Gitea container so the connection goes container-to-container on proxy-net instead of bouncing out via the host:

extra_hosts:
  - "ci.jeakyl.com:172.18.0.2"

172.18.0.2 is NPM’s IP on proxy-net. Pinning a numeric IP in someone else’s compose file is fragile; if NPM is recreated and lands on a different IP, the line silently breaks. The right place to anchor the contract is on NPM’s end, by giving it a static IP on proxy-net:

networks:
  proxy-net:
    ipv4_address: 172.18.0.2

That makes the extra_hosts entry stable across NPM container recreates, and means anything else that ever needs to reach NPM by IP gets the same guarantee.

The fourth wrong assumption was that pointing the webhook at ci.jeakyl.com was now sufficient. Gitea has a webhook.ALLOWED_HOST_LIST setting that defaults to external, and the check runs after DNS resolution. So even though the URL was https://ci.jeakyl.com/..., the resolved IP was 172.18.0.2, which is RFC 1918, which is not “external”, which is denied. The error message names the resolved IP rather than the URL, which made it obvious in retrospect:

deny 'ci.jeakyl.com(172.18.0.2:443)'

Adding the env override GITEA__webhook__ALLOWED_HOST_LIST=*.jeakyl.com to Gitea’s compose lets Gitea deliver webhooks to any of my own subdomains regardless of resolved IP. Glob patterns work; keep that list tight to your own hostnames rather than opening it to all private addresses, since it’s effectively the SSRF allow-list for any user with admin on a Gitea repo.
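The decision is easier to reason about once written out. This Python sketch approximates the behaviour described above and is not Gitea's implementation: it handles only the external keyword and hostname globs, and its notion of "not external" is Python's is_private, is_loopback, and is_link_local, which is broader than RFC 1918 alone.

```python
import ipaddress
from fnmatch import fnmatch
from typing import Iterable


def webhook_allowed(host: str, resolved_ip: str, allow_list: Iterable[str]) -> bool:
    """Approximate an allow-list decision for one webhook delivery.

    Supports two kinds of entry: the keyword "external" (any address that
    is not private, loopback, or link-local) and hostname globs.
    """
    ip = ipaddress.ip_address(resolved_ip)
    for entry in allow_list:
        if entry == "external":
            # Judged on the resolved address, not on the hostname in the URL.
            if not (ip.is_private or ip.is_loopback or ip.is_link_local):
                return True
        elif fnmatch(host.lower(), entry.lower()):
            return True
    return False
```

Under these rules webhook_allowed("ci.jeakyl.com", "172.18.0.2", ["external"]) is False, which matches the deny line, and the same call with ["*.jeakyl.com"] is True because the glob is matched against the hostname rather than the address.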

After the four pieces lined up, the chain became boring. A push to JeakylBlog main reaches Gitea, fires the webhook, hits NPM via proxy-net, lands at the controller, queues a multibranch build, and an S3 deploy goes out about a minute later. No two of the four issues are documented in the same place; each was its own twenty-minute detour. Worth writing down so the next time I do this on a different host it takes one evening instead of three.

What’s next

JeakylBlog is the first project actually wired and deploying through this controller now; pushing to main rebuilds and uploads to S3 in about seventy seconds, with the CloudFront invalidation clearing within a few minutes. The other three – the Claude Code usage monitor, PageIndex, and CryptoTrader – still need their Jenkinsfiles, and now that I’ve burned through the webhook plumbing once, I expect those to be four-line wrappers each.

I’m also still planning to find out whether the aiReview step is actually useful or whether it just produces plausible-sounding LLM nonsense in PR comments. I have a suspicion which it’ll be.

The bigger thing, though, is that I now have a CI host I trust. I change something in the JCasC config, run make redeploy, watch _smoke and _cloud_smoke go green, and that’s the entire feedback loop. No clicking, no ten-screen plugin manager, no manual credential re-entry. That’s worth the five hours, and the three follow-up evenings.