Adding a Claude PR Review Step to Jenkins as a Shared-Library Function

I’ve been bringing up a new TypeScript service over the last month, and I wanted PRs to it to get an automated code review from Claude before I open them in the morning. The mechanics are small: fetch the PR diff, send it to the Anthropic Messages API with a short prompt, post the response back as a Gitea comment. I wanted it as a reusable step in JeakylJenkins’ shared library though, not as 200 lines of bash glued into one project’s Jenkinsfile. The day I add it to a second project, I want to add three lines, not three hundred.

This post is the friction list from writing that step. There’s an Ollama-backed sibling, aiReview, that runs the same pattern against my self-hosted LLM box, and the design of the two is identical; the only difference is which HTTP API gets the prompt. I’ll keep the focus on the Anthropic version, because that’s the one I’ve come to rely on.

The shape

sequenceDiagram
  participant Dev as Author
  participant Gitea as gitea.jeakyl.com
  participant Jenkins as ci.jeakyl.com
  participant Anthropic as api.anthropic.com

  Dev->>Gitea: Push branch + open PR
  Gitea->>Jenkins: Webhook (PR opened)
  Jenkins->>Jenkins: Resolve multibranch, set CHANGE_ID
  Jenkins->>Gitea: GET /repos/.../pulls/{n}.diff
  Gitea-->>Jenkins: unified diff (text)
  Jenkins->>Jenkins: head -c 120000 (trim budget)
  Jenkins->>Anthropic: POST /v1/messages (prompt + diff)
  Anthropic-->>Jenkins: completion (markdown bullets)
  Jenkins->>Gitea: POST /repos/.../issues/{n}/comments
  Gitea-->>Dev: PR notification (review posted)

Four hops. Gitea fires a webhook on PR open; the multibranch project picks it up and sets CHANGE_ID; the pipeline reaches the review stage; the step shells out to curl to fetch the diff, calls Anthropic, and POSTs the response back as a PR comment. About 90% of the work is making sure curl and jq exist on whichever agent the build landed on, that the credentials are bound correctly, and that the byte budget is sensible. The rest is a single jq -Rs invocation to build the request JSON.
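As a rough sketch of that jq -Rs invocation: -R reads stdin as raw text rather than JSON, and -s slurps it into a single string, so the whole diff becomes one JSON string value with quotes and newlines escaped by jq. The function name build_request and the prompt wording are placeholders for illustration, not necessarily what the real step uses.

```shell
#!/bin/sh
# Sketch only: build_request and the prompt text are assumptions.
# Reads a (trimmed) diff on stdin, prints a Messages API request body.
build_request() {
  model="$1"
  max_tokens="${2:-1024}"
  jq -Rs \
    --arg model "$model" \
    --argjson max_tokens "$max_tokens" \
    '{
      model: $model,
      max_tokens: $max_tokens,
      messages: [
        { role: "user",
          content: ("Review this pull request diff. Reply with short markdown bullets.\n\n" + .) }
      ]
    }'
}
```

Something like build_request claude-haiku-4-5-20251001 < pr.diff.trimmed > request.json would then feed curl via --data-binary @request.json, with the reply text pulled out by jq -r '.content[0].text'.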

A project Jenkinsfile that opts into the step looks like this:

@Library('jeakyl') _
pipeline {
  agent { docker { image 'node:24.15.0'; label 'docker-agent' } }
  stages {
    stage('Claude PR review') {
      when { changeRequest() }
      steps {
        catchError(buildResult: 'SUCCESS', stageResult: 'UNSTABLE') {
          claudeReview(
            model: 'claude-haiku-4-5-20251001',
            anthropicCredId: 'anthropic-api-key',
          )
        }
      }
    }
  }
}

Three things to notice. The step is wrapped in catchError with buildResult: 'SUCCESS', so a hiccup in Anthropic or the Gitea API can’t fail the build; review is advisory. The model is pinned to a dated Haiku build, not a -latest tag. And there’s no diff to wire up, no comment template, no body construction here – that’s all inside the step. The whole opt-in is six meaningful lines.

The pipeline{} block trap

My first cut wrote the step like this:

def call(Map cfg = [:]) {
  pipeline {
    agent any
    stages {
      stage('claudeReview') {
        steps {
          // ... fetch diff, call Anthropic, post comment ...
        }
      }
    }
  }
}

Which felt natural: it’s a self-contained chunk of work, it has its own agent needs, so why not declare it as a pipeline? The first PR build that triggered it died with:

Only one pipeline { ... } block can be executed in a single run.

Declarative Pipeline allows exactly one pipeline{} block to execute per build. A vars step that wraps its body in pipeline{} can only be invoked as the entire Jenkinsfile, not as a step inside another pipeline. There’s no way to nest them; the build fails as soon as a second block tries to run.

The fix is to write the step as a composable function: a def call(Map cfg) that runs sh directly without declaring its own pipeline shape. The caller’s pipeline{} provides the agent, the timeout, the credential context; the step just executes shell. The blast radius is smaller too, because the step can’t accidentally re-specify the agent and end up running on a different machine to the surrounding stages.

If you’ve written shared-library steps before, this is obvious. The Jenkins documentation has the same shape in its examples. I’d just never internalised why; the error message above is what made it click.

The 60kB cutoff and the false-positive review

Now the embarrassing one. The first version of the step trimmed the diff to 60,000 bytes before sending to Anthropic, because I was vaguely nervous about token cost and prompt size:

head -c 60000 pr.diff > pr.diff.trimmed

That worked fine for the first half-dozen small PRs. The third week, I opened a PR that touched seven files and was 60.3kB of diff. The Claude review that came back was wrong in a specific way:

The handler in apps/api/src/handlers/intake.ts is incomplete. The function signature exists but the body is missing closing braces – this file will not compile.

The file was fine. The reviewer was confused because I’d handed it a diff that had been cut mid-token, in the middle of an unrelated file three down in the listing. From Claude’s vantage point, the truncated tail looked like a file that had been opened and never closed. Reasonable inference; wrong cause. Worse, the false positive was specific and confident enough that I went and stared at the handler for a minute before working out what had happened.

head -c cuts at an exact byte count, with no regard for what sits at that offset. That can split a multi-byte UTF-8 sequence, but the more common problem is that it stops partway through a hunk, just short of the closing brace of some unrelated file. Either way, the model has no signal that the diff was truncated externally rather than left incomplete by the author. I bumped the budget to 120kB:

head -c 120000 pr.diff > pr.diff.trimmed

Enough to cover any PR I’d want to ship without breaking up first. You could go further by adding a sentinel to the prompt – something like \n--- diff truncated at <N> bytes; tail content omitted --- – so the model knows the missing-tail case explicitly. I went with the simpler bump, because anything over 120kB is a PR I should be splitting anyway, and the sentinel is more code to keep current with prompt changes.

The general lesson is small but worth carrying: when you feed an LLM a chunk of a file, tell it the chunk is a chunk. If you can’t or won’t, make the chunk big enough that the cliff edge is far from any real content.
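For reference, the sentinel variant is only a few lines. A minimal sketch, assuming the diff is already on disk; trim_diff is a hypothetical name, and the marker text is the example wording from above rather than anything the step actually emits.

```shell
#!/bin/sh
# Sketch only: trim_diff and the sentinel wording are illustrative assumptions.
# Copies at most $2 bytes of diff file $1 to stdout. If anything was cut,
# appends a marker line so the model can tell truncation from broken code.
trim_diff() {
  file="$1"
  budget="$2"
  size=$(wc -c < "$file" | tr -d ' ')
  head -c "$budget" "$file"
  if [ "$size" -gt "$budget" ]; then
    printf '\n--- diff truncated at %s bytes; tail content omitted ---\n' "$budget"
  fi
}
```

Called as trim_diff pr.diff 120000 > pr.diff.trimmed, it leaves diffs under the budget untouched, so the marker only shows up when there really is a missing tail.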

Two credentials, two binding shapes

The step needs two secrets: the Anthropic API key for the Messages call, and the Gitea API token for the diff fetch and the comment post. Both live in Jenkins credentials, but they bind differently:

withCredentials([
  usernamePassword(
    credentialsId: 'gitea-token',
    usernameVariable: 'GITEA_USER',
    passwordVariable: 'GITEA_TOKEN',
  ),
  string(credentialsId: anthropicCredId, variable: 'ANTHROPIC_API_KEY'),
]) {
  // ... fetch, call, post ...
}

The Gitea token is stored as a “Username with password” credential because the Gitea Jenkins plugin uses the same credential ID for git-over-HTTPS auth: the token is the password half, and the username (which Gitea ignores for token auth) is whatever the user set it to. So usernamePassword is how you bind it, and you read the token out of GITEA_TOKEN while pointedly ignoring GITEA_USER.

The Anthropic key is “Secret text”. The binding is string; the variable is ANTHROPIC_API_KEY.

Get the binding shape wrong and the failures look different in unhelpful ways. Bind gitea-token as string and Jenkins fails fast at the start of the step, more or less clearly. Bind the Anthropic key as usernamePassword and the failure is the next curl invocation complaining about an undefined variable, which doesn’t immediately point at the binding declaration eight lines up. Easy enough to fix once you know.
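One way to make that second failure point at its cause is a guard at the top of the shell body, before any curl call. This is a sketch under the assumption that the variable names match the binding above; require_env is a hypothetical helper, not something in the real step.

```shell
#!/bin/sh
# Sketch only: require_env is a hypothetical helper.
# Fails on the first variable that is unset or empty, naming the
# variable but never printing a secret's value.
require_env() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing $name: check the withCredentials binding type" >&2
      return 1
    fi
  done
}
```

A line such as require_env GITEA_TOKEN ANTHROPIC_API_KEY || exit 1 at the start of the script turns a confusing downstream error into a message that names the binding.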

Advisory, not blocking

catchError(buildResult: 'SUCCESS', stageResult: 'UNSTABLE') looks like over-engineering until you’ve had an Anthropic-side outage during the office-hours rush and watched every PR build go red on what is, fundamentally, advisory feedback.

The contract I want from the stage is:

The review usually appears within 30 seconds of PR open.
If it doesn’t, the stage goes amber (“unstable”), I see it in the build summary, but the build still passes.
The build going red means actual tests or actual typecheck failed, not that the LLM had a bad day.

catchError with SUCCESS/UNSTABLE gives me exactly that. The stage shows up in the Blue Ocean view in amber when something goes wrong, the build summary stays green, the PR check is green, and the merge button is enabled. A reviewer can still click through and see “huh, the Claude comment is missing, what happened” if they care.

The Ollama version, aiReview, gets the same treatment for different reasons. The local LLM box might be busy generating embeddings for something else and time my prompt out. Same outcome though: the build shouldn’t fail because an assistant got distracted.

What it costs

A typical PR for me is between 8 and 25kB of diff. With Haiku 4.5 and a 1024-token output budget, each review runs about 5 to 12 cents of API spend. The cumulative monthly cost across all the projects calling this step sits well under a coffee. The latency to first comment is between 4 and 18 seconds, depending on PR size and the time of day. Slower than typecheck (around 2 seconds), faster than the test suite (around 90 seconds), so on the timeline of “things appearing in the PR check list” the review is not the long pole.

The default model is Haiku because most diffs are small, the prompt is short, and “is there anything obviously bad here” doesn’t need Sonnet’s reasoning. For projects where I want a more careful review, the model is overridable from the calling Jenkinsfile:

claudeReview(
  model: 'claude-sonnet-4-6',
  anthropicCredId: 'anthropic-api-key',
  maxDiffBytes: 200000,
)

I have not yet had a PR where the Haiku review missed something obvious that Sonnet caught. That might say more about the kind of PRs I open than about the models.

What’s left

A short list of things I haven’t done yet, all on the “if it starts mattering” pile:

Threaded comments per-file via the Gitea review API, instead of a single bulk comment on the PR. Today it’s one big comment with bullet points; workable for small PRs, awkward for large ones.
Idempotency. If Jenkins retries the PR build for any reason, the step posts a second comment. I should be hashing the diff and looking for an existing comment with the same hash before posting a new one.
A spend cap with a hard kill switch. A runaway loop that opened 500 PRs in 30 seconds would happily call Anthropic 500 times. I’d want a per-day spend ceiling with a circuit breaker before that becomes a possible outcome.

None of these are blockers for the current usage; they’re notes-to-self for the day they bite.
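The idempotency item could be sketched roughly as follows: hash the trimmed diff, embed the hash in the posted comment as a marker, and skip posting when an existing comment already carries it. The marker format and both function names are made up for illustration, and the check assumes the text returned by a GET on the same issues comments endpoint contains each comment body with the marker token intact.

```shell
#!/bin/sh
# Sketch only: the marker format and both function names are assumptions.
# Uses sha256sum (GNU coreutils or BusyBox).

# Prints a marker token derived from the SHA-256 of diff file $1.
review_marker() {
  hash=$(sha256sum "$1" | cut -d' ' -f1)
  printf 'claude-review-sha256:%s' "$hash"
}

# Reads existing comment text on stdin (for example the JSON list of
# PR comments); succeeds if marker $1 is already present.
already_reviewed() {
  grep -qF -- "$1"
}
```

The step would compute marker=$(review_marker pr.diff.trimmed), pipe the existing comments into already_reviewed "$marker", and only post when that fails, appending the marker to the comment body (inside an HTML comment, if the renderer hides those). A retried build with an unchanged diff then finds its own earlier comment and stays quiet.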

The full step

The step lives at shared-library/vars/claudeReview.groovy in JeakylJenkins. The file is about 130 lines, most of which is the long comment block explaining the composable-step constraint. The active logic is small enough to summarise in a list:

1. Return early if CHANGE_ID is unset (it isn’t a PR build).
2. Derive the repo path from CHANGE_URL.
3. Make sure curl and jq are installed (apt or apk, depending on the base image).
4. Stage scratch files under a build-numbered directory inside the workspace.
5. Bind both credentials with withCredentials.
6. Fetch the diff via the Gitea API, trim to maxDiffBytes.
7. Build the request JSON with jq -Rs, POST to Anthropic, extract content[0].text.
8. POST the text back to Gitea via the issues comment API.
9. Always clean up the scratch dir in a finally.
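Deriving the repo path from CHANGE_URL is the least obvious item in that list, so here is one way it might look in plain shell. This assumes the URL has the shape https://host/owner/repo/pulls/N with Gitea served from the root of the host; the function name is hypothetical and the real step may parse it differently.

```shell
#!/bin/sh
# Sketch only: assumes CHANGE_URL looks like https://host/owner/repo/pulls/N
# and that Gitea is not hosted under a sub-path.
# Prints "owner/repo", or fails if the URL does not have that shape.
repo_path_from_change_url() {
  url="$1"
  case "$url" in
    *://*/*/*/pulls/*) ;;
    *) echo "unexpected CHANGE_URL: $url" >&2; return 1 ;;
  esac
  path="${url#*://*/}"              # drop scheme and host
  printf '%s\n' "${path%/pulls/*}"  # drop the /pulls/N suffix
}
```

The result slots into the elided part of the /repos/.../pulls/{n}.diff and /repos/.../issues/{n}/comments paths, with CHANGE_ID supplying the PR number.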

The point of having this as a shared-library step rather than 200 lines of bash inline in each project’s Jenkinsfile is that step 6’s byte budget can be raised once and benefit every project that calls it. Which is what I did when I hit the truncation bug. The two projects already using the step picked up the fix the next time their PR built, with no per-project change.

That’s the whole pitch for shared-library steps in one sentence: fix it once, propagate everywhere. Worth the small overhead of getting the step shape right the first time.