Turning GitHub Reviews Into Automatic Code Changes

GitHub's pull request review interface is genuinely good. You can walk through a diff file by file, leave inline comments on specific lines, start threaded discussions, and request changes with a single click. As a review tool, it's hard to beat. But as a task dispatch system, it's useless. Every comment you leave is a message that someone has to read and context-switch into before they can find the right file, make the change, push, and come back. I wanted to close that gap: leave a review comment and have the code change itself.

So I built a local webhook server that receives GitHub PR reviews, bundles every comment into a task list, hands it to a Claude Code agent running on my machine, and posts the replies back when it's done. The whole thing is about 250 lines of JavaScript across three files.

Why Local, Not a GitHub Action

The obvious approach is a GitHub Action: a PR review triggers a workflow, Claude's API processes the comments, and the workflow pushes a commit. It's clean and serverless. I didn't do it for two reasons.

First, I wanted the agent to have full access to my local environment. My projects have local tooling, custom scripts, database files, and test infrastructure that only exist on my machine. A CI runner doesn't have any of that. When Claude needs to run a build to verify a change compiles, or check a database schema, it can — because it's running in the same directory I work in.

Second, I wanted to stay in the loop. Not every review comment is a straightforward "rename this variable." Some require judgment calls. By running the agent locally through Happy Code, I can monitor the session from my phone, approve file edits, and intervene if Claude goes down the wrong path. I can be sitting in a coffee shop, leave a review on my phone, and watch my home machine action it — approving each step from the app.

The Architecture

The system has four moving parts: a tunnel, a webhook server, a GitHub API layer, and a Claude orchestrator.

A PR review on GitHub fires a webhook through ngrok to a local Express server on port 3777. The server verifies the HMAC signature, filters out bot reviews and self-replies, then fetches the review comments with their full thread context and the PR diff. It checks out the PR branch locally, writes a .review-tasks.md file into the repo, launches a Claude agent in a new terminal, and polls for a response file. When Claude finishes, the server posts replies back to each GitHub comment.

ngrok provides a stable public URL that forwards to localhost. I'm using their free static domain feature, so the webhook URL never changes between restarts. The Express server receives the webhook, verifies the HMAC SHA-256 signature, and hands off to the processing pipeline.

The Webhook Handler

The server listens for pull_request_review events. GitHub fires these whenever someone submits a review — whether it's an approval, a request for changes, or just a set of comments. The handler runs through a series of filters before doing any real work:

app.post("/webhook", async (req, res) => {
	const event = req.headers["x-github-event"];
	const signature = req.headers["x-hub-signature-256"];
	const payload = req.body;

	if (!verifySignature(payload, signature)) {
		return res.status(401).send("Invalid signature");
	}

	const body = JSON.parse(payload.toString());

	if (event !== "pull_request_review") {
		return res.status(200).send("Ignored event: " + event);
	}

	if (body.action !== "submitted") {
		return res.status(200).send("Ignored action: " + body.action);
	}

	const reviewUser = body.review.user?.login || "";
	if (reviewUser.endsWith("[bot]")) {
		return res.status(200).send("Ignored bot review");
	}

	// Respond immediately so GitHub doesn't timeout
	res.status(200).send("Processing");

	// Process asynchronously
	processReview(/* ... */).catch((err) => {
		console.error("Error processing review:", err);
	});
});

The key detail here is responding to GitHub immediately with a 200 before starting the actual work. GitHub expects a webhook response within 10 seconds. The review processing (fetching comments, checking out branches, launching Claude) takes much longer than that. If you don't respond in time, GitHub terminates the connection and marks the delivery as failed, even though the review is still being processed, and redelivering it then processes the same review a second time.
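The handler calls verifySignature without showing it. A minimal sketch of what such a check could look like follows, assuming the route receives the raw request bytes (for example via express.raw({ type: "application/json" }), which the payload.toString() call suggests) and that the shared secret lives in an environment variable. The variable name GITHUB_WEBHOOK_SECRET and the optional third parameter are assumptions for illustration, not taken from the original code.

```javascript
const crypto = require("node:crypto");

// Hypothetical: the article does not say where the shared secret is stored.
const WEBHOOK_SECRET = process.env.GITHUB_WEBHOOK_SECRET || "";

// rawBody must be the unparsed request bytes (Buffer or string), because the
// signature is computed over the exact payload GitHub sent.
function verifySignature(rawBody, signatureHeader, secret = WEBHOOK_SECRET) {
	if (!secret || typeof signatureHeader !== "string") return false;

	// GitHub sends the header as "sha256=<hex digest>".
	const expected =
		"sha256=" + crypto.createHmac("sha256", secret).update(rawBody).digest("hex");

	const a = Buffer.from(expected);
	const b = Buffer.from(signatureHeader);

	// timingSafeEqual throws on a length mismatch, so compare lengths first.
	return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```

The constant-time comparison avoids leaking how many leading characters of a forged signature happened to match.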

Building Context for Claude

When a review comes in, the server fetches three things: the review's inline comments, the full PR diff, and every comment on the PR (for thread history). Each review comment gets enriched with two pieces of context before being passed to Claude.

The first is the diff hunk — the block of code surrounding the line where the comment was left. GitHub includes this in the comment payload as diff_hunk. The second is the conversation thread. If a comment is a reply in an ongoing discussion, Claude needs to see what came before to understand what's being asked.

for (const comment of comments) {
	comment._diffHunk = comment.diff_hunk || null;

	const rootId = comment.in_reply_to_id || comment.id;
	comment._thread = allPRComments
		.filter((c) => c.id === rootId || c.in_reply_to_id === rootId)
		.filter((c) => c.id !== comment.id)
		.sort((a, b) => new Date(a.created_at) - new Date(b.created_at))
		.map((c) => ({ user: c.user.login, body: c.body }));
}

Thread reconstruction works by finding the root comment in each conversation chain and collecting all replies in chronological order. This means if I left a comment three days ago, someone replied, and I'm now leaving a follow-up review — Claude sees the entire history, not just my latest message.
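To make the behaviour concrete, here is the same filter, sort, and map chain wrapped in a function and run against a small set of invented comments. The function name buildThread and the sample data are illustrative only.

```javascript
// Same logic as the loop above, wrapped in a function for a standalone run.
function buildThread(comment, allPRComments) {
	const rootId = comment.in_reply_to_id || comment.id;
	return allPRComments
		.filter((c) => c.id === rootId || c.in_reply_to_id === rootId)
		.filter((c) => c.id !== comment.id)
		.sort((a, b) => new Date(a.created_at) - new Date(b.created_at))
		.map((c) => ({ user: c.user.login, body: c.body }));
}

// Invented sample data: one three-comment thread plus an unrelated comment.
const sampleComments = [
	{ id: 3, in_reply_to_id: 1, created_at: "2024-01-04T09:00:00Z", user: { login: "alice" }, body: "Follow-up: please also add a test." },
	{ id: 1, created_at: "2024-01-01T09:00:00Z", user: { login: "alice" }, body: "Should this be cached?" },
	{ id: 2, in_reply_to_id: 1, created_at: "2024-01-02T09:00:00Z", user: { login: "bob" }, body: "Probably, yes." },
	{ id: 9, created_at: "2024-01-03T09:00:00Z", user: { login: "carol" }, body: "Unrelated thread." },
];

// The follow-up (id 3) sees the root comment and bob's reply, oldest first.
// It does not see itself or the unrelated comment.
const followUpThread = buildThread(sampleComments[0], sampleComments);
console.log(followUpThread);
```

A comment with no replies and no parent, such as id 9 here, ends up with an empty thread, which is why the task file only prints a Thread section when the array is non-empty.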

The Task File

Rather than passing a giant prompt through the CLI, the server writes a .review-tasks.md file directly into the repository. This has a few advantages: the file is visible in the working directory so I can inspect it, it doesn't hit shell argument length limits, and Claude can reference it naturally during its session.

The task file includes the PR description, the review body, a checklist of every comment with its code context and thread history, clear instructions for how to handle each type of comment, and the full PR diff in a collapsible details block:

for (const comment of comments) {
	const location = comment.path
		? `\`${comment.path}:${comment.original_line || comment.line || "?"}\``
		: "general";
	content += `- [ ] **Comment #${comment.id}** (${location})\n`;

	if (comment._diffHunk) {
		content += `  \`\`\`diff\n  ${comment._diffHunk.replace(/\n/g, "\n  ")}\n  \`\`\`\n`;
	}

	if (comment._thread && comment._thread.length > 0) {
		content += `  **Thread:**\n`;
		for (const msg of comment._thread) {
			content += `  > **${msg.user}**: ${msg.body.replace(/\n/g, "\n  > ")}\n`;
		}
	}

	content += `  > ${comment.body.replace(/\n/g, "\n  > ")}\n\n`;
}

The instructions tell Claude to make code changes where requested, answer questions, implement suggestions it agrees with, and explain its reasoning if it disagrees. When it's done, it writes a JSON response file mapping each comment ID to a reply message.
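The response file's exact shape isn't shown, but the polling code later in this article reads comment_id and reply from each entry. A minimal sketch of a matching empty template and of the "has anything been filled in" check, assuming the file is a flat JSON array, might look like this. The function names are hypothetical.

```javascript
// Build an empty template with one entry per review comment.
// Field names follow the polling code: comment_id and reply.
function buildResponseTemplate(comments) {
	return comments.map((c) => ({ comment_id: c.id, reply: "" }));
}

// True once at least one entry has a non-empty reply. This mirrors the
// condition the poller uses to decide the agent has finished.
function hasReplies(entries) {
	return Array.isArray(entries) && entries.some((r) => Boolean(r.reply && r.reply.length > 0));
}

// Invented comment IDs, for illustration only.
const template = buildResponseTemplate([{ id: 101 }, { id: 102 }]);
console.log(JSON.stringify(template, null, 2));
```

Pre-populating the comment IDs means the agent only has to fill in the reply strings, which leaves less room for it to mistype an ID.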

Launching the Agent

The server checks out the PR branch locally, then launches a Claude Code session in a new Terminal window via Happy Code:

happy --name "PR #${pr.number} Review" "Read .review-tasks.md and action every task."

Happy Code is a wrapper around Claude Code that enables remote session monitoring. Once the session starts, I can open the Happy app on my phone and see exactly what Claude is doing — which files it's reading, what changes it's proposing, what permissions it's requesting. If I'm at my desk, the terminal window is right there. If I'm out, the app gives me the same level of control.

The server then polls for the response JSON file every three seconds:

return new Promise((resolve) => {
	const pollInterval = 3000;
	let lastLog = Date.now();

	const check = setInterval(() => {
		if (fs.existsSync(responsePath)) {
			try {
				const content = fs.readFileSync(responsePath, "utf-8");
				const parsed = JSON.parse(content);
				const hasReplies = parsed.some((r) => r.reply && r.reply.length > 0);
				if (hasReplies) {
					clearInterval(check);
					console.log("Response content:");
					for (const entry of parsed) {
						console.log(`  Comment #${entry.comment_id}: ${(entry.reply || "").slice(0, 200)}`);
					}
					resolve(content);
				}
			} catch {
				// File may be partially written, keep polling
			}
		}
		// Log a heartbeat every 5 minutes
		if (Date.now() - lastLog > 5 * 60 * 1000) {
			console.log(`Still waiting for Claude response...`);
			lastLog = Date.now();
		}
	}, pollInterval);
});

There's no timeout. The server polls indefinitely until Claude writes the response file, logging a heartbeat every five minutes so you know it hasn't hung. The hasReplies check is important — the server writes an empty response template before launching Claude, so the file exists immediately. It only resolves when Claude has actually filled in replies. When it does, the server logs a preview of each reply to the console so you can see what's about to be posted.

This was a lesson learned the hard way. The first version had a 30-minute timeout, which seemed generous until a review with 23 inline comments took Claude over an hour to work through. The server timed out, threw an error, and the replies were never posted — even though Claude had finished the work and written a perfectly good response file. The response was still sitting in /tmp, fully populated, just never picked up. I was able to replay it manually, but the whole point of the tool is that I shouldn't have to. Removing the timeout entirely was the right call — if the agent is still running, the server should still be waiting.

Posting Replies

Once Claude finishes, the server parses the JSON response and posts each reply back to the corresponding GitHub comment:

function replyToComment(owner, repo, prNumber, commentId, body) {
	gh([
		"api",
		`repos/${owner}/${repo}/pulls/${prNumber}/comments/${commentId}/replies`,
		"-f",
		`body=${body}\n\n${MARKER}`,
	]);
}

Every reply includes a hidden HTML marker: <!-- claude-pr-reviewer -->. This is invisible in the rendered comment but present in the raw body. Its purpose becomes clear in the next section.

The Self-Reply Loop

This was the hardest problem to solve, and it took three attempts to get right.

When Claude replies to a review comment, GitHub sees that as activity on the PR. Depending on how the reply is structured, it can trigger a new pull_request_review webhook event. That event hits the server, which fetches the comments, launches Claude again, which replies again, which triggers another webhook — an infinite loop.

Attempt 1: Review ID deduplication. I added a Map that tracked recently processed review IDs and skipped duplicates. This didn't work because Claude's replies create new reviews with different IDs. The dedup map never matched.

Attempt 2: PR-level cooldown. After processing a review on a PR, ignore all reviews on that PR for five minutes. This sort of worked, but the cooldown length was a guess. A review with many comments might take Claude 20 minutes to process, and a legitimate follow-up review during the cooldown would be silently dropped.

Attempt 3: Hidden marker detection. This is what stuck. Every reply Claude posts includes the <!-- claude-pr-reviewer --> marker in its body. When a new webhook arrives, the server fetches the review's comments and checks if any of them contain the marker:

const MARKER = "<!-- claude-pr-reviewer -->";
const comments = fetchReviewComments(owner, repo, prNumber, reviewId);

if (comments.some((c) => (c.body || "").includes(MARKER))) {
	console.log("Ignoring — comments contain our marker");
	return;
}

The server also checks the review body itself for the marker, and filters out bot users. Together, these checks create a reliable self-reply detection system. If any part of the incoming review was generated by the tool, the whole review is ignored.

The marker approach is better than timing-based solutions because it's deterministic. It doesn't matter how long Claude takes, how many comments there are, or how quickly GitHub delivers the webhook. If the tool wrote it, the tool ignores it.
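The three checks described above (bot user, marker in the review body, marker in any comment) can be combined into one pure function. This is a sketch under the assumption that the review object has the shape of the webhook payload's review field. The function name selfReplyReason and its return values are hypothetical.

```javascript
const MARKER = "<!-- claude-pr-reviewer -->";

// Returns a short reason string when the review should be skipped,
// or null when it looks like a genuine human review.
function selfReplyReason(review, comments) {
	const login = (review.user && review.user.login) || "";
	if (login.endsWith("[bot]")) return "bot user";
	if ((review.body || "").includes(MARKER)) return "marker in review body";
	if (comments.some((c) => (c.body || "").includes(MARKER))) return "marker in comment";
	return null;
}

// Invented example: a reply posted by the tool is detected and skipped.
const reason = selfReplyReason(
	{ user: { login: "me" }, body: "" },
	[{ body: "Renamed the variable.\n\n" + MARKER }]
);
console.log(reason);
```

Returning a reason rather than a boolean makes the log line more useful when working out why a review was ignored.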

The GitHub Layer

All GitHub API interaction goes through the gh CLI rather than raw HTTP requests. This avoids dealing with authentication tokens, pagination headers, and OAuth flows — gh handles all of that:

function gh(args, options = {}) {
	const result = execFileSync("gh", args, {
		encoding: "utf-8",
		maxBuffer: 10 * 1024 * 1024,
		...options,
	});
	return result.trim();
}

The maxBuffer is set to 10MB to handle large diffs. The rest is thin wrappers: fetchReviewComments, fetchPRDiff, fetchPRDetails, replyToComment, postPRComment. Each is a single gh api or gh pr call.
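As an illustration of how thin these wrappers can be, here is a sketch of two of them built on the gh helper. The endpoint is GitHub's REST route for listing the comments of a single review, and --paginate, gh pr diff, and --repo are existing gh options. The fetchPRDiff signature and the injectable run parameter (added so the functions can be exercised without calling gh) are assumptions, not the original code.

```javascript
const { execFileSync } = require("node:child_process");

function gh(args, options = {}) {
	const result = execFileSync("gh", args, {
		encoding: "utf-8",
		maxBuffer: 10 * 1024 * 1024,
		...options,
	});
	return result.trim();
}

// List the inline comments that belong to one review.
// `run` defaults to gh and can be swapped for a stub in tests.
function fetchReviewComments(owner, repo, prNumber, reviewId, run = gh) {
	const out = run([
		"api",
		"--paginate",
		`repos/${owner}/${repo}/pulls/${prNumber}/reviews/${reviewId}/comments`,
	]);
	return out ? JSON.parse(out) : [];
}

// Fetch the full PR diff as text (hypothetical signature).
function fetchPRDiff(owner, repo, prNumber, run = gh) {
	return run(["pr", "diff", String(prNumber), "--repo", `${owner}/${repo}`]);
}
```

Because arguments are passed as an array to execFileSync, no shell is involved, so comment bodies and branch names never need shell escaping.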

Using gh also means the tool authenticates as me, not as a GitHub App or bot account. Comments show up with my avatar and my name, which is a deliberate choice — I'm not trying to build a product, I'm building a personal workflow tool. The replies are mine, written by my agent, reviewed on my machine.

Running It

A single shell script starts everything:

node server.js &
ngrok http 3777 --domain="glaciered-nonferociously-amara.ngrok-free.dev" &

The ngrok domain is a free static domain that persists across restarts. The GitHub webhook is configured once with this URL and the shared secret, and it just works. No DNS changes, no certificate management, no deployment pipeline.

The workflow in practice looks like this: I push a feature branch, open a PR, then leave a review from GitHub's UI — commenting inline on specific lines, requesting changes, asking questions. Within seconds the webhook fires, my machine checks out the branch, and Claude starts working through the task list. I can watch from my phone or my terminal. When it's done, each comment gets a reply and the changes are committed and pushed.

What I'd Do Differently

The polling mechanism works but it's crude. A named pipe or WebSocket between the server and the Claude session would be cleaner than writing a JSON file and polling for it every three seconds. The current approach works because the polling interval is short and the file system is fast, but it's not elegant.

The open -a Terminal call to launch a new terminal window is macOS-specific. If I wanted to run this on a Linux server, I'd need to rethink the Claude launch mechanism — probably switching to a headless session or a tmux pane.

The response file living in /tmp is also fragile. If the machine reboots before the server picks up the response, the work is lost. A more robust approach would write responses to a known directory inside the project — somewhere that survives restarts and is easy to find if you need to replay manually.
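One possible shape for that more robust approach is sketched below. The directory name .review-responses, the file naming scheme, and both function names are hypothetical, and the directory would need to be git-ignored so the agent doesn't commit it.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical layout: <repo>/.review-responses/pr-<number>-review-<id>.json
function responsePathFor(repoDir, prNumber, reviewId) {
	return path.join(repoDir, ".review-responses", `pr-${prNumber}-review-${reviewId}.json`);
}

// Write through a temporary file and rename it into place, so a poller
// never reads a half-written file.
function writeResponseFile(filePath, entries) {
	fs.mkdirSync(path.dirname(filePath), { recursive: true });
	const tmpPath = `${filePath}.tmp`;
	fs.writeFileSync(tmpPath, JSON.stringify(entries, null, 2));
	fs.renameSync(tmpPath, filePath);
}
```

Keying the file name on both the PR number and the review ID keeps responses from separate reviews apart, which makes a manual replay a matter of pointing the server at the right file.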

The Bigger Picture

What makes this interesting isn't the webhook plumbing — that's straightforward Express and GitHub API work. It's the interaction model. GitHub's review UI is designed for humans to read and humans to action. By inserting an agent into that loop, the UI becomes a task dispatch interface. I write the review in the same way I always would — the only difference is that the comments get actioned automatically.

This is the pattern I keep coming back to with AI agents: don't build new interfaces for the AI. Use the interfaces you already have, and let the agent work behind them. GitHub already has a great review UI with inline comments, threaded discussions, and diff navigation. Building a separate dashboard for "AI code review tasks" would be worse in every way.

The tool also keeps me in control in a way that fully automated solutions don't. I'm still the one writing the review. I'm choosing which lines to comment on, what changes to request, what questions to ask. The agent handles the mechanical part — finding the file, making the edit, running the build, pushing the commit. The judgment stays with me.