AI coding agents can generate thousands of lines of code per hour. The human reviewer is now the bottleneck. Traditional PR review UX was designed for a world where:
None of these assumptions hold anymore.
But there’s a subtler problem hiding inside the obvious one: large PRs predate AI. Humans were already creating them, already annoyed about them, already doing nothing about it. AI doesn’t introduce a new failure mode — it amplifies an existing one by orders of magnitude. The root cause was never a UX problem. It’s an information asymmetry problem: the author spent days understanding the context, and the reviewer gets a diff and 30 seconds of description.
These seven demos explore both layers: TikTok-style consumption patterns for faster throughput (Demos 1–4), and AI-driven context layers that surface the decision surface of a PR rather than just its line surface (Demos 5–7).
| Surface | What it contains | What current tooling shows | What reviewers actually evaluate |
|---|---|---|---|
| Line surface | Every changed line | All diffs, all tools | Rarely directly useful |
| Decision surface | The 3–7 architectural choices made | Nothing, currently | The actual job of review |
The line surface has thousands of items. The decision surface has 3–7. All current tooling shows the line surface and expects reviewers to mentally reconstruct the decision surface. That reconstruction is the cognitive load of code review — not reading speed, not diff format.
Demos 5–7 attack the decision surface directly.
| Demo | Concept | Core Idea |
|---|---|---|
| 1 | Swipe Card Feed | Each PR is a card; swipe to approve/reject/snooze; friction scales with PR size |
| 2 | Diff Reel | File-by-file vertical feed with auto-advance; incentivizes atomic commits |
| 3 | PR Size Coach | Author gets a Reviewability Score before submitting; tool suggests splits |
| 4 | “For You Page” Queue | Algorithm-sorted review queue; shows why each PR was surfaced |
| 5 | The AI Brief | Agent briefs reviewer in first-person: decisions made, alternatives rejected, uncertainties flagged |
| 6 | The Decision Surface | AI infers decisions from diff shape; generates the questions a thorough reviewer should ask |
| 7 | The Agent Story Arc | Rich commit messages as narrative arc; what review looks like when agents treat documentation as a first-class output |
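Demo 1's "friction scales with PR size" rule can be sketched as a small lookup from change size to the gesture required for approval. This is only an illustration: the thresholds, the gesture names, and the `PullRequest` type are all hypothetical and do not come from the demo itself.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the demo does not specify real values.
SMALL_PR_MAX_LINES = 100
MEDIUM_PR_MAX_LINES = 500


@dataclass
class PullRequest:
    title: str
    changed_lines: int


def approval_friction(pr: PullRequest) -> str:
    """Return the gesture required to approve; friction grows with PR size."""
    if pr.changed_lines <= SMALL_PR_MAX_LINES:
        return "swipe"  # one gesture approves a small PR
    if pr.changed_lines <= MEDIUM_PR_MAX_LINES:
        return "swipe_and_confirm"  # an extra confirmation step
    return "open_full_review"  # too large to approve from the card


if __name__ == "__main__":
    for pr in (PullRequest("typo fix", 12), PullRequest("agent rewrite", 4200)):
        print(pr.title, "->", approval_friction(pr))
```

The point of the sketch is the shape of the rule, not the numbers: approval cost is a function of size, so small PRs are cheap to approve and large ones are not.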
Demos 1–4 apply TikTok consumption UX patterns to code review. They share a common failure mode: they change the review interaction without changing what information the reviewer has. Consumption UX hides context; judgment tasks require it. Each demo’s honest critique is shown below.
These aren’t failures of the demos — they’re discoveries. Understanding where TikTok patterns don’t fit is what pointed toward the second set.
Demos 5–7 work differently. They change what information the reviewer has, not how the interface looks.
AI Brief (5): The agent that wrote the code also writes a first-person brief — not a diff summary, but the agent’s actual reasoning: what it chose, what alternatives it rejected, and what it explicitly flagged for human judgment. The reviewer evaluates choices, not lines. The key insight: the agent doesn’t give verdicts (“this could cause a null pointer”). It gives orientation (“here are the 4 choices I made — do you agree?”). Analysis, not prescription.
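One way to picture the brief is as structured data that the agent fills in and the interface renders in first person. The sketch below assumes a hypothetical `Decision` and `Brief` shape; the field names and the wording of the rendered text are illustrative, not the demo's actual format.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    choice: str                       # what the agent chose
    rejected_alternatives: list[str]  # what it considered and dropped
    needs_human_judgment: bool = False  # explicitly flagged uncertainty


@dataclass
class Brief:
    decisions: list[Decision] = field(default_factory=list)

    def render(self) -> str:
        """Render orientation for the reviewer, not verdicts about the code."""
        lines = [f"Here are the {len(self.decisions)} choices I made. Do you agree?"]
        for number, decision in enumerate(self.decisions, start=1):
            lines.append(f"{number}. I chose {decision.choice}.")
            for alternative in decision.rejected_alternatives:
                lines.append(f"   I rejected {alternative}.")
            if decision.needs_human_judgment:
                lines.append("   I am not sure about this one and need your judgment.")
        return "\n".join(lines)


if __name__ == "__main__":
    # Invented example decisions, for illustration only.
    brief = Brief([
        Decision("option A", ["option B"]),
        Decision("option C", ["option D", "option E"], needs_human_judgment=True),
    ])
    print(brief.render())
```

The reviewer reads a handful of choices with their rejected alternatives, which is the unit the demo asks them to evaluate.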
Decision Surface (6): Works for human-written PRs with no author context. Two panels: AI-inferred decisions from diff shape (reviewer verifies yes/no), and skeptical questions the reviewer should ask (safety, correctness, coverage). Reviewer marks each question as answered-in-diff, known-from-context, or flags it for the author. Output is a structured review — not a pile of comments. Works without any author cooperation.
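The two panels and the structured output could be modeled roughly as follows. The three question statuses come from the description above; everything else (class names, field names, the shape of the output dictionary) is an assumption made for the sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class QuestionStatus(Enum):
    ANSWERED_IN_DIFF = "answered-in-diff"
    KNOWN_FROM_CONTEXT = "known-from-context"
    FLAGGED_FOR_AUTHOR = "flagged-for-author"


@dataclass
class InferredDecision:
    summary: str
    verified: Optional[bool] = None  # reviewer's yes/no; None until reviewed


@dataclass
class ReviewQuestion:
    category: str  # "safety", "correctness", or "coverage"
    text: str
    status: Optional[QuestionStatus] = None  # None until the reviewer marks it


def structured_review(decisions: list[InferredDecision],
                      questions: list[ReviewQuestion]) -> dict[str, list[str]]:
    """Collapse the reviewer's marks into a structured review."""
    return {
        "confirmed_decisions": [d.summary for d in decisions if d.verified is True],
        "disputed_decisions": [d.summary for d in decisions if d.verified is False],
        "questions_for_author": [
            q.text for q in questions
            if q.status is QuestionStatus.FLAGGED_FOR_AUTHOR
        ],
        "unanswered_questions": [q.text for q in questions if q.status is None],
    }
```

Only flagged questions and disputed decisions go back to the author, which is what separates this output from a pile of comments.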
Agent Story Arc (7): Opens with the same PR as two commit histories — human (“fix auth”, “more work”, “wip”) vs. agent (each commit records a decision, an alternative, a risk level, and what was intentionally left out of scope). Same code. The agent version also reorders commits into narrative order — the order that makes the PR easiest to evaluate, not the order they were written. Closes with an argument for a new professional norm: AI agents should treat review-readiness as a first-class output, not a byproduct.
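A minimal sketch of what an agent-authored commit record and the narrative reordering might look like. The four recorded fields follow the description above; the `narrative_position` field, the trailer-style message layout, and all names are hypothetical choices for this example.

```python
from dataclasses import dataclass


@dataclass
class AgentCommit:
    subject: str
    decision: str
    alternative: str
    risk: str                # e.g. "low", "medium", "high"
    out_of_scope: str
    narrative_position: int  # hypothetical: the agent's suggested reading order


def commit_message(commit: AgentCommit) -> str:
    """Format the record as a subject line plus trailer-style fields."""
    return "\n".join([
        commit.subject,
        "",
        f"Decision: {commit.decision}",
        f"Alternative: {commit.alternative}",
        f"Risk: {commit.risk}",
        f"Out-of-scope: {commit.out_of_scope}",
    ])


def narrative_order(commits: list[AgentCommit]) -> list[AgentCommit]:
    """Return commits in the order easiest to evaluate, not the order written."""
    return sorted(commits, key=lambda c: c.narrative_position)
```

The input list stays in authoring order; `narrative_order` returns a new list, so both views of the same history remain available to the reviewer.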
1. The problem is information asymmetry, not interface friction.
Large PRs predate AI. The cause has always been that authors know everything and reviewers know nothing. No swipe gesture fixes that gap.
2. Authors won’t add extra work at commit time.
There are 30 years of evidence for this. The cause is laziness and flow-state interruption, not lack of structure. Adding more fields to commit boxes won’t change this.
3. AI agents are different.
An agent has perfect memory of its reasoning: every alternative considered, every tradeoff made, every assumption baked in. This reasoning is currently discarded at commit time. Capturing it is the real opportunity.
4. The decision surface is the unit that matters.
PRs have 3–7 key architectural choices. All current tooling ignores these and shows thousands of lines instead. Demos 5–7 show what happens when you invert this.
5. The best demo moment across all seven:
Demo 5’s With/Without Brief toggle for a 347-line PR. Without: 8 files, alphabetical order, no context. With: 4 decisions you can evaluate in 2 minutes. That contrast communicates the entire thesis.