TikTok for PR Reviews

TikTok for PR Reviews — Research Context

The Problem

AI coding agents can generate thousands of lines of code per hour. The human reviewer is now the bottleneck. Traditional PR review UX was designed for a world where:

PRs are written by one human over a few hours
Reviewers have long, uninterrupted attention spans
A “big PR” is ~500 lines

None of these assumptions hold anymore.

But there’s a subtler problem hiding inside the obvious one: large PRs predate AI. Humans were already creating them, already annoyed about them, already doing nothing about it. AI doesn’t introduce a new failure mode — it amplifies an existing one by orders of magnitude. The root cause was never a UX problem. It’s an information asymmetry problem: the author spent days understanding the context, and the reviewer gets a diff and 30 seconds of description.

These seven demos explore both layers: TikTok-style consumption patterns for faster throughput (Demos 1–4), and AI-driven context layers that surface the decision surface of a PR rather than just its line surface (Demos 5–7).

The Two Surfaces of a PR

Surface	What it contains	What current tooling shows	Value to the reviewer
Line surface	Every changed line	All of it, in every tool	Rarely directly useful
Decision surface	The 3–7 architectural choices made	Nothing, currently	The actual job of review

The line surface has thousands of items. The decision surface has 3–7. All current tooling shows the line surface and expects reviewers to mentally reconstruct the decision surface. That reconstruction is the cognitive load of code review — not reading speed, not diff format.

Demos 5–7 attack the decision surface directly.

The Seven Demos

Demo	Concept	Core Idea
1	Swipe Card Feed	Each PR is a card; swipe to approve/reject/snooze; friction scales with PR size
2	Diff Reel	File-by-file vertical feed with auto-advance; incentivizes atomic commits
3	PR Size Coach	Author gets a Reviewability Score before submitting; tool suggests splits
4	“For You Page” Queue	Algorithm-sorted review queue; shows why each PR was surfaced
5	The AI Brief	Agent briefs reviewer in first-person: decisions made, alternatives rejected, uncertainties flagged
6	The Decision Surface	AI infers decisions from diff shape; generates the questions a thorough reviewer should ask
7	The Agent Story Arc	Rich commit messages as narrative arc; what review looks like when agents treat documentation as a first-class output

Demo Notes

Demos 1–4 apply TikTok consumption UX patterns to code review. They share a common failure mode: they change the review interaction without changing what information the reviewer has. Consumption UX hides context; judgment tasks require it. Each demo’s honest critique is shown below.

Swipe (1): The inline diff is a teaser — the moment a reviewer needs more than 8 lines of context, they bail to GitHub. Swipe is also a touch gesture; most code review happens on laptops.
Reel (2): Auto-advance creates anxiety, not momentum. Per-file approvals are misleading, because the correctness of one file often depends on another. A linear feed also blocks non-linear investigation.
Coach (3): Engineers know line count is a proxy. A 200-line auth change is harder to review than 800 lines of test fixtures. The first time the score is visibly wrong, trust in it collapses.
For You Page (4): “Surfaced because: you own this file” optimizes speed at the cost of knowledge sharing — over time, only one person reviews each subsystem.
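
The Coach critique can be made concrete with a small sketch. Everything here is hypothetical: the path prefixes, the weights, and the function names are invented for illustration and are not taken from the demo. It only shows why raw line count and a risk-aware measure can rank the same two PRs in opposite order.

```python
from dataclasses import dataclass

# Hypothetical risk weights per path prefix. A real tool would have to
# derive or tune these for its own codebase.
RISK_WEIGHTS = {
    "auth/": 5.0,
    "tests/fixtures/": 0.1,
}
DEFAULT_WEIGHT = 1.0


@dataclass
class FileChange:
    path: str
    lines_changed: int


def risk_weight(path: str) -> float:
    """Return the weight of the first matching prefix, else the default."""
    for prefix, weight in RISK_WEIGHTS.items():
        if path.startswith(prefix):
            return weight
    return DEFAULT_WEIGHT


def naive_load(changes: list[FileChange]) -> int:
    """Review load as raw line count (the proxy engineers distrust)."""
    return sum(c.lines_changed for c in changes)


def weighted_load(changes: list[FileChange]) -> float:
    """Review load with each file's lines scaled by its risk weight."""
    return sum(c.lines_changed * risk_weight(c.path) for c in changes)


# The two PRs from the critique above, with made-up file names.
auth_pr = [FileChange("auth/session.py", 200)]
fixture_pr = [FileChange("tests/fixtures/users.json", 800)]
```

Under these assumed weights, `naive_load` ranks the fixture PR as the bigger review while `weighted_load` ranks the auth PR as bigger. The weights themselves remain a proxy, so the trust problem described above still applies to any fixed scoring scheme.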

These aren’t failures of the demos — they’re discoveries. Understanding where TikTok patterns don’t fit is what pointed toward the second set.

Demos 5–7 work differently. They change what information the reviewer has, not how the interface looks.

AI Brief (5): The agent that wrote the code also writes a first-person brief. It is not a diff summary but the agent’s actual reasoning: what it chose, what alternatives it rejected, and what it explicitly flagged for human judgment. The reviewer evaluates choices, not lines. The key insight: the agent doesn’t give verdicts (“this could cause a null pointer”). It gives orientation (“here are the 4 choices I made; do you agree?”). The judgment stays with the reviewer.
Decision Surface (6): Works for human-written PRs with no author context and no author cooperation. Two panels: decisions the AI infers from the shape of the diff (the reviewer confirms or rejects each one), and skeptical questions the reviewer should ask (safety, correctness, coverage). The reviewer marks each question as answered-in-diff, known-from-context, or flagged for the author. The output is a structured review, not a pile of comments.
Agent Story Arc (7): Opens with the same PR as two commit histories — human (“fix auth”, “more work”, “wip”) vs. agent (each commit records a decision, an alternative, a risk level, and what was intentionally left out of scope). Same code. The agent version also reorders commits into narrative order — the order that makes the PR easiest to evaluate, not the order they were written. Closes with an argument for a new professional norm: AI agents should treat review-readiness as a first-class output, not a byproduct.
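
The three-way marking in Demo 6 maps naturally onto a small data structure. The sketch below is one possible shape, not the demo's implementation: the class names, the `open` bucket for unmarked questions, and the sample questions are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Resolution(Enum):
    """The three marks a reviewer can give a question."""
    ANSWERED_IN_DIFF = "answered-in-diff"
    KNOWN_FROM_CONTEXT = "known-from-context"
    FLAGGED_FOR_AUTHOR = "flagged-for-author"


@dataclass
class Question:
    text: str
    category: str  # e.g. "safety", "correctness", "coverage"
    resolution: Optional[Resolution] = None  # None until the reviewer marks it


def structured_review(questions: list[Question]) -> dict[str, list[str]]:
    """Group question texts by resolution; unmarked ones go under 'open'."""
    review: dict[str, list[str]] = {r.value: [] for r in Resolution}
    review["open"] = []  # hypothetical bucket for questions not yet marked
    for q in questions:
        key = q.resolution.value if q.resolution else "open"
        review[key].append(q.text)
    return review


# Invented sample questions, one per category.
questions = [
    Question("Is the new token validated before use?", "safety",
             Resolution.ANSWERED_IN_DIFF),
    Question("Do existing sessions survive the migration?", "correctness",
             Resolution.FLAGGED_FOR_AUTHOR),
    Question("Is the failure path tested?", "coverage"),
]
review = structured_review(questions)
```

Only the `flagged-for-author` bucket needs to reach the author, which is what separates this output from a pile of free-form comments.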

Key Insights

1. The problem is information asymmetry, not interface friction.
Large PRs predate AI. The cause has always been that authors know everything and reviewers know nothing. No swipe gesture fixes that gap.

2. Authors won’t add extra work at commit time.
There are 30 years of evidence for this. The cause is laziness and flow-state interruption, not lack of structure, so adding more fields to commit boxes won’t change it.

3. AI agents are different.
An agent can keep a complete record of its reasoning: every alternative considered, every tradeoff made, every assumption baked in. Today that reasoning is discarded at commit time. Capturing it is the real opportunity.
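
One way to capture that reasoning is to have the agent emit a structured record with each commit. The sketch below assumes the four fields Demo 7 describes (a decision, an alternative, a risk level, and what was left out of scope) and renders them as `Key: value` lines in the style of git trailers. The trailer keys, class name, and sample content are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionRecord:
    summary: str                 # first line of the commit message
    decision: str                # what the agent chose
    alternative_rejected: str    # what it considered and dropped
    risk: str                    # e.g. "low", "medium", "high"
    out_of_scope: list[str] = field(default_factory=list)


def to_commit_message(record: DecisionRecord) -> str:
    """Render a summary line, a blank line, then one trailer per field."""
    trailers = [
        f"Decision: {record.decision}",
        f"Alternative-Rejected: {record.alternative_rejected}",
        f"Risk: {record.risk}",
    ]
    trailers += [f"Out-Of-Scope: {item}" for item in record.out_of_scope]
    return "\n".join([record.summary, "", *trailers])


# Invented example content, for illustration only.
record = DecisionRecord(
    summary="Rotate session tokens on login",
    decision="Issue a fresh token on every successful login",
    alternative_rejected="Extend the lifetime of the existing token",
    risk="medium",
    out_of_scope=["Revoking tokens on password change"],
)
message = to_commit_message(record)
```

Because the agent already holds this information while it works, producing the record costs the author nothing extra, which is the difference from asking humans to fill in more commit fields.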

4. The decision surface is the unit that matters.
PRs have 3–7 key architectural choices. All current tooling ignores these and shows thousands of lines instead. Demos 5–7 show what happens when you invert this.

5. The best demo moment across all seven:
Demo 5’s With/Without Brief toggle for a 347-line PR. Without: 8 files, alphabetical order, no context. With: 4 decisions you can evaluate in 2 minutes. That contrast communicates the entire thesis.
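
The toggle can be thought of as two views over the same brief. This is a minimal sketch under stated assumptions: the `BriefDecision` shape, the choice to list flagged decisions first, and the sample decisions and file names are all invented and do not come from the demo.

```python
from dataclasses import dataclass


@dataclass
class BriefDecision:
    choice: str                # what the agent decided
    rejected: str              # alternative it considered and dropped
    files: list[str]           # files that implement this choice
    needs_human: bool = False  # agent flagged this for human judgment


def without_brief(decisions: list[BriefDecision]) -> list[str]:
    """Flat, alphabetical file list: the view with no context."""
    return sorted({f for d in decisions for f in d.files})


def with_brief(decisions: list[BriefDecision]) -> list[tuple[str, list[str]]]:
    """Decisions first, flagged ones on top, each with its files."""
    # sorted() is stable, so unflagged decisions keep their original order.
    ordered = sorted(decisions, key=lambda d: not d.needs_human)
    return [(d.choice, sorted(d.files)) for d in ordered]


# Invented sample brief with two decisions.
decisions = [
    BriefDecision("Cache tokens in memory", "Redis cache",
                  ["cache.py", "auth.py"]),
    BriefDecision("Fail closed on timeout", "Retry with backoff",
                  ["auth.py", "client.py"], needs_human=True),
]
```

Note that `auth.py` appears once in the flat view but under both decisions in the brief view, which is the context the alphabetical list loses.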