Content Understanding & AI Moderation

2025–Present · Platform signals, moderation, governance

Content Understanding AI Moderation Policy Decisioning Evals

I build the AI that reads content on Roblox — ad creative and interactive experiences — and the moderation systems that act on what it finds. The same understanding signals feed surfaces well beyond moderation: brand suitability, discovery, ranking, creator and user experiences. This page describes the publicly safe shape of the work, not internal systems or numbers.

One model layer, many consumers

It helps to separate the two halves of the job. Content understanding is the model layer: multimodal systems that work out what a piece of creative or an experience actually contains. Moderation is the sharpest consumer of that layer — the decisions with enforcement behind them — but it is not the only one. Once a platform can genuinely read its own content, everything downstream wants the signal: discovery and ranking want quality and relevance, advertisers want brand suitability, creator tools want categorization, user surfaces want age appropriateness. Build the read once, and it compounds everywhere.

Why moderation is the hard version

Roblox is an ad-supported platform with a famously young audience, which means ads moderation sits at the intersection of two non-negotiable constraints at once: brand safety, which advertisers price, and child safety, which regulators enforce. There is no margin where one can be traded against the other. Moderation here is not a cost center bolted onto the ads business — it is the thing that makes the inventory trustworthy enough to sell at all.

The content makes it harder still. At YouTube I built systems that read video — content that cooperates, in the sense that it is the same file every time you look at it. Ad creative on Roblox is adversarial: it is produced by parties who sometimes want to be misread, and it can include or lead into interactive experiences whose behavior is not fully visible from their static assets. The read is harder, and the consequences of a wrong read are higher.

The shape of the system

Stripped of internal detail, the pipeline turns evidence into decisions. Models read the creative across modalities — visuals, audio, text, the destination it leads to — against policy. The outputs do not act directly; they flow into policy decisioning that produces an explicit, recorded outcome: approve, reject, restrict, or escalate to a human reviewer. Every decision carries its evidence with it, because a moderation decision a platform cannot reconstruct later is a liability, both to advertisers asking why and to regulators asking how.

The parts I care most about are the governance layers, because they are what let the models be used at all:

Evals as the gate. A model earns decision scope by demonstrating agreement with human judgment on curated case sets — and keeps that scope only while its disagreement and incident evidence stays inside bounds. I think of this as autonomy earning scope, and it is the operating principle of the whole system.
Human escalation as a designed path, not an exception handler. The interesting design question is which cases route to people, with what evidence attached, and how reviewer decisions flow back into the eval sets as precedent.
Auditability as a first-class output. The decision record is a product surface — for internal review, advertiser communication, and the transparency obligations that come with regimes like the EU's Digital Services Act.

What's next: experience understanding

The frontier I am taking on now is the content you cannot read from outside: the interactive experiences themselves. An experience is a program; much of what it contains only exists when someone plays it. The direction is to put the observer inside the content — train and run agents in the same sandbox the experience runs in, let them explore, and collect artifacts along the way: screenshots, transcripts, event traces, interaction logs. Those artifacts get distilled into an experience understanding record — a structured, versioned description of what an experience actually contains and does — that ads suitability, age appropriateness, discovery, and safety systems can all consume.

It is the same discipline as moment-level video understanding, one generation harder: content you have to inhabit to read, observed by agents instead of encoders. The open research problems — exploration coverage versus cost, evaluating the records themselves, experiences that behave differently when observed — are written up in my research notes.

The lineage

I have watched this "build the read once, it compounds everywhere" pattern play out before. At YouTube I built moment-level video understanding for brand ads; it launched publicly as Peak Points, and the underlying signals then diffused into YouTube's contextual platform — powering contextual sponsorship matching between brands and creators, among other surfaces. The Roblox work is the same play on harder content, and experience understanding is the part of the read that does not exist yet.