Synthetic Signals
Population-scale agent testing

Find where your agent breaks — before your users do.

Connect your agent and set it loose on a city of thousands of Census-grounded people. Every conversation is logged, scored, and reproducible — so you find the people your agent lets down, not just the happy path.

A living population

Not a few QA scripts. A whole living population.

A handful of QA testers and a small generated dataset only cover the people you already thought of. Synthetic Signals gives your agent a whole city — each citizen with their own identity, personality, and needs, grounded in real data and tailored to your use case.

A whole city, not a sample

Thousands of distinct personas, not the 20 you'd hand-write. They are synthesized from US Census ACS data and move along real San Francisco streets. Run your agent against the whole population, in parallel, on demand.

A person, grounded in a real life

Age, sex, job, income — straight from Census data. On top, each citizen carries a modeled personality — the Big Five (OCEAN) traits — and a mood that shifts through the day.

Hana Singh, 32 · Female · Healthcare
Income: $100–150k
Household: Married · 2
Education: Graduate
Commute: Bus · 28 min
OCEAN personality: O · C · E · A · N (trait bars)

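As a rough illustration of the description above, a citizen can be pictured as a small record that pairs Census-style fields with OCEAN traits and a time-dependent mood. This is a minimal sketch, not the product's data model: the `Citizen` class, its field names, the trait values, and the sine-based mood curve are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Citizen:
    """Hypothetical persona record; field names are illustrative only."""
    name: str
    age: int
    sex: str
    occupation: str
    income_band: str
    # Big Five (OCEAN) traits, each scaled to the range 0.0-1.0.
    openness: float
    conscientiousness: float
    extraversion: float
    agreeableness: float
    neuroticism: float

    def mood_at(self, hour: int) -> float:
        """Return a mood score in [0, 1] for an hour of the day (0-23).

        Assumed toy model: mood follows a daily curve that peaks at noon,
        and higher neuroticism produces a wider swing.
        """
        swing = 0.1 + 0.3 * self.neuroticism
        daily = math.sin(math.pi * (hour % 24) / 24)  # 0 at midnight, 1 at noon
        mood = 0.5 + swing * (2 * daily - 1)
        return max(0.0, min(1.0, mood))


# Demographics taken from the example card; trait values are made up.
hana = Citizen("Hana Singh", 32, "Female", "Healthcare", "$100-150k",
               openness=0.7, conscientiousness=0.8, extraversion=0.4,
               agreeableness=0.6, neuroticism=0.5)

noon_mood = hana.mood_at(12)
midnight_mood = hana.mood_at(0)
```

Keeping the record frozen means a persona cannot drift during a run; anything that changes over time, such as mood, is derived from the fixed traits plus the clock.
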
Find the cohort you're failing

Break the results down by segment. See who your agent works for, and who it quietly leaves behind.

Overall: 92
Age 65+: 64
Spanish-pref: 71

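The segment breakdown amounts to grouping scored conversations by a persona attribute and averaging within each group. The sketch below shows that idea on made-up data; the `scores_by_segment` helper, the `age_band` key, and the scores are hypothetical and do not reflect the product's export format.

```python
from collections import defaultdict
from statistics import mean


def scores_by_segment(results, segment_key):
    """Average the per-conversation score within each value of segment_key."""
    buckets = defaultdict(list)
    for row in results:
        buckets[row[segment_key]].append(row["score"])
    return {segment: round(mean(scores)) for segment, scores in buckets.items()}


# Made-up results, one dict per scored conversation.
results = [
    {"age_band": "18-64", "score": 97},
    {"age_band": "18-64", "score": 95},
    {"age_band": "65+", "score": 60},
    {"age_band": "65+", "score": 68},
]

overall = round(mean(row["score"] for row in results))
by_age = scores_by_segment(results, "age_band")
```

In this toy data the overall average looks acceptable while the 65+ group sits far below it, which is the kind of gap a single headline score hides.
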
Citizens that remember

They remember across sessions. Conversations fold into each citizen's memory — test follow-ups and the long game, not one-shot replies. Memory is part of the seeded state, so every run still resets clean.

Michael Martinez · MCP client · 14 msgs · 2 days ago
Chat with Michael Martinez · External agent · 2 msgs · today
↳ Folded into Michael's memory: he remembers the last call

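One way to picture memory that persists across sessions yet resets cleanly is a store that keeps an immutable seeded baseline alongside the entries added during a run. The following is a sketch under that assumption; `CitizenMemory`, its methods, and the sample entries are hypothetical names invented for illustration.

```python
class CitizenMemory:
    """Toy memory store; class and method names are hypothetical."""

    def __init__(self, seeded_memories):
        self._seeded = tuple(seeded_memories)  # immutable baseline for this seed
        self.entries = list(self._seeded)

    def fold_in(self, channel, summary):
        """Append a finished conversation so later sessions can recall it."""
        self.entries.append(f"{channel}: {summary}")

    def reset(self):
        """Drop everything learned during the run; keep the seeded baseline."""
        self.entries = list(self._seeded)


memory = CitizenMemory(["prefers phone calls"])
memory.fold_in("MCP client", "asked about a late fee")
memory.fold_in("External agent", "followed up on the fee")
during_run = list(memory.entries)  # snapshot: baseline plus two conversations

memory.reset()  # the next run starts from the seeded state again
```
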
Reproduce any failure

Same seed, same city, every time. Turn a one-off failure into a permanent regression test that proves the fix holds.

Run · seed 42 · cohort 65+ · 64
Re-run · seed 42 · cohort 65+ · 64
Identical: 0 diffs
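
The general technique behind "same seed, same city" is a privately seeded random generator: identical seeds yield identical populations, so two runs can be compared record by record. The sketch below demonstrates this with Python's standard library; `generate_city`, `diff_runs`, and the `patience` field are hypothetical and stand in for whatever the platform actually generates.

```python
import random


def generate_city(seed, size):
    """Build a toy population deterministically from a seed."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    return [
        {"id": i, "age": rng.randint(18, 90), "patience": round(rng.random(), 3)}
        for i in range(size)
    ]


def diff_runs(first, second):
    """Return the ids of citizens whose records differ between two runs."""
    return [a["id"] for a, b in zip(first, second) if a != b]


run = generate_city(seed=42, size=1000)
rerun = generate_city(seed=42, size=1000)
other = generate_city(seed=43, size=1000)

print(len(diff_runs(run, rerun)))  # 0: same seed, same city
```

A regression test can then pin the seed that exposed a failure and assert on the outcome for that exact population, rather than hoping a fresh random sample happens to include the same person.
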

Grounded in US Census data (ACS), OpenStreetMap & the American Time Use Survey. Synthetic — no real personal data, no real individuals. Score with our built-in rubric, pull transcripts over MCP, or stream every run to your own stack as OpenTelemetry traces.

Customization

Your personas. Your audience. Your metrics.

Customize every part of the test — tune each persona's traits, needs and wants; assemble an audience for the exact job you're testing; and report on the outcomes that matter to you.

Custom personas

Tune every trait. OCEAN personality, demographics, and each persona's needs and wants — build exactly the people you're testing for.

Hana Singh, 32 · Female · Healthcare
Big Five (OCEAN) personality: O · C · E · A · N (trait bars)
Needs: plain-language answers
Wants: a fast decision

Custom audiences

Assemble for a specific job. Loan applicants, parking-permit renewals, first-time buyers — test the exact scenario that matters.

Audience: Applying for a loan
Priya Nair, 58 · first-time borrower
Marcus Meyer, 34 · debt consolidation
Grace Wong, 71 · ESL · fixed income
420 personas · built for one job-to-be-done

Custom reporting

Measure what matters to you. Your rubric, your cohorts, your dashboards — reported the way your team already works.

Eligibility explained: 88
Plain-language: 76
Compliance: 94
Your rubric · exported to your stack

In practice

Whatever your agent does, there's a custom audience for it.

Its own personas, its own job-to-be-done, its own definition of a good outcome — here's what that looks like across a few real domains.

Banking support that doesn't flinch

Polished on the happy path, brittle with the angry caller. Test your support agent on the stressed parent disputing a fee, the retiree who distrusts the chat, the customer fishing for a waiver — then replay the exact failure until it holds.

Brandon Hale · disputing a fee · mood 24% · simulated
Brandon: This overdraft fee is wrong. I need it gone today.
Agent: I hear you. Let's pull up the last three charges together.
Brandon: Fine. But I'm not paying for something I didn't do.
De-escalated, fee reviewed

Works with your stack

Bring the agent you already built.

Point any harness at the city over MCP or a plain REST API, and stream every run back as OpenTelemetry — your framework, your language, your evals. No rebuild, no SDK lock-in.

MCP: The open Model Context Protocol. Any MCP client connects.
REST API: Language-agnostic HTTP endpoints for any language and any framework.
OpenTelemetry: Every run exports as OTLP traces to Langfuse, Phoenix, Datadog, or your own collector.
Bring your own: Your harness, your evals, your CI. No SDK lock-in.
Claude
OpenAI
Gemini
LangChain
LlamaIndex
CrewAI
AutoGen
Cursor
Vercel
Mistral
Cohere
Sierra

Any agent that speaks MCP or HTTP can run against the city — these are some of the stacks teams build on.

Find where your agent breaks — before your users do.

Connect your agent and set it loose on a city of thousands of Census-grounded people. Every run logged, reproducible, and scored — with our rubric, your own evals, or streamed to your stack over OpenTelemetry. Start in minutes.