Synthetic Signals

Population-scale agent testing

Find where your agent breaks — before your users do.

Connect your agent and set it loose on a city of thousands of Census-grounded people. Every conversation is logged, scored, and reproducible — so you find the people your agent lets down, not just the happy path.

Start testing your agent

test222 · 976 citizens · live · Mon 11:49

Working Home Commuting

Hana Singh

32 · Female · Healthcare

Home / leisure · relaxing at home

Energy

Hunger

Social

Mood

$100–150k · Graduate · 228 Valencia St

External agent Daniela Ali · SIMULATED

Would you like to book?

Yes — the payment and cancellation terms work for me.

I'll send a payment link and the confirmation.

Yes, that works perfectly. Thank you!

Task: 95

Consistency: 92

Satisfaction: 95

A living population

Not a few QA scripts. A whole living population.

A handful of QA testers and a small generated dataset only cover the people you already thought of. Synthetic Signals gives your agent a whole city — each citizen with their own identity, personality, and needs, grounded in real data and tailored to your use case.

A whole city, not a sample

Thousands of distinct personas, not the 20 you'd hand-write. They are synthesized from US Census ACS data and move on real San Francisco streets. Run your agent against the whole population, in parallel, on demand.

A person, grounded in a real life

Age, sex, job, income — straight from Census data. On top, each citizen carries a modeled personality — the Big Five (OCEAN) traits — and a mood that shifts through the day.

Hana Singh · 32 · Female · Healthcare

Income: $100–150k

Household: Married · 2

Education: Graduate

Commute: Bus · 28 min

OCEAN personality
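As a rough illustration of the profile above, a citizen can be pictured as a small record that combines demographic fields with Big Five scores and a mood value. This is only a sketch: the `Persona` class, its field names, the 0 to 1 score range, and the example values are all assumptions made for illustration, not the product's actual schema.

```python
from dataclasses import dataclass, field

# The five OCEAN (Big Five) trait names.
OCEAN_TRAITS = (
    "openness",
    "conscientiousness",
    "extraversion",
    "agreeableness",
    "neuroticism",
)


@dataclass
class Persona:
    """Hypothetical persona record; not the product's real data model."""

    name: str
    age: int
    sex: str
    occupation: str
    income_band: str
    ocean: dict = field(default_factory=dict)  # trait name -> score in [0, 1] (assumed scale)
    mood: float = 0.5                          # assumed to shift over the simulated day

    def __post_init__(self):
        # Require all five traits, each within the assumed 0..1 range.
        missing = [t for t in OCEAN_TRAITS if t not in self.ocean]
        if missing:
            raise ValueError(f"missing OCEAN traits: {missing}")
        for trait, value in self.ocean.items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{trait} must be between 0 and 1, got {value}")


# Illustrative values only.
example = Persona(
    name="Example Citizen",
    age=40,
    sex="Female",
    occupation="Healthcare",
    income_band="$100-150k",
    ocean={
        "openness": 0.7,
        "conscientiousness": 0.8,
        "extraversion": 0.4,
        "agreeableness": 0.6,
        "neuroticism": 0.3,
    },
)
```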

Find the cohort you're failing

Break the results down by segment. See who your agent works for, and who it quietly leaves behind.

Overall

Age 65+

Spanish-pref
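A segment breakdown like the one above amounts to grouping scored conversations by cohort and comparing averages. The sketch below shows the idea on a few made-up records. The field names (`age`, `language`, `task`), the scores, and the segment rules are assumptions for illustration, not the product's export format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical scored runs; fields and values are illustrative only.
runs = [
    {"citizen": "A", "age": 32, "language": "en", "task": 95},
    {"citizen": "B", "age": 71, "language": "en", "task": 60},
    {"citizen": "C", "age": 68, "language": "es", "task": 50},
    {"citizen": "D", "age": 45, "language": "es", "task": 70},
]


def segments(run):
    """Return every segment label a run belongs to (a run can be in several)."""
    labels = ["Overall"]
    if run["age"] >= 65:
        labels.append("Age 65+")
    if run["language"] == "es":
        labels.append("Spanish-pref")
    return labels


def breakdown(runs, metric="task"):
    """Average the chosen metric within each segment."""
    buckets = defaultdict(list)
    for run in runs:
        for label in segments(run):
            buckets[label].append(run[metric])
    return {label: mean(scores) for label, scores in buckets.items()}


report = breakdown(runs)
for label, score in report.items():
    print(f"{label}: {score}")
```

A gap between the overall average and a single segment's average is the signal to look for: it points to a cohort the agent serves worse than the rest.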

Citizens that remember

They remember across sessions. Conversations fold into each citizen's memory — test follow-ups and the long game, not one-shot replies. Memory is part of the seeded state, so every run still resets clean.

Michael Martinez · MCP client · 14 msgs · 2 days ago

Chat with Michael Martinez · External agent · 2 msgs · today

↳ folded into Michael's memory — he remembers the last call

Reproduce any failure

Same seed, same city, every time. Turn a one-off failure into a permanent regression test that proves the fix holds.

Run · seed 42 · cohort 65+ · 64

Re-run · seed 42 · cohort 65+ · 64

Identical — 0 diffs
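The general principle behind seeded reproducibility can be shown with a toy example: if all randomness comes from one generator initialized with a fixed seed, two runs with the same seed produce identical state. The sketch below uses Python's standard `random.Random`. The `build_cohort` and `count_diffs` helpers, the record fields, and the cohort size are hypothetical and say nothing about how the product itself is implemented.

```python
import random


def build_cohort(seed, size):
    """Generate a deterministic toy cohort from a seed (illustrative only)."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return [
        {"id": i, "age": rng.randint(65, 90), "mood": round(rng.random(), 2)}
        for i in range(size)
    ]


def count_diffs(a, b):
    """Count positions where two cohorts differ, plus any length mismatch."""
    return sum(1 for x, y in zip(a, b) if x != y) + abs(len(a) - len(b))


first = build_cohort(seed=42, size=64)  # size chosen arbitrarily for the sketch
rerun = build_cohort(seed=42, size=64)
other = build_cohort(seed=43, size=64)

diffs = count_diffs(first, rerun)
print(f"same seed: {diffs} diffs")
print(f"different seed: {count_diffs(first, other)} diffs")
```

Pinning the seed in a regression test works the same way: the failing scenario is rebuilt exactly, so a change in outcome can be attributed to the agent rather than to a different population.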

Grounded in US Census data (ACS), OpenStreetMap & the American Time Use Survey. Synthetic — no real personal data, no real individuals. Score with our built-in rubric, pull transcripts over MCP, or stream every run to your own stack as OpenTelemetry traces.

Customization

Your personas. Your audience. Your metrics.

Customize every part of the test — tune each persona's traits, needs and wants; assemble an audience for the exact job you're testing; and report on the outcomes that matter to you.

Custom personas

Tune every trait. OCEAN personality, demographics, and each persona's needs and wants — build exactly the people you're testing for.

Hana Singh · 32 · Female · Healthcare

Big Five (OCEAN) personality

Needs: plain-language answers

Wants: a fast decision

Custom audiences

Assemble for a specific job. Loan applicants, parking-permit renewals, first-time buyers — test the exact scenario that matters.

Audience: Applying for a loan

Priya Nair · 58 · first-time borrower

Marcus Meyer · 34 · debt consolidation

Grace Wong · 71 · ESL · fixed income

420 personas · built for one job-to-be-done

Custom reporting

Measure what matters to you. Your rubric, your cohorts, your dashboards — reported the way your team already works.

Eligibility explained

Plain-language

Compliance

your rubric · exported to your stack

In practice

Whatever your agent does, there's a custom audience for it.

Its own personas, its own job-to-be-done, its own definition of a good outcome — here's what that looks like across a few real domains.

Banking support that doesn't flinch

Polished on the happy path, brittle with the angry caller. Test your support agent on the stressed parent disputing a fee, the retiree who distrusts the chat, the customer fishing for a waiver — then replay the exact failure until it holds.

Try it in the Lab

Brandon Hale · disputing a fee · mood 24%

SIMULATED

This overdraft fee is wrong. I need it gone today.

I hear you — let's pull up the last three charges together.

Fine. But I'm not paying for something I didn't do.

De-escalated, fee reviewed

Works with your stack

Bring the agent you already built.

Point any harness at the city over MCP or a plain REST API, and stream every run back as OpenTelemetry — your framework, your language, your evals. No rebuild, no SDK lock-in.

MCP: The open Model Context Protocol. Any MCP client connects.

REST API: Language-agnostic HTTP endpoints for any language and any framework.

OpenTelemetry: Every run exports as OTLP traces to Langfuse, Phoenix, Datadog, or your own collector.

Bring your own: Your harness, your evals, your CI. No SDK lock-in.

Claude

OpenAI

Gemini

LangChain

LlamaIndex

CrewAI

AutoGen

Cursor

Vercel

Mistral

Cohere

Sierra

Any agent that speaks MCP or HTTP can run against the city — these are some of the stacks teams build on.

Find where your agent breaks — before your users do.

Connect your agent and set it loose on a city of thousands of Census-grounded people. Every run logged, reproducible, and scored — with our rubric, your own evals, or streamed to your stack over OpenTelemetry. Start in minutes.