Lisa · voice AI, built for production

The voice agent that holds when the pilot ends.

The demo is forgiving. Real call volume isn't. Most voice AI pilots die quietly in week three, when the queue bugs, the cold-start latency, and the silent hallucinations all show up at once.

A voice agent you can ship. An eval layer that keeps it honest.

Sub-second

End-to-end turn latency

Modular

Swap STT, LLM, TTS per engagement

In-VPC

Runs inside your environment

Every turn

Monitored and evaluated

Where production voice AI falls over

Most voice AI works in the demo and breaks in the call.

The demo conditions are forgiving. Real call volume isn't. Six places the pilot breaks once production starts.

Turn detection is brittle

The caller interrupts, the agent talks over them, the call becomes a standoff. Most stacks use voice activity detection meant for a single speaker.

The agent hallucinates

An LLM with a TTS bolted on invents balances, order numbers, and policy details. The customer believes them. The complaint arrives three days later.

Escalation is a coin flip

The agent does not know when to hand off. The customer gives up before they reach a human and the containment metric still calls it a win.

You cannot see the turn

Session recordings, yes. Per-turn reasoning, latency per stage, escalation rationale, tool-call accuracy — no. You cannot debug what you cannot see.

400ms feels like a dropped call

Every extra hop in the pipeline reads as a frozen agent. The customer fills the silence, the turn-taking collapses, the recording becomes unusable for evals.

The pilot holds. The real day does not

Queue bugs, cold-start latency, missing rate-limit handling. The integration tests missed all of it because they tested the trace, not the conversation.

The models are not the hard part. Shipping a production loop around them is.

Inside the stack · the proof layer

Hope-based deployment is not an option for a live call.

Every Lisa deployment ships with an eval layer that tests the whole conversation, not just the trace. Built for voice, wired into the CI loop, runs in your environment.

Reliability, tested semantically.

Tests that pass on "Please enter your phone number" still pass on "Could you provide your contact number?" The simulator matches intent, not string. Change the prompt, the tests hold.
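A minimal sketch of what asserting on intent rather than on the literal string could look like. The intent names, the keyword lexicon, and `classify_intent` are all hypothetical stand-ins for illustration; a real simulator would presumably use an embedding model or an LLM for the matching step, not keyword sets.

```python
import re
from typing import Optional

# Hypothetical intent lexicon: each intent is recognised by any one of its keyword sets.
# This is a toy stand-in for a real semantic matcher.
INTENT_KEYWORDS = {
    "request_phone_number": [{"phone", "number"}, {"contact", "number"}],
    "request_order_id": [{"order", "number"}, {"order", "id"}],
}


def classify_intent(utterance: str) -> Optional[str]:
    """Return the first intent whose keyword set is fully present in the utterance."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    for intent, keyword_sets in INTENT_KEYWORDS.items():
        if any(keywords <= words for keywords in keyword_sets):
            return intent
    return None


def assert_agent_intent(agent_utterance: str, expected_intent: str) -> None:
    """Test helper: fail on a change of intent, not on a change of wording."""
    actual = classify_intent(agent_utterance)
    assert actual == expected_intent, f"expected {expected_intent}, got {actual}"


# Both phrasings satisfy the same test, so rewording the prompt does not break it.
assert_agent_intent("Please enter your phone number", "request_phone_number")
assert_agent_intent("Could you provide your contact number?", "request_phone_number")
```

The test is pinned to the intent label, so only a change in what the agent is asking for makes it fail.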

A firewall on every tool call.

An agent can call a CRM endpoint, get a 200 OK, and write the wrong data. Seven layers of validation sit on every tool output, including exact match, regex, JSON schema, numeric tolerance, semantic equivalence, and LLM-as-judge. The wrong data never reaches the customer or the system of record.
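As a rough illustration of layered checks on a tool output, the sketch below covers only the deterministic layers named above: exact match, regex, and numeric tolerance, plus a plain type check standing in for JSON schema validation. Semantic equivalence and LLM-as-judge are left out. The field names, the rule format, and `validate_tool_output` are assumptions made for the example, not Lisa's actual interface.

```python
import math
import re
from typing import Any, Callable, Dict, List

Check = Callable[[Any], bool]


def exact(expected: Any) -> Check:
    return lambda value: value == expected


def matches(pattern: str) -> Check:
    compiled = re.compile(pattern)
    return lambda value: isinstance(value, str) and compiled.fullmatch(value) is not None


def within(expected: float, tolerance: float) -> Check:
    # Absolute tolerance only; bools are rejected even though they subclass int.
    return lambda value: (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and math.isclose(value, expected, rel_tol=0.0, abs_tol=tolerance)
    )


def has_type(expected_type: type) -> Check:
    # Simplified stand-in for a JSON schema check.
    return lambda value: isinstance(value, expected_type)


def validate_tool_output(output: Dict[str, Any], rules: Dict[str, List[Check]]) -> List[str]:
    """Run every check for every field; return a list of failure messages (empty means pass)."""
    failures = []
    for field_name, checks in rules.items():
        if field_name not in output:
            failures.append(f"{field_name}: missing")
            continue
        for index, check in enumerate(checks):
            if not check(output[field_name]):
                failures.append(f"{field_name}: check {index} failed")
    return failures


# Hypothetical rules for a CRM write that returned 200 OK.
rules = {
    "status": [exact("updated")],
    "order_id": [has_type(str), matches(r"ORD-\d{6}")],
    "balance": [within(120.50, 0.01)],
}
good = {"status": "updated", "order_id": "ORD-123456", "balance": 120.504}
bad = {"status": "updated", "order_id": "ORD-12", "balance": 99.0}
print(validate_tool_output(good, rules))
print(validate_tool_output(bad, rules))
```

A gate of this shape would block the write or the spoken reply whenever the failure list is non-empty, regardless of the HTTP status the tool returned.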

Cost per conversation, not per token.

Token counts don't tell you whether the business goal was hit. Per-conversation cost aggregated across turns, waste detection on redundant tool calls, model-swap ROI measured in unit economics. You get the number the CFO reads, not the one the engineers debug with.
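One way to picture the aggregation: roll per-turn cost up to the conversation and count tool calls that repeat with identical arguments. The `Turn` record, its fields, and the definition of "redundant" used here (same tool, same arguments, same conversation) are assumptions for illustration only.

```python
import json
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Turn:
    conversation_id: str
    cost_usd: float  # model and speech cost attributed to this turn
    tool_calls: List[Tuple[str, Dict[str, Any]]] = field(default_factory=list)


def cost_per_conversation(turns: List[Turn]) -> Dict[str, float]:
    """Sum turn-level cost into one figure per conversation."""
    totals: Dict[str, float] = {}
    for turn in turns:
        totals[turn.conversation_id] = totals.get(turn.conversation_id, 0.0) + turn.cost_usd
    return totals


def redundant_tool_calls(turns: List[Turn]) -> Dict[str, int]:
    """Count repeat calls (same tool, same arguments) within each conversation."""
    seen: Counter = Counter()
    for turn in turns:
        for tool_name, args in turn.tool_calls:
            key = (turn.conversation_id, tool_name, json.dumps(args, sort_keys=True))
            seen[key] += 1
    wasted: Counter = Counter()
    for (conversation_id, _tool, _args), count in seen.items():
        if count > 1:
            wasted[conversation_id] += count - 1
    return dict(wasted)


turns = [
    Turn("c1", 0.25, [("lookup_order", {"id": 1})]),
    Turn("c1", 0.5, [("lookup_order", {"id": 1})]),  # same lookup repeated
    Turn("c2", 0.125),
]
print(cost_per_conversation(turns))
print(redundant_tool_calls(turns))
```

Comparing these per-conversation totals before and after a model swap gives a unit-economics view instead of a raw token count.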

The parts that make Lisa specific

Not a voice model with a phone number bolted on. A production stack built for live turn-taking, modular at every layer.

Not a chatbot with a phone number

Real-time voice system from the start. Turn-aware recognition, live reasoning, and speech synthesis in one session loop.

Knows when to stop

Escalates when automation should end. The goal is customer resolution, not containment metrics.

Modular, not monolithic

Swap STT, TTS, or LLM providers without touching the rest of the system. Adopt better models as they ship.


Observable from day one

Voice-session monitoring and agent quality evaluation built in.

80%

of calls resolved without a human

Pilot it on your own line

Run Lisa against your own call script.

Bring a real call script. We configure Lisa against it, run it through the eval layer, and hand you the trace for every turn before anything ships.

The call, end to end

Voice AI Orchestration — The Details

Pipecat orchestrates every turn. Every module — telephony edge, STT, LLM, TTS — is swappable. Adopt a better model the week it ships without rewriting the loop.
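The swappable-module idea can be sketched in plain Python with structural interfaces. This is not Pipecat's API; the `SpeechToText`, `LanguageModel`, `TextToSpeech`, and `TurnLoop` names are hypothetical, and the stub providers exist only so the example runs on its own.

```python
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, transcript: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class TurnLoop:
    """One conversational turn: audio in, audio out. Depends only on the interfaces."""

    def __init__(self, stt: SpeechToText, llm: LanguageModel, tts: TextToSpeech) -> None:
        self.stt = stt
        self.llm = llm
        self.tts = tts

    def handle_turn(self, audio: bytes) -> bytes:
        transcript = self.stt.transcribe(audio)
        answer = self.llm.reply(transcript)
        return self.tts.synthesize(answer)


# Stub providers standing in for real vendors.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UppercaseLLM:
    def reply(self, transcript: str) -> str:
        return transcript.upper()


class ReverseLLM:
    def reply(self, transcript: str) -> str:
        return transcript[::-1]


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


loop = TurnLoop(EchoSTT(), UppercaseLLM(), BytesTTS())
first = loop.handle_turn(b"hello")

# Swapping the LLM touches one constructor argument; the loop itself is unchanged.
swapped = TurnLoop(EchoSTT(), ReverseLLM(), BytesTTS())
second = swapped.handle_turn(b"hello")
```

Because the loop is written against the interfaces, replacing one provider does not require changes to the other stages.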

The call hits the telephony edge and the audio stream is handed to Pipecat. The edge is swappable — any SIP or media-stream provider that speaks RTP.