How We Built a Voice AI System That Handles Real Healthcare Calls
Building a voice AI system that handles real healthcare calls is fundamentally different from building a chatbot. When a patient calls to reschedule an appointment, they expect a natural conversation, not a phone tree.
The Problem We Solved
Healthcare offices often lose a meaningful share of incoming calls during high hold-time windows. Every missed call can mean a missed appointment, lost revenue, and a frustrated patient. Staff also spend hours on repetitive scheduling tasks instead of patient care.
MDFit Nova-Sonic changes that equation. It answers every call, understands natural speech, and manages appointments in real time.
Architecture Overview
The system is built on Amazon Nova-Sonic for real-time voice processing, with Twilio handling the telephony layer. Here is the high-level flow:
- Patient calls the office number
- Twilio routes the call to our WebSocket endpoint
- Amazon Nova-Sonic processes speech in real time
- Our agent system determines intent and takes action
- The patient hears a natural response, typically within 2 seconds
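To make the second and third steps concrete, here is a minimal sketch of how incoming audio could be pulled out of the WebSocket messages. It assumes the call audio arrives through Twilio Media Streams, which delivers JSON messages whose `media` events carry base64-encoded 8 kHz mu-law audio; the article does not state which Twilio mechanism is used. The function name `extract_audio` and the sample `streamSid` are hypothetical, and forwarding the bytes to Nova-Sonic is left out.

```python
import base64
import json
from typing import Optional


def extract_audio(message: str) -> Optional[bytes]:
    """Return raw audio bytes from a Media Streams 'media' message.

    Returns None for non-audio events such as 'connected', 'start', or 'stop'.
    """
    event = json.loads(message)
    if event.get("event") != "media":
        return None
    # The payload is base64 text; decoding yields the mu-law audio bytes.
    return base64.b64decode(event["media"]["payload"])


# Hand-built sample message for illustration (not captured from a real call).
sample = json.dumps({
    "event": "media",
    "streamSid": "MZ-hypothetical",
    "media": {"payload": base64.b64encode(b"\xff\x7f\x00").decode("ascii")},
})
audio = extract_audio(sample)
```

In a real endpoint, each decoded chunk would be passed to the speech model as it arrives rather than buffered until the caller stops talking.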
The 5-Agent System
We use 5 specialized AI agents, each handling a specific domain:
- Scheduling Agent: Books new appointments based on provider availability
- Rescheduling Agent: Handles date/time changes with conflict detection
- Cancellation Agent: Processes cancellations with confirmation
- Information Agent: Answers questions about office hours, locations, and providers
- Escalation Agent: Routes complex cases to human staff
Each agent has its own prompt, tool set, and memory context. The orchestrator routes conversations to the right agent based on intent recognition.
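The routing idea can be sketched in a few lines. This is an illustration under stated assumptions, not the production orchestrator: the `Agent` and `Orchestrator` classes are hypothetical, and the keyword table is a deliberately simple stand-in for whatever intent recognition the real system uses. Anything that matches no known intent falls through to the escalation agent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Agent:
    name: str
    system_prompt: str
    tools: List[str] = field(default_factory=list)
    memory: List[str] = field(default_factory=list)  # per-agent context

    def handle(self, utterance: str) -> str:
        self.memory.append(utterance)
        return f"[{self.name}] handling: {utterance}"


# Hypothetical keyword rules standing in for a real intent model.
# Order matters: "reschedule" contains "schedule", so it is checked first.
INTENT_KEYWORDS = {
    "rescheduling": ("reschedule", "change my appointment"),
    "cancellation": ("cancel",),
    "scheduling": ("book", "schedule", "new appointment"),
    "information": ("hours", "location", "address"),
}


def classify_intent(utterance: str) -> str:
    text = utterance.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return "escalation"  # unknown requests go to human staff


class Orchestrator:
    def __init__(self, agents: Dict[str, Agent]):
        self.agents = agents

    def route(self, utterance: str) -> str:
        return self.agents[classify_intent(utterance)].handle(utterance)


AGENT_NAMES = ("scheduling", "rescheduling", "cancellation",
               "information", "escalation")
orchestrator = Orchestrator({
    name: Agent(name=name, system_prompt=f"You handle {name} requests.")
    for name in AGENT_NAMES
})
```

Keeping memory on the agent rather than on the orchestrator mirrors the description above: each agent sees only the context relevant to its own domain.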
Key Technical Decisions
Real-Time Streaming
We chose WebSocket streaming over REST for audio because latency matters. A multi-second delay in a phone conversation feels broken. Our target was a sub-2-second median response time, with escalation safeguards that trigger when performance drifts.
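One way to picture a safeguard of this kind is a rolling median over recent response times. The sketch below is an assumption about how such a check could look, not a description of the deployed mechanism: the `LatencyMonitor` class, the window size, and the minimum sample count are all hypothetical, and only the 2-second target comes from the text above.

```python
from collections import deque
from statistics import median


class LatencyMonitor:
    """Tracks recent response times and flags drift past a median target."""

    def __init__(self, target_seconds: float = 2.0, window: int = 50,
                 min_samples: int = 5):
        self.target_seconds = target_seconds
        self.min_samples = min_samples
        self.samples = deque(maxlen=window)  # oldest samples drop off

    def record(self, seconds: float) -> None:
        self.samples.append(seconds)

    def median_latency(self) -> float:
        return median(self.samples) if self.samples else 0.0

    def should_escalate(self) -> bool:
        # Avoid reacting to one or two slow turns at the start of a call.
        if len(self.samples) < self.min_samples:
            return False
        return self.median_latency() > self.target_seconds


monitor = LatencyMonitor(window=5, min_samples=3)
for seconds in (1.0, 1.2, 1.1):
    monitor.record(seconds)
healthy = monitor.should_escalate()   # median 1.1 s, under target

for seconds in (3.0, 3.5, 4.0):
    monitor.record(seconds)
drifting = monitor.should_escalate()  # window is now 1.2, 1.1, 3.0, 3.5, 4.0
```

A median is less sensitive to a single slow turn than a mean, which fits a target stated as a median rather than a worst case.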
HIPAA-Aware Architecture
Our architecture uses HIPAA-eligible AWS services. Protected health information (PHI) is encrypted at rest and in transit, and we maintain audit logs for operational traceability. Full HIPAA compliance still depends on each organization's policies, controls, and BAA process.
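As a rough illustration of what an audit log entry might contain, the sketch below builds a structured record that stores a keyed hash of the caller's number instead of the number itself. Everything here is hypothetical: the `audit_event` function, the field names, and the choice of HMAC-SHA256 are assumptions for the example, and the article does not describe the actual log format.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def audit_event(action: str, caller_number: str, tenant_id: str,
                secret: bytes) -> str:
    """Build a JSON audit record without storing the raw phone number."""
    # A keyed hash lets entries for the same caller be correlated
    # while keeping the number itself out of the log.
    caller_ref = hmac.new(secret, caller_number.encode("utf-8"),
                          hashlib.sha256).hexdigest()
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tenant_id": tenant_id,
        "action": action,
        "caller_ref": caller_ref,
    }
    return json.dumps(record, sort_keys=True)


# Fictional number and placeholder key, for illustration only.
line = audit_event("appointment_rescheduled", "+15555550100",
                   "practice-a", secret=b"example-key")
```

In practice the key would come from a secrets manager rather than source code.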
Multi-Tenant Design
The system supports multiple healthcare practices, each with its own providers, schedules, and configurations. This is enterprise software, deployed at Rothman Orthopaedic, where real patients call (844) 699-2336.
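A simple way to model this is a per-practice configuration looked up by the number the patient dialed. The sketch below assumes that lookup key; the `PracticeConfig` fields, the tenant names, and the phone numbers are all made up for illustration and do not reflect the real tenant schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class PracticeConfig:
    tenant_id: str
    display_name: str
    providers: Tuple[str, ...]
    office_hours: str


# Hypothetical tenants keyed by the dialed number in E.164 format.
TENANTS: Dict[str, PracticeConfig] = {
    "+15555550101": PracticeConfig(
        tenant_id="practice-a",
        display_name="Example Orthopaedics",
        providers=("Dr. Adams", "Dr. Baker"),
        office_hours="Mon-Fri 08:00-17:00",
    ),
    "+15555550102": PracticeConfig(
        tenant_id="practice-b",
        display_name="Example Family Medicine",
        providers=("Dr. Chen",),
        office_hours="Mon-Sat 09:00-18:00",
    ),
}


def resolve_tenant(dialed_number: str) -> Optional[PracticeConfig]:
    """Return the practice that owns the dialed number, or None if unknown."""
    return TENANTS.get(dialed_number)


practice = resolve_tenant("+15555550101")
```

Resolving the tenant once at the start of the call means every agent downstream works against a single practice's providers and schedules.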
Results
- Typical median response latency under 2 seconds in current deployment windows
- High intent-recognition performance on core scheduling flows
- Architecture designed for 100+ concurrent sessions on current infrastructure
- Production-deployed and handling real patient calls
Voice AI in healthcare is no longer just a prototype exercise. With proper escalation design and compliance-aware operations, it can run safely in production workflows.
For related architecture breakdowns, visit Blog, see MDFit, and explore broader delivery areas in Solutions.