AI Voice Automation for Healthcare Scheduling: Lessons From Production
The Healthcare Scheduling Problem
Healthcare call centers are overwhelmed. The average patient waits 8 minutes on hold to schedule a routine appointment. Staff turnover in medical reception roles exceeds 30% annually. And every missed call is a missed appointment — which means lost revenue and delayed care.
Automation promises to help, but most healthcare organizations have been burned by terrible IVR systems. "Press 1 for scheduling, press 2 for billing" doesn't solve the problem — it just adds friction.
We built MDFit Nova-Sonic to handle this differently: a voice AI assistant that has natural conversations with patients, understands context, and actually completes the scheduling workflow.
What MDFit Nova-Sonic Does
MDFit handles the core phone-based workflows that consume most of a medical office's call volume:
- Appointment scheduling — finding available slots, confirming patient details, booking
- Rescheduling — moving existing appointments to new times
- Cancellations — processing cancellations with reason tracking
- Provider messaging — taking messages for specific providers and routing them appropriately
- General inquiries — office hours, directions, insurance questions
The key difference from traditional IVR: patients speak naturally. They say "I need to see Dr. Smith next Tuesday afternoon" and the system understands, checks availability, and confirms — all in a single conversational turn.
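Under the hood, that utterance has to become structured data before any scheduling tool can act on it. Here is a minimal sketch of the idea — the `SchedulingIntent` fields and the toy keyword matcher are illustrative assumptions standing in for what the foundation model actually does:

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class SchedulingIntent:
    """Structured result the voice layer hands to the scheduling tools."""
    action: str                        # "schedule", "cancel", ...
    provider: Optional[str] = None     # e.g. "Dr. Smith"
    day: Optional[str] = None          # e.g. "Tuesday"
    time_of_day: Optional[str] = None  # e.g. "afternoon"

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def parse_utterance(text: str) -> SchedulingIntent:
    # Toy keyword matcher; in the real system the model does the understanding
    lowered = text.lower()
    match = re.search(r"\bDr\.?\s+(\w+)", text)
    provider = f"Dr. {match.group(1)}" if match else None
    day = next((d for d in DAYS if d.lower() in lowered), None)
    time_of_day = next(
        (t for t in ("morning", "afternoon", "evening") if t in lowered), None
    )
    action = "cancel" if "cancel" in lowered else "schedule"
    return SchedulingIntent(action, provider, day, time_of_day)

intent = parse_utterance("I need to see Dr. Smith next Tuesday afternoon")
# One utterance yields provider, day, and time preference in a single turn
```

The point isn't the parsing technique — it's that a single natural sentence carries everything a traditional IVR would have extracted across four or five menu prompts.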
Architecture: 5 Specialized AI Agents
Rather than building one monolithic AI agent that tries to do everything, we designed MDFit with 5 specialized agents:
- Triage Agent — the first point of contact, determines caller intent and routes to the appropriate specialist agent
- Scheduling Agent — handles new appointment booking with calendar integration
- Modification Agent — manages rescheduling and cancellation workflows
- Messaging Agent — takes and routes provider messages
- Escalation Agent — detects when the caller needs a human and performs warm handoff
Each agent has its own system prompt, tool access, and conversation patterns. The triage agent acts as an orchestrator, handing off to specialists as the conversation evolves. If any agent detects confusion, frustration, or a situation outside its scope, the escalation agent takes over and connects the caller to a human.
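The triage-and-handoff pattern can be sketched in a few lines. The `Agent` structure, the placeholder prompts, and the `triage` function below are simplified stand-ins for the real orchestrator, not its actual code:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Agent:
    name: str
    system_prompt: str
    handle: Callable[[str], str]  # utterance in, response out

def make_agent(name: str, prompt: str) -> Agent:
    # Placeholder handler; each real agent owns its own prompt and tool set
    return Agent(name, prompt, lambda utterance: f"[{name}] {utterance}")

AGENTS: Dict[str, Agent] = {
    "schedule": make_agent("scheduling",   "Book new appointments."),
    "modify":   make_agent("modification", "Reschedule or cancel."),
    "message":  make_agent("messaging",    "Take provider messages."),
    "escalate": make_agent("escalation",   "Warm-handoff to a human."),
}

def triage(intent: str, utterance: str) -> str:
    # Any intent the triage step can't map falls through to escalation:
    # the safety net is the default path, not an afterthought
    specialist = AGENTS.get(intent, AGENTS["escalate"])
    return specialist.handle(utterance)
```

The routing table makes the design decision visible: an unknown intent doesn't produce an error or a guess, it produces a human.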
The Voice AI Stack
The real-time voice pipeline is the most technically demanding piece:
- Amazon Nova-Sonic — the speech-to-speech foundation model that powers natural conversation
- Twilio — telephony integration via WebSocket for real-time audio streaming
- AWS Lambda — serverless compute for the agent orchestration layer
- DynamoDB — conversation state and session management
- Custom streaming pipeline — bidirectional audio with sub-200ms response latency
Latency is everything in voice AI. If the system takes more than a second to respond, the experience feels broken. We invested heavily in streaming architecture to ensure the AI responds naturally, with appropriate pauses and conversational timing.
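On the telephony leg, Twilio Media Streams delivers call audio as JSON frames over the WebSocket, and every frame passes through a dispatch step like the one sketched below. The message shapes follow Twilio's documented `start`/`media`/`stop` events; `RecordingPipeline` is a hypothetical stand-in for our bridge into Nova-Sonic:

```python
import base64
import json
import time

class RecordingPipeline:
    """Hypothetical stand-in for the streaming bridge into Nova-Sonic."""
    def __init__(self):
        self.events = []
    def open_session(self, stream_sid):
        self.events.append(("start", stream_sid))
    def push_audio(self, audio, received_at):
        self.events.append(("media", len(audio)))
    def close_session(self):
        self.events.append(("stop",))

def handle_twilio_frame(raw: str, pipeline) -> None:
    """Dispatch one Twilio Media Streams WebSocket message.

    'media' events carry roughly 20ms of base64-encoded 8kHz mu-law audio,
    so any work done here counts directly against the latency budget.
    """
    msg = json.loads(raw)
    event = msg.get("event")
    if event == "start":
        pipeline.open_session(msg["start"]["streamSid"])
    elif event == "media":
        audio = base64.b64decode(msg["media"]["payload"])
        pipeline.push_audio(audio, received_at=time.monotonic())
    elif event == "stop":
        pipeline.close_session()
```

Stamping each chunk with a monotonic receive time is what makes end-to-end latency measurable per frame rather than per call — which matters for the gradual degradation discussed later.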
HIPAA-Aware Architecture
Healthcare AI has to take patient data seriously. MDFit's architecture is designed with HIPAA considerations throughout:
- No PHI in logs — patient identifiable information is never written to application logs
- Encrypted transport — all audio and data streams use TLS 1.3
- Session isolation — each call gets its own isolated execution context
- Audit trail — every action is logged for compliance without exposing patient data
- Access controls — role-based access for healthcare staff managing the system
We're careful to say "HIPAA-aware" rather than "HIPAA-compliant" because full compliance requires organizational policies, BAAs, and processes beyond just technology. The architecture supports compliance — the healthcare organization implements the full program.
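One concrete piece of the "no PHI in logs" rule can be sketched as a logging filter that scrubs PHI-shaped strings before anything reaches a handler. The patterns here are illustrative, not our production set, which is considerably more thorough:

```python
import logging
import re

# Illustrative patterns only; a real scrubber covers far more shapes
PHI_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),         # SSN-shaped
    re.compile(r"\b\d{3}[.-]\d{3}[.-]\d{4}\b"),   # phone-number-shaped
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),   # email-shaped
]

def redact(text: str) -> str:
    for pattern in PHI_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

class RedactingFilter(logging.Filter):
    """Attached to every handler so PHI-shaped strings never reach output.

    Simplified: flattens the pre-formatted message and drops its args.
    """
    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()
        return True
```

Putting the filter on the handler rather than trusting call sites is the architectural point: no individual `logger.info()` call has to remember the rule.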
Production Deployment at Rothman Orthopaedic
MDFit is deployed in production at Rothman Orthopaedic, handling real patient calls on a real phone number. This isn't a demo or a proof of concept — it's answering phones and scheduling appointments.
The deployment process taught us several things:
- Provider calendars are messy. Integrating with existing scheduling systems (Epic, Athena, custom EMRs) is often harder than building the AI itself.
- Patients are patient. Most callers are surprisingly comfortable talking to an AI assistant, especially when it's clearly identified as one.
- Edge cases are infinite. Patients ask about insurance coverage, request specific rooms, need translation services, and have questions the AI was never trained for. The escalation agent is critical.
Multi-Tenant Architecture for Scale
MDFit is built as a multi-tenant platform, meaning each healthcare organization gets its own isolated configuration:
- Custom greetings and provider directories
- Organization-specific scheduling rules and constraints
- Separate conversation logs and analytics dashboards
- Independent escalation routing
This allows us to onboard new healthcare clients without deploying new infrastructure. Configuration changes — adding a new provider, updating office hours, modifying the greeting — take effect immediately.
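The tenant model can be pictured as a configuration lookup keyed by the inbound phone number, resolved at the start of every call. The field names, demo organization, and phone numbers below are made up for illustration, and production reads from DynamoDB rather than an in-memory dict:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class TenantConfig:
    org_id: str
    greeting: str
    providers: List[str]
    office_hours: str
    escalation_number: str

# Stand-in for the config store; production keys a DynamoDB table
# by the inbound phone number
TENANTS: Dict[str, TenantConfig] = {
    "+12155550100": TenantConfig(
        org_id="demo-ortho",
        greeting="Thanks for calling. How can I help you today?",
        providers=["Dr. Smith", "Dr. Jones"],
        office_hours="Mon-Fri 8am-5pm",
        escalation_number="+12155550199",
    ),
}

def resolve_tenant(called_number: str) -> TenantConfig:
    # Resolved before any agent speaks, so a config change takes
    # effect on the very next inbound call -- no redeploy needed
    return TENANTS[called_number]
```

Because the config is read per call rather than baked into the deployment, adding a provider or changing a greeting is a data write, not a release.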
What We'd Do Differently
After shipping MDFit to production, here's what we'd change if starting over:
- Start with the escalation agent first. Knowing when to hand off to a human is more important than handling every case. Build the safety net before the trapeze.
- Record and replay real calls earlier. Synthetic test data doesn't capture the messiness of real conversations. Get real audio samples (with consent) as early as possible.
- Invest in latency monitoring from day one. Voice AI performance degrades gradually, and by the time users complain, the problem has been building for weeks.
Learn More
MDFit Nova-Sonic is part of the Foundry Ventures product portfolio. If you're interested in AI voice automation for healthcare, visit our products page to learn more about the platform and its capabilities.