Case Study

How We Built getarbol: AI Agents That Handle 5,000+ Calls a Day

Every call answered in under 1 second. 50+ languages. 73% of callers believe they're talking to a human. 99.9% uptime. Positive ROI in week one. This is the story of building getarbol.com from zero to production.

March 15, 2025

The problem nobody wanted to solve

Here's a number that should terrify every business owner: 62% of business calls go unanswered.

Not 6%. Not 16%. Sixty-two percent. More than half of every call that comes into a business — a potential customer, a patient needing to reschedule, a homeowner with a burst pipe at midnight — rings out into the void. Voicemail. Dial tone. Nothing.

And the downstream effect is brutal. When a potential client waits longer than 5 minutes for a response, the odds of converting that lead drop roughly fourfold. The lead that was warm at 9:01 AM is ice cold by 9:06. And 80% of callers who reach voicemail never call back. They don't leave a message. They don't try again. They call the next name on Google.

Call centers were supposed to solve this. They didn't. They created a different set of problems — long hold times, language barriers, high turnover rates, inconsistent quality. A single human agent costs $35,000+ per year in salary alone, before benefits, before training, before the desk and the headset and the software licenses. And that agent still misses calls. They go to lunch. They call in sick. They quit after three months because call center work is soul-crushing. They're not there at 3 AM when your biggest client calls with an emergency.

The industries that depend most on phone calls are the ones getting hit hardest.

Healthcare providers miss patient calls and those patients end up in emergency rooms instead of scheduled appointments. Real estate agents lose leads while they're showing a property — the buyer who called goes to the agent who picked up. Legal firms watch revenue walk out the door every time a potential client hears a voicemail greeting instead of a human voice. Insurance agencies, home service businesses, restaurants, dental offices — every one of them bleeding opportunity around the clock.

We saw this playing out across client after client. Businesses spending thousands on Google Ads to drive phone calls, then missing half of those calls because nobody was available to answer. It was like filling a bathtub with the drain open.

We didn't set out to build a call center product. We were building software for a client — a mid-sized restaurant group in South Florida — and they asked us a simple question: "Can you make something that answers our phone?"

Not a voicemail system. Not a phone tree. Something that actually answers — takes the reservation, answers the question about the gluten-free menu, gives directions, handles the modification for the party of eight that's now a party of twelve. Something that sounds like a person and works like a machine.

That question became getarbol.com — and it fundamentally changed how we think about AI, about product development, and about what "good enough" means when the stakes are real-time human conversation. See the full AI call center case study for the business impact.

What we actually built

Let's be precise about what Arbol is, because the market is full of misleading terms.

Arbol is not a chatbot with a phone number. It's not an IVR system with a better voice. It's not a scheduling widget that can also take calls. It's a full AI call center — multi-agent voice workflows that handle both inbound and outbound calls end-to-end, with real-time integrations into CRMs, calendars, and business tools.

Think of it as virtual employees. You describe what you want them to do in plain English — "Answer the phone, greet callers warmly, ask what they need, book appointments into my Google Calendar, and text me if someone asks for me specifically" — and ten minutes later, you have an AI receptionist that does exactly that. No code. No flow charts. No consultants. No six-week implementation project.

  • 5,000+ calls answered daily
  • 73% of callers thought it was human
  • 50+ languages supported
  • 99.9% uptime since launch

Read the full getarbol case study to see how this system handles real-world restaurant operations 24/7.

Every call picked up in under one second. Not under ten seconds. Not under five. Under one. The phone rings, and before the second ring, the AI has already greeted the caller. No hold music. No "your call is important to us." No IVR trees where you press 1 for sales and 2 for support and 3 to be transferred to another menu. Just a voice that understands what you need and handles it — in whatever language you speak.

That "whatever language you speak" part isn't a footnote. It's core to the product. A caller speaks Spanish, the AI speaks Spanish. Someone switches to Portuguese mid-sentence, the AI follows. No menus. No language selection. The AI just listens and responds in kind.

The architecture behind the voice

Building a voice AI that sounds human is an interesting engineering challenge. Building one that sounds human while handling thousands of concurrent calls across 50+ languages without dropping a single one, while maintaining sub-second response times, while integrating with a dozen different CRMs and calendars in real-time — that's a different kind of problem entirely.

Here's how we solved it.

The speech pipeline

The call flow looks deceptively simple from the outside. A phone rings. Someone picks up. A conversation happens. The caller hangs up satisfied. But underneath that simple interaction, there are six distinct stages happening in sequence, and the total round-trip time for all six has to stay under 400 milliseconds or the conversation starts to feel wrong.

  1. Call ingestion — The call comes in via Twilio's telephony infrastructure. Twilio handles the phone number provisioning, carrier interconnects, and the initial audio stream setup. We chose Twilio because of their global coverage, their reliability track record, and critically, their HIPAA Business Associate Agreement option for healthcare clients.

  2. Real-time transcription — Audio streams continuously to our transcription layer. This isn't a batch process where we wait for the caller to finish speaking. It's streaming transcription — we're converting speech to text as it happens, word by word. The system needs to handle accents, background noise, poor cell connections, and people who mumble, all in real-time.

  3. Intent classification and routing — The transcribed text hits our intent classification system. This is where the multi-agent architecture kicks in. The system needs to figure out what the caller wants — are they booking an appointment? Asking a question? Requesting a transfer to a specific person? Complaining? The classification has to be fast and accurate because it determines which agent handles the response.

  4. Response generation — OpenAI's language models generate the response, informed by the business context, the conversation history, the caller's intent, and any relevant data from integrated systems (like calendar availability or CRM records). The prompt engineering here is critical — the model needs to sound natural, stay on-brand, follow the business's specific instructions, and handle edge cases gracefully.

  5. Speech synthesis — Text-to-speech converts the generated text into natural-sounding audio. This isn't the robotic TTS of ten years ago. Modern speech synthesis handles prosody, emphasis, pacing, and even the subtle breathing patterns that make speech sound human. The voice needs to match the brand — a law firm's AI receptionist shouldn't sound like a college student.

  6. Audio delivery — The synthesized audio streams back to the caller through Twilio. The entire round trip — from the moment the caller finishes speaking to the moment they hear the AI's response — has to complete in under 400ms.
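To make the 400ms target concrete, here is one way the budget might decompose across the six stages. The stage names and per-stage numbers are illustrative assumptions, not Arbol's measured internals; they only show how an end-to-end budget forces a per-stage discipline.

```typescript
// Hypothetical per-stage latency budget in milliseconds for the six stages
// above. Numbers are illustrative assumptions, not production measurements.
const latencyBudgetMs: Record<string, number> = {
  ingestion: 20,      // Twilio media-stream hop
  transcription: 80,  // streaming STT emits a usable final partial
  routing: 40,        // intent classification picks an agent
  generation: 180,    // LLM produces the first usable sentence
  synthesis: 60,      // streaming TTS emits the first audio chunk
  delivery: 20,       // audio returns to the caller through Twilio
};

// The per-stage budgets must sum to the end-to-end target or less.
const totalMs = Object.values(latencyBudgetMs).reduce((a, b) => a + b, 0);
const withinBudget = totalMs <= 400;
```

The useful property of writing the budget down this way is that any stage regression immediately shows up as a broken invariant rather than a vague "calls feel slow" report.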

Stack: OpenAI, Twilio, Vercel, AWS, Prisma, Cloudflare, Stripe, Clerk.

Why 400ms? Because humans are incredibly sensitive to conversational latency on phone calls. Research shows that people start perceiving awkward pauses at around 500ms. At 700ms, the conversation feels broken. At a full second, callers start saying "hello? are you there?" The AI had to be faster than the threshold of human perception — and ideally faster than most human agents, who typically take 1-3 seconds to formulate a response.

Infrastructure decisions that shaped everything

The infrastructure stack wasn't chosen from a trend list. Every component was selected for a specific reason, and most of those reasons come back to one word: latency.

Vercel + AWS — We chose Vercel's serverless architecture running on top of AWS infrastructure for the application layer. Serverless was critical because call volume is inherently spiky — a restaurant gets slammed at 6 PM, a doctor's office at 8 AM, a plumber during the first freeze of winter. Traditional server infrastructure means either over-provisioning (paying for capacity you don't use 90% of the time) or under-provisioning (dropping calls during peaks). Serverless scales to zero and to infinity. We pay for what we use, and our customers never experience capacity limits.

Cloudflare — Sits at the edge providing CDN, DDoS protection, and Web Application Firewall capabilities. When you're handling thousands of concurrent voice calls, you're a ripe target for denial-of-service attacks. Cloudflare's edge network also means that API requests from our integrations (calendar checks, CRM writes) hit the nearest point of presence, shaving milliseconds off every interaction.

Prisma — Handles the database layer with type-safe queries. We considered raw SQL for performance, but Prisma's type safety caught dozens of bugs during development that would have been runtime failures in production. When a bug in your database layer means a dropped call for a patient trying to reach their doctor, type safety isn't a nice-to-have.

Clerk — Manages user authentication and session handling. Our customers log into a dashboard to configure their AI agents, review transcripts, and manage settings. Clerk handles the entire auth flow — SSO, MFA, session management — so we didn't have to build it from scratch.

Stripe — Payment processing at PCI DSS Level 1. We never see, store, or handle credit card numbers. With a pay-per-minute model, billing accuracy is everything. Stripe's metered billing API handles usage tracking, invoicing, and volume discounts automatically.

Multi-agent workflows: Why one AI isn't enough

Early in development, we tried the obvious approach: one AI model, one system prompt, handle everything. It worked for simple calls. But real calls aren't simple.

Here's what a typical call actually looks like for a real estate agency:

Caller: "Hi, I'm calling about the listing on Oak Street." (Intent: property inquiry — route to information agent)

Caller: "What's the square footage? And when was the roof replaced?" (Intent: detailed property question — needs knowledge base lookup)

Caller: "I'd like to schedule a showing for this Saturday." (Intent: appointment booking — needs calendar integration)

Caller: "Actually, can I talk to the listing agent directly?" (Intent: human transfer — needs warm handoff with context)

Four different intents in a single call. A monolithic model could technically handle all of them, but it wouldn't handle any of them well. The prompt would be overloaded with instructions for every possible scenario. The context window would fill up with irrelevant system context. And most importantly, the model would have no way to take real-world actions like checking calendar availability or writing to a CRM.

So we built a multi-agent system where specialized virtual employees handle different roles:

  • AI Receptionist — The front door. Answers every inbound call, greets the caller, identifies what they need, and routes to the right specialist agent. This agent is optimized for fast intent classification and warm, natural greetings. It's the agent that most callers interact with first, and it sets the tone for the entire experience.

  • Booking agent — Connected to Google Calendar, Calendly, and other scheduling systems. When a caller wants to book an appointment, this agent checks real-time availability, proposes times, handles conflicts, and confirms the booking — all within the call. The caller hangs up with a confirmed appointment and gets a text confirmation seconds later.

  • Lead qualification agent — Critical for sales-driven businesses. This agent asks qualifying questions, captures structured data (budget, timeline, specific needs), scores the lead based on business-defined criteria, and routes hot leads immediately while queuing warm leads for follow-up. The CRM gets updated in real-time.

  • Information agent — Pulls from the business's knowledge base to answer questions about hours, location, pricing, services, menu items, insurance acceptance, service areas — whatever the business has configured. This agent is optimized for accuracy and completeness, drawing from structured data rather than generating answers from the model's training data.

  • Transfer agent — Recognizes when a human is needed and executes a warm handoff. This isn't a blind transfer — the agent summarizes the conversation for the human before connecting, so the caller doesn't have to repeat themselves. The human gets a text or screen pop with the caller's name, reason for calling, and everything discussed so far.

  • Follow-up agent — Post-call automation. Sends confirmation texts and emails via Resend, updates CRM records, triggers Zapier workflows, and handles any downstream actions that need to happen after the call ends.

The critical innovation is the context handoff between agents. When the booking agent takes over from the receptionist, it receives the full conversation history and the caller's identified intent. The transition is invisible to the caller — they don't hear a transfer, don't get put on hold, don't experience any interruption. One moment the AI is greeting them, the next moment it's checking their calendar. From the caller's perspective, it's one seamless conversation.
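The routing-plus-handoff pattern can be sketched in a few lines of TypeScript. The keyword classifier below is a toy stand-in for the real LLM-based classifier, and `classify`/`handoff` are illustrative names, not Arbol's API; the point is that the full transcript travels with the call so the next agent never starts cold.

```typescript
// Toy intent classifier and context handoff. A production system would use
// an LLM-based classifier; keyword matching stands in here for illustration.
type Intent = "booking" | "info" | "transfer" | "lead" | "unknown";

interface CallContext {
  transcript: string[]; // full conversation history travels with the call
  intent: Intent;
}

function classify(utterance: string): Intent {
  const u = utterance.toLowerCase();
  if (/\b(book|schedule|appointment|showing)\b/.test(u)) return "booking";
  if (/\b(speak|talk) to\b/.test(u)) return "transfer";
  if (/\b(hours|price|square footage|menu)\b/.test(u)) return "info";
  return "unknown";
}

// Handoff: the next agent receives the accumulated context, so the caller
// never has to repeat themselves.
function handoff(ctx: CallContext, utterance: string): CallContext {
  return {
    transcript: [...ctx.transcript, utterance],
    intent: classify(utterance),
  };
}
```

Each turn re-routes on the latest utterance while the transcript only grows, which is exactly why the caller perceives one continuous conversation rather than a series of transfers.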

And the entire behavior of every agent is configured through plain English instructions. A business owner writes something like:

"When someone calls about a property listing, answer their questions from the property database. If they want to schedule a showing, check the listing agent's Google Calendar and book a 30-minute slot. If the caller asks to speak with someone directly, transfer to the listing agent's cell phone and send them a text with the caller's name and property interest."

No programming. No decision trees. No consulting engagement. Ten minutes from configuration to live calls.

The 73% test: When AI passes for human

We needed to know if the product worked. Not in a lab. Not in a demo. In real conversations with real callers who had real needs.

So we ran a blind test. We partnered with several client businesses and randomly routed incoming calls to either the AI or a human agent. Callers had no idea they were part of a study. After the call, an independent team contacted them with a single question: "Were you speaking to a person or a computer?"

73% of callers who spoke to the AI believed they were talking to a human.

We expected maybe 40%. The 73% number forced us to rethink what was actually driving the perception of "humanness" in voice AI. It wasn't what we assumed.

It wasn't the voice quality. Modern TTS is good enough that voice alone doesn't give it away. The uncanny valley in speech synthesis has mostly been crossed. Callers weren't detecting robotic speech patterns.

It was the conversational behavior. The things that made the AI feel human were behavioral, not acoustic:

  • Appropriate pauses. The AI doesn't respond the instant the caller stops speaking. It waits a natural beat — the way a human does when processing what they've heard. Too fast feels robotic. Too slow feels broken. The sweet spot is 200-400ms of "thinking time" on complex questions.

  • Acknowledgment patterns. Before answering a question, the AI acknowledges what the caller said: "Great question about the roof" or "Let me check on that for you." Humans do this instinctively. Chatbots jump straight to the answer.

  • Interruption handling. Real callers interrupt. They start a sentence, change their mind, start over. They talk over the AI. A rigid system breaks when interrupted. Our agents detect interruptions, stop speaking, and adapt to the new direction of the conversation.

  • Contextual memory. The AI remembers what was said earlier in the call. "You mentioned you're looking for a three-bedroom — I found two listings that match." This is table stakes for a human conversation but surprisingly rare in voice AI products.

  • Graceful uncertainty. When the AI doesn't know something, it says so: "I don't have that information right away, but I can have someone follow up with you." It doesn't hallucinate answers. It doesn't freeze. It handles uncertainty the way a competent receptionist would.

  • Seamless escalation. When a conversation exceeds what the AI should handle — an emotional caller, a complex legal question, a VIP client — the transfer to a human is smooth. No hold music. No "please wait while I transfer you." Just: "Let me connect you with Sarah who can help with that directly."

The combination of these behaviors is what crosses the threshold. Each one is a small thing. Together they compound into an experience that feels genuinely human.

Scaling to 5,000+ daily calls

The first version of Arbol handled 50 calls a day and broke a sweat doing it. Getting to 5,000+ didn't just require more servers. It required rethinking the architecture from the ground up.

The bottleneck was never compute — it was concurrency management. Each active call holds open a WebSocket connection, a transcription stream, and a TTS stream simultaneously. At 200 concurrent calls, our original architecture buckled. Connection pools exhausted. Latency spiked. Calls started dropping.

The scaling journey: From 50 to 5,000

Phase 1: The first 50 calls (Months 1-2)

The original architecture was embarrassingly simple. A single server process handling everything — call ingestion, transcription, LLM calls, TTS, and audio delivery. It worked for demos and for our first pilot client. It absolutely could not scale.

The problems showed up fast. Each call consumed a persistent WebSocket connection, a transcription stream, a TTS stream, and one or more LLM API calls happening concurrently. A single call was fine. Ten concurrent calls were okay. Fifty concurrent calls pushed memory usage to 90% and latency started creeping up. We were one traffic spike away from downtime.

Phase 2: Breaking the monolith (Months 3-4)

We decomposed the system into independent, stateless functions. Call handling, transcription, intent classification, response generation, speech synthesis, and follow-up actions all became separate serverless functions that could scale independently.

This was the single most important architectural decision we made. A spike in incoming calls doesn't affect follow-up email sending. A slow LLM response on one call doesn't block transcription on another. Each function scales to exactly the demand it faces, and no further.

Phase 3: The concurrency crisis (Month 5)

At 200 concurrent calls, things broke again. Not our code this time — our connections to external services. We were opening new connections to transcription and TTS services for every call, and those services had connection limits we hadn't anticipated. The solution was connection pooling and multiplexing — sharing persistent connections across multiple concurrent calls.

Phase 4: Global scale (Months 6-8)

By this point, we had clients across multiple time zones and needed to handle traffic patterns that shifted throughout the day. The move to Vercel's serverless functions backed by AWS infrastructure solved this permanently. No servers to manage, no capacity to plan. The platform scales from 1 to 10,000 simultaneous calls with identical response times.

The optimizations that made the difference

Response caching — We analyzed our call data and found that 30% of all questions were variations of the same five queries: business hours, location/address, pricing, availability, and basic service descriptions. These questions didn't need the full LLM reasoning pipeline. We built a semantic cache that matches incoming questions against known patterns and serves pre-generated responses. The AI still "speaks" them naturally with varied phrasing, but the expensive generation step is skipped. This cut average latency by 60% for the most common queries.
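The semantic-cache idea can be sketched with token-overlap matching standing in for embedding similarity (a deliberate simplification; the canned answers and the 0.6 threshold below are invented for illustration):

```typescript
// Semantic-cache sketch: match an incoming question against known patterns
// and skip LLM generation on a hit. Jaccard token overlap stands in for the
// embedding similarity a production system would use.
const cannedAnswers: Record<string, string> = {
  "what are your business hours": "We're open 9 to 5, Monday through Friday.",
  "where are you located": "We're at 123 Main Street.",
};

function tokens(s: string): Set<string> {
  return new Set(
    s.toLowerCase().replace(/[^a-z0-9\s]/g, "").split(/\s+/).filter(Boolean),
  );
}

function jaccard(a: Set<string>, b: Set<string>): number {
  const inter = Array.from(a).filter((t) => b.has(t)).length;
  return inter / (a.size + b.size - inter);
}

// Returns a cached answer on a strong match, otherwise null
// (the call falls through to the full LLM pipeline).
function cacheLookup(question: string, threshold = 0.6): string | null {
  const q = tokens(question);
  for (const [pattern, answer] of Object.entries(cannedAnswers)) {
    if (jaccard(q, tokens(pattern)) >= threshold) return answer;
  }
  return null;
}
```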

Stateless call handlers — Any function in our fleet can pick up any call. Call state lives in the Prisma-managed database, not in local memory. This means if a serverless function terminates mid-call (it happens), another function picks up the conversation within 200ms with full context. The caller hears a brief pause — shorter than a natural thinking pause — and the conversation continues seamlessly.
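A minimal sketch of that pickup-with-full-context behavior, with an in-memory `Map` standing in for the Prisma-managed database (the field names are illustrative):

```typescript
// Stateless-handler sketch: call state lives in shared storage, so any
// handler instance can resume any call with full context.
interface CallState {
  callId: string;
  transcript: string[];
  lastAgent: string;
}

const store = new Map<string, CallState>(); // stands in for the real database

function saveState(state: CallState): void {
  // Copy on write so no handler holds a mutable reference to shared state.
  store.set(state.callId, { ...state, transcript: [...state.transcript] });
}

// A fresh handler (e.g. after a serverless function terminates mid-call)
// resumes the conversation from storage.
function resumeCall(callId: string): CallState | undefined {
  return store.get(callId);
}
```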

Predictive warm-up — We noticed that many of our clients' call patterns were predictable. A restaurant gets a spike at 5 PM. A doctor's office gets slammed at 8 AM on Monday. We pre-warm the relevant infrastructure — database connections, model contexts, cached responses — before the predicted spike. This eliminates cold start latency during peak hours.

Sentry for observability — Every call generates structured telemetry: transcription confidence scores, response latency at each pipeline stage, intent classification accuracy, handoff success rates, and caller satisfaction signals derived from conversation patterns (short calls and repeated questions are bad signs; long calls and booking confirmations are good signs). We built alerting that fires when any metric degrades, and we catch issues before callers notice them.

Our monitoring dashboard tracks a metric we call "conversation health score" — a composite of latency, transcription accuracy, intent match confidence, and caller engagement patterns. When this score drops below threshold for any client's AI agent, we get alerted and can diagnose before a single caller is affected.
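One plausible shape for such a composite score, with invented weights and an invented alert threshold (Arbol's production values are not published):

```typescript
// Illustrative "conversation health score": a weighted blend of the signals
// named above, normalized to the 0-1 range. Weights are assumptions.
interface CallSignals {
  latencyMs: number;               // average pipeline round trip
  transcriptionConfidence: number; // 0-1, from the STT layer
  intentConfidence: number;        // 0-1, from the classifier
  engagement: number;              // 0-1, derived from call patterns
}

function healthScore(s: CallSignals): number {
  const latencyScore = Math.max(0, 1 - s.latencyMs / 1000); // 0 ms -> 1.0
  return (
    0.3 * latencyScore +
    0.25 * s.transcriptionConfidence +
    0.25 * s.intentConfidence +
    0.2 * s.engagement
  );
}

const ALERT_THRESHOLD = 0.7; // alerting fires when a score dips below this
```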

The language challenge

50+ languages wasn't a marketing number we picked because it sounded impressive. It was a hard requirement driven by our clients' real caller demographics.

Arbol serves healthcare providers in Miami (where 70% of the population speaks Spanish at home), real estate brokerages in New York (where you'll hear Mandarin, Russian, Korean, and Bengali in a single afternoon), legal firms in Los Angeles (Spanish, Tagalog, Vietnamese, Armenian), and service businesses in Houston (one of the most linguistically diverse cities in the United States).

Their callers don't announce what language they speak. They just start talking. The system has to figure it out — fast.

Our language detection runs during the first few seconds of speech. It analyzes phonetic patterns, not words, so it works even before the transcription model has produced usable text. Once detected, the entire pipeline switches: transcription model, LLM prompt language, TTS voice. The switchover happens within a single conversational beat.

And it handles code-switching — when a caller alternates between languages within the same sentence, which is incredibly common in multilingual communities. A caller might say "I need to hacer una cita for next Tuesday" and the AI handles it naturally without missing a beat.
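The single-step pipeline switch might look like this; the model and voice identifiers are made up for illustration:

```typescript
// Once the detector reports a language, every downstream component
// (STT model, prompt language, TTS voice) swaps in one atomic step.
interface PipelineConfig {
  sttModel: string;
  promptLanguage: string;
  ttsVoice: string;
}

const pipelines: Record<string, PipelineConfig> = {
  en: { sttModel: "stt-en", promptLanguage: "English", ttsVoice: "voice-en-1" },
  es: { sttModel: "stt-es", promptLanguage: "Spanish", ttsVoice: "voice-es-1" },
  pt: { sttModel: "stt-pt", promptLanguage: "Portuguese", ttsVoice: "voice-pt-1" },
};

// Unrecognized detections fall back to a default rather than failing the call.
function switchPipeline(detected: string, fallback = "en"): PipelineConfig {
  return pipelines[detected] ?? pipelines[fallback];
}
```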

Call language mix: English 62%, Spanish 24%, Portuguese 5%, Mandarin 4%, other languages 5%.

The "Other" category is the interesting one. It includes French, Arabic, Haitian Creole, Korean, Vietnamese, Russian, Hindi, and dozens more. None individually account for more than 1-2% of total calls, but collectively they represent hundreds of calls per day that would have been completely unhandled by a traditional monolingual receptionist.

No "press 1 for English." No language selection menus. The AI just speaks whatever language you speak. For callers, this feels like magic. For multilingual callers who are used to being asked to "please hold for a Spanish-speaking representative," it feels like respect.

The integrations that made it real

There was a moment about three months into development where we had a technically impressive demo. The AI answered calls beautifully. It sounded natural. It understood questions. Visitors were wowed.

But nobody was buying it.

The problem was obvious in hindsight: an AI that answers calls but can't do anything with the information is just a fancy voicemail. A receptionist who takes messages but never delivers them. The AI would have a perfect conversation with a caller who wanted to book a showing, but then the real estate agent would have to log in to a dashboard, read a transcript, and manually create the calendar event. That's not saving time. That's adding a step.

The product became real — actually useful, actually worth paying for — when we built the integration layer.

Calendar integrations

  • Google Calendar — Real-time availability checking and instant booking. The AI can see open slots, propose times, handle conflicts ("That time is taken, but I have a 2:30 available"), and confirm the appointment — all within the call. The event appears on the calendar before the caller hangs up.

  • Calendly — For businesses that use dedicated scheduling tools with complex availability rules, buffer times, and appointment types. The AI queries Calendly's API directly, respecting all the rules the business has configured.
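The conflict-handling behavior ("that time is taken, but I have a 2:30 available") reduces to a nearest-free-slot search. A sketch, where times are minutes since midnight and the 30-minute grid and 9-to-5 office hours are assumptions:

```typescript
// Given booked slots, grant the requested time if free, otherwise propose
// the nearest free slot on the grid, searching outward from the request.
function nearestFreeSlot(
  requested: number,
  booked: Set<number>,
  open = 9 * 60,   // 9:00 AM
  close = 17 * 60, // 5:00 PM
  step = 30,       // slot grid in minutes
): number | null {
  if (!booked.has(requested)) return requested;
  for (let delta = step; delta <= close - open; delta += step) {
    for (const t of [requested + delta, requested - delta]) {
      if (t >= open && t < close && !booked.has(t)) return t;
    }
  }
  return null; // fully booked
}
```

Searching outward keeps the proposal close to what the caller asked for, which is what makes the counter-offer feel helpful rather than arbitrary.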

CRM integrations

  • Salesforce — Leads flow into Salesforce in real-time with full call data: contact information, qualification scores, conversation transcript, and tagged intent.

  • HubSpot — Same deep integration for HubSpot shops. Contact creation, deal pipeline updates, task assignments for follow-up.

  • Pipedrive — Activity logging, deal creation, and contact management for sales teams using Pipedrive's pipeline model.

  • Zoho CRM — Full integration for teams in the Zoho ecosystem.

Workflow automation

  • Zapier — The escape hatch for everything else. Any action that Arbol doesn't natively integrate with can be triggered through Zapier, connecting to 5,000+ apps. A call from a new lead can trigger a Slack notification, create a Trello card, add a row to a Google Sheet, and send a follow-up email — all automatically.

  • REST API — Full programmatic access for businesses with custom systems. Every piece of data Arbol captures is accessible via API: call recordings, transcripts, lead scores, conversation metadata. Customers own their data and can export it at any time.

  • CSV bulk upload — For outbound campaigns. Upload a list of contacts, configure the AI's outbound script, and let it work through the list. The same natural conversation quality as inbound calls, at campaign scale.

Webhooks for real-time data flow

Every significant event in a call — call started, intent identified, appointment booked, transfer initiated, call completed — fires a webhook to the customer's configured endpoint. This enables real-time dashboards, custom alerting, and integration with any system that can receive HTTP requests.
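A payload for those events might be shaped like this; the field names are illustrative, not Arbol's published schema:

```typescript
// Illustrative webhook payload for the call lifecycle events listed above.
type CallEvent =
  | "call.started"
  | "call.intent_identified"
  | "call.appointment_booked"
  | "call.transfer_initiated"
  | "call.completed";

interface WebhookPayload {
  event: CallEvent;
  callId: string;
  timestamp: string; // ISO 8601
  data: Record<string, unknown>; // event-specific details
}

function buildPayload(
  event: CallEvent,
  callId: string,
  data: Record<string, unknown>,
): WebhookPayload {
  return { event, callId, timestamp: new Date().toISOString(), data };
}
```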

The integration layer is what turned Arbol from a technology demo into a business tool. A real estate agent doesn't check a dashboard after every call — the lead is already in their CRM, scored and qualified, with a transcript attached and a showing already on their calendar. A healthcare provider's front desk sees the appointment appear in their scheduling system before the patient has finished saying goodbye.

Security and compliance: Non-negotiable from day one

When your product handles phone calls for healthcare providers, legal firms, and insurance companies, security isn't a feature you ship in v2. It's the foundation you build on from day one. A single data breach doesn't just lose you a customer — it ends your company.

We made the decision early: every architectural choice, every vendor selection, every integration would be evaluated through a security lens first and a feature lens second. This slowed us down in the first month. It has saved us every month since.

Data handling principles

Customer data is never used to train AI models. This is not a toggleable setting. It's architectural. Call data flows to OpenAI for real-time processing, but under contractual terms that explicitly prohibit training use. Customer data is processed and forgotten — never retained by our AI providers for model improvement.

Every sub-processor in our stack is contractually bound through Data Processing Agreements that require:

  • SOC 2 Type II certification
  • AES-256 encryption at rest
  • TLS 1.2+ encryption in transit
  • Incident notification within 24-72 hours
  • Data processing exclusively within the United States

This isn't a checkbox exercise. We audit our sub-processors, review their SOC 2 reports, and maintain a public sub-processor list that customers can review at any time.

HIPAA readiness

Healthcare is one of our largest verticals, and HIPAA compliance isn't optional. Twilio provides HIPAA Business Associate Agreements for voice and SMS data. Our infrastructure is configured for HIPAA compliance from the storage layer up. Call recordings, transcripts, and patient information are encrypted at rest and in transit, with access controls that limit data visibility to authorized users only.

Payment security

Stripe handles all payment processing at PCI DSS Level 1 — the highest level of payment security certification. We never see, store, or process credit card numbers. They flow directly from the customer's browser to Stripe's infrastructure. Even if our entire application were compromised, payment data would be untouchable.

Transparency

We notify customers 30 days before any new sub-processor begins processing their data. Customers can review the change, assess the sub-processor's security posture, and object if they have concerns. Objections go to privacy@getarbol.com and are handled by a human, not an autoresponder.

Our current sub-processor list includes:

  • Vercel — Application hosting and serverless functions
  • Amazon Web Services — Cloud infrastructure, storage, and computing (SOC 2 Type II, ISO 27001)
  • Twilio — Voice calls, SMS, phone number provisioning (HIPAA BAA available)
  • OpenAI — Language model processing (data NOT used for training)
  • Cloudflare — CDN, DDoS protection, WAF
  • Clerk — User authentication and session management
  • Prisma — Database ORM and connection management
  • Stripe — Payment processing (PCI DSS Level 1)
  • Resend — Transactional email delivery
  • Sentry — Error tracking and performance monitoring
  • Google — Analytics and usage data

Every vendor on this list was selected not just for capability, but for security posture. We rejected several vendors with better features because their security certifications or data handling practices didn't meet our bar.

Positive ROI in week one

The economics of Arbol are almost unfairly good for the customer. Let's walk through the math.

The cost of the status quo

A full-time receptionist costs $15-20/hour. That's $2,400-$3,200/month for a single shift — 8 hours, 5 days a week. They don't work nights. They don't work weekends. They don't work holidays. They take lunch breaks. They call in sick. And when two calls come in at once, one goes to voicemail.

Want 24/7 coverage? Triple the cost. You need three shifts, plus weekend coverage, plus holiday coverage, plus backup for sick days. That's $7,000-$10,000/month for a single phone line.

And even then, you're limited to the languages your staff speaks.

A full call center operation — the kind that handles hundreds of calls per day — costs $35,000+ per agent per year. A ten-agent call center is $350,000/year before technology costs, management overhead, office space, and the constant drag of hiring and training replacements for the 30-40% of agents who leave every year.

The Arbol alternative

Arbol costs a few cents per minute. No platform fees. No contracts. No minimum commitments. Volume discounts kick in automatically as usage scales. A business that handles 100 calls per day at an average of 3 minutes per call might spend a fraction of what a single receptionist costs — and get 24/7 coverage in 50+ languages with zero missed calls.
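To make the comparison concrete, here's a quick sketch of the math. The per-minute rate is an illustrative assumption ("a few cents per minute"), not Arbol's actual pricing:

```python
# Illustrative cost comparison: hypothetical per-minute AI pricing
# vs. a single-shift receptionist. The AI rate is an assumption, not a quote.

CALLS_PER_DAY = 100
AVG_CALL_MINUTES = 3
AI_RATE_PER_MINUTE = 0.10   # hypothetical "a few cents per minute"
DAYS_PER_MONTH = 30         # the AI covers every day, 24/7

ai_monthly = CALLS_PER_DAY * AVG_CALL_MINUTES * AI_RATE_PER_MINUTE * DAYS_PER_MONTH

# Receptionist: $15-20/hour for a single shift (8 hours x 5 days x 4 weeks)
HOURS_PER_MONTH = 160
receptionist_monthly_low = 15 * HOURS_PER_MONTH   # $2,400
receptionist_monthly_high = 20 * HOURS_PER_MONTH  # $3,200

print(f"AI agent:      ${ai_monthly:,.0f}/month")
print(f"Receptionist:  ${receptionist_monthly_low:,.0f}-${receptionist_monthly_high:,.0f}/month")
```

Even at a generous assumed rate, the AI comes in well under a third of a single-shift receptionist, while covering nights, weekends, and simultaneous calls.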

The real-world proof

A 50-person real estate brokerage came to us with a specific problem: they were missing 40% of their after-hours calls. Their agents were showing properties, meeting clients, handling paperwork. Calls that came in between 6 PM and 9 AM went to voicemail. Weekend calls — their busiest time for buyer inquiries — were hit or miss.

Each missed call was a potential listing, a potential buyer, a potential commission. In real estate, where a single transaction can generate $10,000-$30,000 in commission, even one recovered call per week pays for the entire service many times over.

They deployed Arbol on a Friday afternoon. By the following Friday, they had captured 100% of incoming calls. Zero missed. Leads were qualified and in Salesforce before the agents started their morning coffee. Showings were booked directly into Google Calendar. After-hours callers were getting the same quality experience as 9-to-5 callers.

Positive ROI in the first week. Not the first month. The first week.

  • $35K+ annual cost per human agent
  • 24/7 AI availability
  • 0 missed calls
  • Positive ROI in week 1

The hidden ROI

The direct cost savings are obvious. But the hidden ROI is where it gets interesting.

Recovered revenue from missed calls. If 62% of calls go unanswered and 80% of those callers never try again, a business handling 50 calls per day is losing 25 potential customers every day. Even if only 10% of those would have converted, that's 2-3 lost customers daily. For a service business averaging $500 per customer, that's $1,000-$1,500 per day in lost revenue. Per day.

Faster response time. Arbol answers in under one second. The caller never waits. In an industry where response time directly correlates with close rates, this alone can double conversion rates.

Consistent quality. Human agents have bad days. They get tired at the end of a shift. They have personal biases. They forget to ask qualifying questions. The AI delivers the same quality on the 5,000th call of the day as on the first.

Data capture. Every call is transcribed, tagged, and searchable. Businesses can analyze call patterns, identify common questions (which informs their marketing), track lead sources, and measure conversion from call to appointment to sale. This data was previously locked in human agents' heads — if it was captured at all.
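The recovered-revenue arithmetic above can be sketched directly. The conversion rate and customer value are the article's illustrative figures, not measured data:

```python
# Lost-revenue estimate from missed calls, using the figures above.
CALLS_PER_DAY = 50
UNANSWERED_RATE = 0.62      # share of calls that go unanswered
NEVER_RETRY_RATE = 0.80     # share of missed callers who never call back
CONVERSION_RATE = 0.10      # share of lost callers who would have converted
AVG_CUSTOMER_VALUE = 500    # illustrative dollars per converted customer

lost_callers = CALLS_PER_DAY * UNANSWERED_RATE * NEVER_RETRY_RATE  # ~25/day
lost_customers = lost_callers * CONVERSION_RATE                    # ~2.5/day
lost_revenue = lost_customers * AVG_CUSTOMER_VALUE                 # ~$1,240/day

print(f"Lost callers/day:   {lost_callers:.1f}")
print(f"Lost customers/day: {lost_customers:.1f}")
print(f"Lost revenue/day:   ${lost_revenue:,.0f}")
```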

Where Arbol lives today

What started with a restaurant group asking if we could answer their phone now serves five core industries across the United States.

Healthcare

Patient scheduling, appointment reminders, intake calls, prescription refill requests, insurance verification routing. Healthcare has unique requirements — HIPAA compliance, sensitivity in tone, the need to recognize emergencies and route them immediately. Our healthcare AI agents are configured to detect urgency signals and can transfer to clinical staff within seconds.

A multi-location dental practice uses Arbol to handle all scheduling across six offices. Patients call a single number and the AI books into the right office's calendar based on the patient's location, insurance, and the type of appointment needed. What used to require three full-time schedulers now runs autonomously.

Real estate

Lead capture, showing coordination, after-hours inquiry handling, open house follow-up. Real estate is inherently a phone business — buyers want to talk, not fill out forms. Arbol captures leads 24/7, qualifies them (budget, timeline, area, property type), and routes hot leads to agents immediately while queuing warm leads for morning follow-up.

The 50-person brokerage that was our early success story has since expanded Arbol to handle all incoming calls, not just after-hours. Their agents report spending 60% less time on the phone and more time in person with clients.

Legal services

Client intake, consultation scheduling, call screening, after-hours emergency routing. Legal firms have a specific challenge: potential clients often call in distress, and the first conversation sets the tone for the entire relationship. Our legal AI agents are configured for empathy and precision — gathering essential case details without being clinical, screening for conflicts, and scheduling consultations with the right attorney based on practice area.

Insurance

Quote requests, claims intake, policy questions, renewal reminders. Insurance calls tend to be information-heavy — callers have specific questions about coverage, deductibles, and claims processes. The information agent pulls from the agency's knowledge base to answer accurately, and the lead qualification agent captures enough detail for the agent to prepare a quote before the callback.

Home services

Dispatch coordination, service scheduling, emergency routing, estimate follow-up. Home service businesses have a unique pattern: many of their calls are emergencies. A burst pipe, a broken AC unit in July, a dead furnace in January. The AI needs to assess urgency, dispatch the right technician, and communicate estimated arrival times — all in a single call.

Each industry has its own conversational patterns, compliance requirements, integration needs, and caller expectations. The multi-agent architecture means we can spin up industry-specific virtual employees with specialized knowledge and behavior without rebuilding the core platform. A healthcare receptionist and a real estate receptionist share the same underlying infrastructure, but their conversational behavior, compliance guardrails, and integration targets are completely different.
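The "shared core, industry-specific overlay" pattern described above can be sketched in a few lines. Class and field names here are illustrative, not Arbol's actual code:

```python
# Sketch of shared infrastructure with per-industry overlays: the platform
# is identical; only behavior, guardrails, and integrations change.
# All names and values are hypothetical examples.
from dataclasses import dataclass, field

@dataclass
class AgentProfile:
    industry: str
    guardrails: list[str] = field(default_factory=list)
    integrations: list[str] = field(default_factory=list)
    tone: str = "professional"

def make_receptionist(industry: str) -> AgentProfile:
    """Same core profile type; the overlay changes behavior, not the platform."""
    overlays = {
        "healthcare": AgentProfile(
            "healthcare",
            guardrails=["HIPAA", "emergency-routing"],
            integrations=["scheduling system"],
            tone="calm and reassuring",
        ),
        "real_estate": AgentProfile(
            "real_estate",
            guardrails=["fair-housing language"],
            integrations=["Salesforce", "Google Calendar"],
            tone="warm and responsive",
        ),
    }
    return overlays[industry]
```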

The plain English configuration that changed everything

There's a moment in every product's life where a design decision proves to be the most important one you made. For Arbol, it was the decision to make AI agent configuration happen through plain English instructions instead of flow charts, decision trees, or code.

Here's what configuration looks like for a new Arbol customer:

Business description: "We're a personal injury law firm in Houston. We handle car accidents, workplace injuries, and slip-and-fall cases."

Greeting: "Answer the phone warmly, introduce yourself as a receptionist for Martinez Law, and ask how you can help."

Qualification rules: "Ask what type of incident it was, when it happened, if they've seen a doctor, and if they have an attorney already. If the incident was within the last 2 years and they don't have an attorney, they're a qualified lead."

Booking rules: "Offer a free consultation. Check Attorney Martinez's calendar for 30-minute slots. Book the first available slot that works for the caller."

Transfer rules: "If the caller says it's an emergency or they're currently injured, transfer immediately to our emergency line."

That's it. No programming. No consulting. No six-week implementation. The business owner writes these instructions in plain English, and the AI follows them. Changes take effect immediately — update the instructions and the next call reflects the change.
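For a business that wants to version-control or A/B test these instructions, the configuration is simple enough to treat as plain data. This is a hypothetical sketch with illustrative field names, not Arbol's actual API or schema:

```python
# Hypothetical representation of a plain-English agent configuration.
# Field names are illustrative; Arbol's actual schema may differ.

agent_config = {
    "business_description": (
        "We're a personal injury law firm in Houston. We handle car "
        "accidents, workplace injuries, and slip-and-fall cases."
    ),
    "greeting": (
        "Answer the phone warmly, introduce yourself as a receptionist "
        "for Martinez Law, and ask how you can help."
    ),
    "qualification_rules": (
        "Ask what type of incident it was, when it happened, if they've "
        "seen a doctor, and if they have an attorney already."
    ),
    "booking_rules": (
        "Offer a free consultation. Check Attorney Martinez's calendar "
        "for 30-minute slots."
    ),
    "transfer_rules": (
        "If the caller says it's an emergency or they're currently "
        "injured, transfer immediately to our emergency line."
    ),
}

REQUIRED_FIELDS = {
    "business_description", "greeting", "qualification_rules",
    "booking_rules", "transfer_rules",
}

def missing_fields(config: dict) -> list[str]:
    """Return the names of required fields that are absent or empty."""
    return sorted(f for f in REQUIRED_FIELDS if not config.get(f, "").strip())
```

A check like `missing_fields(agent_config)` makes it easy to validate an edited configuration before it goes live, since changes take effect on the very next call.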

This decision had a profound effect on adoption. Our setup time dropped from days to minutes. Customer success tickets dropped by 80%. And businesses started iterating on their AI agents in ways we never anticipated — A/B testing different greetings, refining qualification criteria based on call data, adjusting transfer thresholds as they built trust in the AI.

The best configurations come from businesses that have been using the product for a few months. They've reviewed hundreds of transcripts, identified edge cases, and refined their instructions to handle every scenario they've encountered. Their AI agents become more capable over time — not because the underlying technology improves, but because the humans configuring them get better at telling the AI what to do.

What we'd do differently

Every product has a list of things the team would do differently with the benefit of hindsight. Ours is shorter than most, but the lessons hit hard.

Start with compliance. We built security and compliance in from day one, and it was the single best decision we made. Every integration partner, every sub-processor, every data flow was designed with SOC 2 and HIPAA in mind from the start. We watched competitors try to retrofit compliance onto their voice platforms and it was always a nightmare — months of re-architecture, vendor swaps, and data migration. If you're building anything that touches sensitive data, compliance is not something you add later. It's the foundation you pour on day one.

Invest in monitoring earlier. We built deep observability after our first scaling incident at 200 concurrent calls. It should have been there from the first call. Now every call generates structured telemetry through Sentry: transcription confidence scores, response latency at each pipeline stage, intent classification accuracy, handoff success rates, and conversation health scores. We catch degradation before callers notice it. But those first few months of flying blind cost us sleep and probably some early customer trust.

Test with real noise. Our lab environment was too clean. Perfect microphones, quiet rooms, clear speech, native English speakers with neutral accents. Real calls have background noise — a restaurant kitchen, a highway, a child crying, a TV blaring. Real callers mumble, speak with heavy accents, switch languages mid-sentence, put you on speakerphone in a car going 70 mph with the windows down. We didn't catch these edge cases until production. Now our test suite includes recordings from actual noisy environments across dozens of acoustic conditions. We test with construction noise, crowded restaurants, wind, poor cell reception, and multi-speaker environments.

Ship integrations sooner. The AI receptionist was technically impressive from month one. It sounded great in demos. But customers didn't actually start paying until it could book into their calendar and push leads into their CRM. The product became real — genuinely useful, worth paying for — when it connected to the tools people already use. We spent too long perfecting the voice quality and not long enough building the integration layer that made the voice useful.

Don't underestimate the long tail. The common call scenarios are easy. "What are your hours?" "I'd like to book an appointment." "Can I speak to someone?" Those work great from day one. The hard part is the 10% of calls that don't fit any pattern. The caller who's angry. The caller who's confused and can't articulate what they need. The caller who's calling the wrong business. The caller who just wants to chat. Building graceful handling for every edge case is an ongoing process, and we should have allocated more engineering time to it from the start.


Arbol handles 5,000+ calls every day. Every one answered. Every language supported. Every hour of every day.

Setup takes 10 minutes. Configuration is plain English. Integrations connect to the tools you already use. Pricing is pay-per-minute with no hidden fees. ROI takes a week.

The 62% of calls that used to go unanswered? Zero. The 80% of callers who used to hang up and never call back? They're booked, qualified, and in your CRM before you finish your morning coffee.

This project follows our proven delivery process — from concept to production in weeks, not months. If you're building something that needs to work this reliably — or if you're tired of watching revenue walk away every time the phone rings — let's talk.
