Building a Voice Agent That Switches Arabic, English and Hindi Mid-Call

Quick answer
A code-switching voice agent detects the caller's language from the first spoken words — analysing phoneme patterns and acoustic cues — then continues the full conversation in that language without a restart. For Dubai's mixed-language callers, MAJ Leads deploys agents confirmed for English, Arabic (Khaleeji-neutral MSA), Hindi and Malayalam, with seamless mid-call switching between them.
Why does code-switching matter for Dubai callers specifically?
Dubai is home to residents of more than 200 nationalities, according to the UAE Government's official fact sheet. The city's expatriate population accounts for the vast majority of residents, and the daily reality of that diversity is that callers rarely stay in a single language for an entire conversation. A business owner might open with "Hello, are you available?" and then, once the call is flowing, switch to Hindi to explain a complex requirement. A patient calling a clinic might start in Arabic and flip to English when giving their Emirates ID number.
This linguistic mixing is not a communication failure; it is a natural feature of multilingual communities. Linguists call it code-switching: the practice of alternating between two or more languages within a single conversation, sometimes mid-sentence. In Dubai, the most common switches are between English and Arabic, English and Hindi, and Hindi and Malayalam. An AI receptionist that cannot follow these switches breaks the call the moment the caller drifts out of the agent's expected language, which in a mixed-language market can happen on a large share of calls.
These language communities are large. Speakers of Hindi and related South Asian languages form the largest expatriate grouping, numbering in the millions across the UAE; Malayalam speakers number in the hundreds of thousands, concentrated heavily in Dubai. Arabic is the country's official language, and English is the lingua franca of business. A voice agent that handles only one of these languages has limited reach in this market. Our post on Arabic, Hindi and Malayalam AI receptionists for UAE clinics covers the patient mix that makes multilingual handling essential in healthcare.
What is code-switching inside a voice agent — technically?
Code-switching in a voice AI pipeline is more than translating one phrase at a time. It requires the system to maintain conversation context across language boundaries — so that when a caller switches from English to Hindi mid-way through giving their name and appointment date, the agent does not lose what was said before the switch, does not ask the caller to start over, and does not revert to a default language.
The mechanism begins with spoken language identification (LID): an acoustic model analyses the first words of the call — examining phoneme patterns, rhythm and prosody — to determine which language is being spoken, without waiting for a full transcription. According to Picovoice's 2026 technical overview of language detection, spoken language identification can operate within a few seconds of audio, returning a language code and confidence score that feeds downstream speech recognition, voice synthesis and logic layers.
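The routing decision described above can be sketched in a few lines. This is a minimal illustration, not MAJ Leads' or any vendor's implementation: the `LidResult` type, the `choose_language` function and the 0.6 confidence threshold are all assumptions made for the example. The idea is that a language code is only acted on when it is in scope and the detector is reasonably sure; otherwise the call stays in its current language.

```python
from dataclasses import dataclass

# The four languages named in this article, as ISO 639-1 codes.
SUPPORTED = {"en", "ar", "hi", "ml"}
DEFAULT_LANGUAGE = "en"
MIN_CONFIDENCE = 0.6  # assumed threshold; a real deployment would tune this


@dataclass(frozen=True)
class LidResult:
    """Hypothetical shape of a language-identification result."""
    language: str      # ISO 639-1 code, e.g. "hi"
    confidence: float  # 0.0 to 1.0


def choose_language(result: LidResult, current: str = DEFAULT_LANGUAGE) -> str:
    """Return the language the STT and TTS layers should use next."""
    if result.language not in SUPPORTED:
        return current  # out-of-scope language: stay where we are
    if result.confidence < MIN_CONFIDENCE:
        return current  # detector is unsure: do not switch on a guess
    return result.language
```

Keeping the current language on a low-confidence result avoids the agent flipping languages because of a single borrowed word or a noisy line.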
MAJ Leads uses Vapi as its confirmed voice infrastructure. On that platform, multilingual support is configured at the transcriber level. Vapi's multilingual documentation lists speech-to-text providers, including Deepgram (Nova-2 and Nova-3 in multi-language mode) and Gladia, as supporting automatic language detection and mid-call language switching while maintaining conversation state across the switch. The voice synthesis layer is configured to match, so the agent responds in the language the caller just used.
- Language identification layer. Analyses the caller's speech acoustics within the first few seconds. Returns a language code and confidence score, routing the audio to the correct speech-to-text model.
- Speech-to-text (STT). Transcribes the caller's words in the detected language. Multi-language STT models handle switches within a single utterance.
- LLM (language model) layer. Processes the transcribed text and generates a response. Prompt configuration tells the model to reply in the caller's detected language.
- Text-to-speech (TTS). Synthesises the agent's response in the correct language and voice — the agent does not just translate, it speaks with the right phonetics and cadence.
- Context retention. Conversation state (what was asked, what was answered, what information was collected) persists across language switches without a reset.
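The context-retention layer in the list above is the part that is easiest to get wrong, so here is a small sketch of one way to model it. Everything in it is hypothetical (the `ConversationState` class, its field names and the prompt wording); it is not taken from Vapi or from MAJ's configuration. It shows that a language switch updates only the reply language, while the collected details and the turn history carry on untouched.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Turn:
    speaker: str   # "caller" or "agent"
    language: str  # ISO 639-1 code detected for this turn
    text: str


@dataclass
class ConversationState:
    """Hypothetical per-call state that survives language switches."""
    language: str = "en"
    slots: dict = field(default_factory=dict)  # details collected so far
    turns: list = field(default_factory=list)  # full turn history

    def caller_turn(self, text: str, detected_language: str,
                    new_slots: Optional[dict] = None) -> None:
        # Only the reply language changes; slots and history are kept.
        self.language = detected_language
        self.turns.append(Turn("caller", detected_language, text))
        self.slots.update(new_slots or {})

    def system_prompt(self) -> str:
        """Instruction for the LLM layer: reply language plus known details."""
        known = ", ".join(f"{k}={v}" for k, v in sorted(self.slots.items()))
        return (f"Reply in language '{self.language}'. "
                f"Already collected: {known or 'nothing yet'}. "
                "Do not ask for these again.")
```

For instance, a caller who gives a name in English and then a preferred date in Hindi ends up with both details in `slots` and a prompt that tells the model to answer in Hindi.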
How does a code-switching call actually flow?
The following are illustrative examples of how mid-call language switching works in practice. They are constructed to show the mechanics — not transcripts of real recorded calls.
Example A: English to Hindi switch — clinic appointment
A caller dials a Dubai clinic. The agent opens in English (default): "Good afternoon, thank you for calling. How can I help you today?" The caller responds: "I need to book an appointment — aur doctor ki availability kya hai?" (mixing English and Hindi mid-sentence). The agent's language detection identifies the Hindi component and continues with: "Haan, hum aapki appointment book kar sakte hain. Aap kab available hain?" — completing the booking in Hindi without requiring the caller to re-explain or restart.
Example B: Arabic opener, English detail — real estate enquiry
A caller enquires about a property listing in Arabic: "Marhaba, ana muhtaj ma'lomat 'an al-shaqqah." The agent responds in Arabic. The caller then switches to English to specify: "I want the 2-bedroom in Business Bay, the one on the portal." The agent continues in English, collects the listing reference and the caller's preferred viewing slot, and ends by confirming in the language the caller last used.
In both examples, the key operational outcome is the same: the caller never has to press a number for a language option, wait to be transferred, or repeat themselves. The agent adapts. This matters for call completion and caller experience — a caller who hits a language wall mid-conversation is likely to hang up.
Which languages does the MAJ Leads voice agent support?
MAJ Leads' confirmed, deployed languages are: English, Arabic (Khaleeji-neutral MSA), Hindi, and Malayalam — with mid-call code-switching between them. Each language in this set reflects a real, large community among Dubai's residents.
| Language | Community in Dubai / UAE | Primary use case |
|---|---|---|
| English | Lingua franca across business and professional communities | Default; professional, real estate, cross-community enquiries |
| Arabic (Khaleeji-neutral MSA) | Official language; Emirati and wider Arab expatriate community | Government, healthcare, formal and local business contexts |
| Hindi | Indian community (largest single expatriate group in UAE) | Healthcare, retail, SME, hospitality |
| Malayalam | Malayali community; one of the UAE's largest expatriate groups, concentrated in Dubai | Healthcare, professional services, construction sector |
Note
Tagalog (Filipino), Urdu and other languages spoken by large communities in Dubai are not currently confirmed as live, tested production languages in MAJ's deployments. The underlying Vapi platform supports many additional languages and these can be scoped per deployment, but we do not represent them as available out of the box. Our post on Tagalog, Urdu and Malayalam receptionist demand in Dubai covers the market gap in more detail.
Why not just use a "press 1 for Arabic, 2 for English" IVR menu?
Traditional IVR (Interactive Voice Response) menus ask callers to self-select a language at the start of the call. This approach has three practical problems in Dubai's context. First, many callers do not self-identify with a single language — they switch depending on context, habit and which language feels more natural for a particular topic. Asking them to commit to one upfront does not reflect how they actually speak.
Second, language-menu IVRs add friction at the worst possible moment — before the caller has explained why they are calling. For time-sensitive enquiries (a clinic patient needing an urgent appointment, a property buyer who just saw a listing go live), that initial barrier increases hang-up rates.
Third, IVR menus do not handle mid-conversation switches. If a caller selected "Arabic" at the start but then switches to Hindi to describe a medical symptom, a traditional IVR offers no path forward. A code-switching agent simply continues. That is the practical difference between a menu-based system and genuine multilingual handling.
How does language data flow into the CRM and booking system?
When a call ends, the agent's detected language and the full conversation record are logged to the CRM via Make.com — MAJ's confirmed automation layer — in under 30 seconds. Supported CRMs include Dynamics 365, Zoho, HubSpot, Salesforce, Bitrix24, Pipedrive and Google Sheets. The language field on the lead record is populated automatically, which is useful for assigning the right human follow-up agent, personalising subsequent outreach, and analysing which languages are most common across a client's caller base.
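As an illustration of what such a lead record might contain, the sketch below builds a JSON-serialisable payload from a finished call. The function name and every field name are assumptions for the example, not MAJ's or any CRM's actual schema. It also assumes that the "language" stored on the lead is the last language the caller used, with the full set of languages kept alongside it for analysis.

```python
import json
from datetime import datetime, timezone


def build_lead_record(call_id: str, caller_number: str,
                      languages_by_turn: list, summary: str,
                      ended_at: datetime) -> dict:
    """Build a hypothetical lead payload for the automation layer."""
    if not languages_by_turn:
        raise ValueError("at least one caller turn is required")
    return {
        "call_id": call_id,
        "caller_number": caller_number,
        # Assumption: the lead's language is the one the caller used last.
        "language": languages_by_turn[-1],
        # Distinct languages in order of first use, for reporting.
        "languages_used": list(dict.fromkeys(languages_by_turn)),
        "summary": summary,
        "ended_at": ended_at.isoformat(),
    }


record = build_lead_record(
    call_id="demo-001",                 # illustrative values only
    caller_number="+971500000000",
    languages_by_turn=["ar", "en", "ar", "en"],
    summary="Viewing request, 2-bedroom in Business Bay",
    ended_at=datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc),
)
payload = json.dumps(record)  # ready to send to a webhook or HTTP endpoint
```

Storing both the final language and the list of languages used keeps follow-up assignment simple while still letting a client see how often callers switch.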
For clinics using Cal.com or Healthsite for appointment scheduling, the booking is created in the same language session — the agent collects the patient's preferred time slot and writes it to the booking system without requiring a language switch for the confirmation step. The patient receives confirmation in the language the appointment was booked in.
Does handling multilingual inbound calls create any TDRA compliance considerations?
Inbound calls — calls initiated by the customer dialling your business number — are exempt from the UAE's outbound telemarketing obligations under Cabinet Resolution 56 of 2024. There is no DNCR screening requirement, no 09:00–18:00 calling window, and no prior TDRA approval needed to answer an inbound call — regardless of the language used. The language of the call does not change its compliance classification.
What does apply to all call types, inbound and outbound, is the requirement for call recording with caller notification, along with data handling obligations under the UAE's Personal Data Protection Law (Federal Decree-Law No. 45 of 2021). MAJ-deployed agents notify callers that the call is being recorded. Call recordings are retained for the period required under UAE telecom law. We do not state a fixed statutory duration because it may vary by context; confirm the applicable retention period with your own legal adviser.
How long does it take to configure a multilingual agent?
Standard onboarding for a MAJ Leads deployment is 14 business days; a rush configuration runs 5–7 business days. Language configuration is part of the initial setup — the agent is built from the start to handle the specific language mix relevant to the client's caller base. A clinic in Deira with a predominantly Hindi and Malayalam patient mix will have a different default language order than a commercial real estate brokerage in DIFC where English and Arabic are primary.
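One way to picture a per-client "default language order" is as a small configuration object. The sketch below is an assumption about how such an order could be used, not a description of MAJ's setup: the `LanguageProfile` class, the client names and the tie-breaking rule are all hypothetical. Here the first language in the order is the greeting language, and the order breaks ties when language identification scores two languages equally.

```python
from dataclasses import dataclass

SUPPORTED = ("en", "ar", "hi", "ml")  # ISO 639-1 codes for the four languages


@dataclass(frozen=True)
class LanguageProfile:
    """Hypothetical per-client language configuration."""
    client: str
    order: tuple  # language codes, most expected first

    def __post_init__(self):
        if not self.order:
            raise ValueError("order must list at least one language")
        unknown = [code for code in self.order if code not in SUPPORTED]
        if unknown:
            raise ValueError(f"unsupported languages: {unknown}")
        if len(set(self.order)) != len(self.order):
            raise ValueError("order must not repeat a language")

    @property
    def greeting_language(self) -> str:
        return self.order[0]

    def pick(self, scores: dict) -> str:
        """Pick the best-scoring language; ties go to the earlier one in order."""
        in_scope = [code for code in self.order if code in scores]
        if not in_scope:
            return self.greeting_language
        # max() returns the first maximal item, so iteration order breaks ties.
        return max(in_scope, key=lambda code: scores[code])


clinic = LanguageProfile("deira-clinic", ("hi", "ml", "en", "ar"))
brokerage = LanguageProfile("difc-brokerage", ("en", "ar", "hi", "ml"))
```

Given the same ambiguous audio, the two profiles resolve it differently, which is the practical effect of a client-specific language order.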
Pricing scales with call volume, languages, use cases and integration depth: AED 1,500 to 25,000+ per month. Language count alone does not determine the price point; integration scope, call handling logic and volume are the primary drivers. See our comparison post on AI voice agent cost in the UAE for a breakdown of what drives pricing at different tiers.
Sources
- UAE Government Official Fact Sheet — more than 200 nationalities (u.ae)
- Vapi Multilingual Support Documentation (official docs)
- Picovoice — Language Detection and Identification in 2026: Methods, Challenges, and Voice AI Integration
- UAE Cabinet Resolution 56 of 2024 — Telemarketing Regulation (official text)
- UAE Cabinet Resolution 57 of 2024 — Telemarketing Penalties (official text)
Anam Jalal
Founder & CEO, MAJ Leads
Anam Jalal is the founder of MAJ Leads, a Dubai-based AI voice agent company deploying TDRA-compliant AI receptionists and callers for UAE clinics, brokerages and SMEs — working hands-on across UAE telephony and CRM integrations, from SIP provisioning to TDRA compliance configuration.
