Building a Voice Agent That Switches Arabic, English and Hindi Mid-Call

Anam Jalal

Founder & CEO, MAJ Leads

Updated 2 Jun 2026 · 11 min read

Quick answer

A code-switching voice agent detects the caller's language from the first spoken words by analysing phoneme patterns and acoustic cues, then keeps listening for a change of language and follows the caller into it without a restart. For Dubai's mixed-language callers, MAJ Leads deploys agents confirmed for English, Arabic (Khaleeji-neutral MSA), Hindi and Malayalam, with mid-call switching between them.

Why does code-switching matter for Dubai callers specifically?

Dubai is home to residents of more than 200 nationalities, according to the UAE Government's official fact sheet. The city's expatriate population accounts for the vast majority of residents, and the daily reality of that diversity is that callers rarely stay in a single language for an entire conversation. A business owner might open with "Hello, are you available?" and then, once the call is flowing, switch to Hindi to explain a complex requirement. A patient calling a clinic might start in Arabic and flip to English when giving their Emirates ID number.

This linguistic mixing is not a communication failure; it is a natural feature of multilingual communities. Linguists call it code-switching: the practice of alternating between two or more languages within a single conversation, sometimes mid-sentence. In Dubai, the most common switches are between English and Arabic, English and Hindi, and Hindi and Malayalam. An AI receptionist that cannot handle this breaks the call the moment the caller drifts out of the agent's expected language, and in practice that drift happens on almost every call.

The scale of the relevant language communities in Dubai is significant. Speakers of Hindi and related South Asian languages form the largest expatriate grouping, numbering in the millions across the UAE; Malayalam speakers number in the hundreds of thousands, concentrated heavily in Dubai. Arabic is the official language of the country, and English is the lingua franca in business contexts. A voice agent that can only handle one of these has, by definition, limited reach in this market. You can read more about the specific patient mix that makes multilingual handling essential in healthcare in our post on Arabic, Hindi and Malayalam AI receptionists for UAE clinics.

What is code-switching inside a voice agent — technically?

Code-switching in a voice AI pipeline is more than translating one phrase at a time. It requires the system to maintain conversation context across language boundaries — so that when a caller switches from English to Hindi mid-way through giving their name and appointment date, the agent does not lose what was said before the switch, does not ask the caller to start over, and does not revert to a default language.

The mechanism begins with spoken language identification (LID): an acoustic model analyses the first words of the call — examining phoneme patterns, rhythm and prosody — to determine which language is being spoken, without waiting for a full transcription. According to Picovoice's 2026 technical overview of language detection, spoken language identification can operate within a few seconds of audio, returning a language code and confidence score that feeds downstream speech recognition, voice synthesis and logic layers.
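As a minimal sketch of how a language code and confidence score might feed the downstream layers, the routing decision can be a simple threshold check. The `LidResult` shape, the `choose_language` helper and the 0.6 threshold are assumptions made for illustration; they are not taken from Picovoice or Vapi.

```python
from dataclasses import dataclass

# Languages this article names as MAJ Leads' confirmed set (ISO 639-1 codes).
SUPPORTED = {"en", "ar", "hi", "ml"}


@dataclass(frozen=True)
class LidResult:
    """Hypothetical shape of a language-identification result."""
    language: str      # ISO 639-1 code, e.g. "hi"
    confidence: float  # 0.0 to 1.0


def choose_language(result: LidResult, current: str = "en",
                    threshold: float = 0.6) -> str:
    """Switch only when the detection is confident and supported;
    otherwise stay in the language the call is already using."""
    if result.language in SUPPORTED and result.confidence >= threshold:
        return result.language
    return current
```

Staying in the current language on a weak or unsupported detection avoids the agent flipping languages on a single noisy word.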

On the Vapi platform that MAJ Leads uses as its confirmed voice infrastructure, multilingual support is configured at the transcriber level. Speech-to-text providers including Deepgram (Nova 2/Nova 3 with multi-language mode) and Gladia are documented in Vapi's multilingual documentation as supporting automatic language detection and mid-call language switching — maintaining conversation state across the switch. The voice synthesis layer is configured to match, so the agent responds in the same language the caller just used.

  • Language identification layer. Analyses the caller's speech acoustics within the first few seconds. Returns a language code and confidence score, routing the audio to the correct speech-to-text model.
  • Speech-to-text (STT). Transcribes the caller's words in the detected language. Multi-language STT models handle switches within a single utterance.
  • LLM (language model) layer. Processes the transcribed text and generates a response. Prompt configuration tells the model to reply in the caller's detected language.
  • Text-to-speech (TTS). Synthesises the agent's response in the correct language and voice — the agent does not just translate, it speaks with the right phonetics and cadence.
  • Context retention. Conversation state (what was asked, what was answered, what information was collected) persists across language switches without a reset.
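The context-retention layer above can be sketched as call state that sits outside any one language, so a switch changes only the active language and never the information already collected. `CallState` and its field names are hypothetical, and this is a sketch under that assumption rather than how any particular platform stores state.

```python
from dataclasses import dataclass, field


@dataclass
class CallState:
    """Hypothetical per-call state that is not tied to a single language."""
    language: str = "en"
    slots: dict[str, str] = field(default_factory=dict)
    history: list[tuple[str, str]] = field(default_factory=list)  # (language, text)

    def on_caller_turn(self, language: str, text: str, **new_slots: str) -> None:
        # A switch updates the active language; collected slots are kept.
        self.language = language
        self.history.append((language, text))
        self.slots.update(new_slots)

    def missing(self, required: list[str]) -> list[str]:
        # What the agent still has to ask for, in whichever language is active.
        return [name for name in required if name not in self.slots]


state = CallState()
state.on_caller_turn("en", "My name is Priya", name="Priya")
state.on_caller_turn("hi", "Kal subah chahiye", date="tomorrow morning")
```

After the second turn the active language is Hindi, but the name given in English is still in `slots`, so the agent only needs to ask for whatever `missing` returns.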

How does a code-switching call actually flow?

The following are illustrative examples of how mid-call language switching works in practice. They are constructed to show the mechanics — not transcripts of real recorded calls.

Example A: English to Hindi switch — clinic appointment

A caller dials a Dubai clinic. The agent opens in English (default): "Good afternoon, thank you for calling. How can I help you today?" The caller responds: "I need to book an appointment, aur doctor ki availability kya hai?" (mixing English and Hindi mid-sentence). The agent's language detection identifies the Hindi component, and the agent continues with: "Haan, hum aapki appointment book kar sakte hain. Aap kab available hain?" The booking is then completed in Hindi without requiring the caller to re-explain or restart.

Example B: Arabic opener, English detail — real estate enquiry

A caller enquires about a property listing in Arabic: "Marhaba, ana muhtaj ma'lomat 'an al-shaqqah." The agent responds in Arabic. The caller then switches to English to specify: "I want the 2-bedroom in Business Bay, the one on the portal." The agent continues in English, collects the listing reference and the caller's preferred viewing slot, and ends by confirming in the language the caller last used.

In both examples, the key operational outcome is the same: the caller never has to press a number for a language option, wait to be transferred, or repeat themselves. The agent adapts. This matters for call completion and caller experience — a caller who hits a language wall mid-conversation is likely to hang up.
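One way to pick the reply language in examples like these is to follow the language of the caller's most recent words. The sketch below assumes the speech-to-text layer returns an utterance as a list of `(language, text)` segments; that shape and the `reply_language` helper are hypothetical.

```python
def reply_language(segments: list[tuple[str, str]], fallback: str = "en") -> str:
    """Return the language of the last non-empty segment, or the fallback."""
    for language, text in reversed(segments):
        if text.strip():
            return language
    return fallback


# Example A: the caller starts in English and finishes the sentence in Hindi.
example_a = [
    ("en", "I need to book an appointment"),
    ("hi", "aur doctor ki availability kya hai?"),
]
```

For `example_a` the function returns `"hi"`, matching the agent's Hindi reply in Example A. A dominant-language rule (the language with the most words) would be an alternative policy.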

Which languages does the MAJ Leads voice agent support?

MAJ Leads' confirmed, deployed languages are: English, Arabic (Khaleeji-neutral MSA), Hindi, and Malayalam — with mid-call code-switching between them. Each language in this set reflects a real, large community among Dubai's residents.

MAJ Leads confirmed language support

| Language | Community in Dubai / UAE | Primary use case |
| --- | --- | --- |
| English | Lingua franca across business and professional communities | Default; professional, real estate, cross-community enquiries |
| Arabic (Khaleeji-neutral MSA) | Official language; Emirati and wider Arab expatriate community | Government, healthcare, formal and local business contexts |
| Hindi | Indian community (largest single expatriate group in UAE) | Healthcare, retail, SME, hospitality |
| Malayalam | Malayali community; one of the UAE's largest expatriate groups, concentrated in Dubai | Healthcare, professional services, construction sector |

Note

On Arabic dialect: MAJ's Arabic is Khaleeji-neutral Modern Standard Arabic — broadly understood across Gulf Arabic speakers. We do not claim perfect replication of every Emirati dialectal variation. Read more about the Arabic nuances in our post on AI voice agents and Emirati Arabic.

Tagalog (Filipino), Urdu and other languages spoken by large communities in Dubai are not currently confirmed as live, tested production languages in MAJ's deployments. The underlying Vapi platform supports many additional languages and these can be scoped per deployment, but we do not represent them as available out of the box. Our post on Tagalog, Urdu and Malayalam receptionist demand in Dubai covers the market gap in more detail.

Why not just use a "press 1 for Arabic, 2 for English" IVR menu?

Traditional IVR (Interactive Voice Response) menus ask callers to self-select a language at the start of the call. This approach has three practical problems in Dubai's context. First, many callers do not self-identify with a single language — they switch depending on context, habit and which language feels more natural for a particular topic. Asking them to commit to one upfront does not reflect how they actually speak.

Second, language-menu IVRs add friction at the worst possible moment — before the caller has explained why they are calling. For time-sensitive enquiries (a clinic patient needing an urgent appointment, a property buyer who just saw a listing go live), that initial barrier increases hang-up rates.

Third, IVR menus do not handle mid-conversation switches. If a caller selected "Arabic" at the start but then switches to Hindi to describe a medical symptom, a traditional IVR offers no path forward. A code-switching agent simply continues. That is the practical difference between a menu-based system and genuine multilingual handling.

How does language data flow into the CRM and booking system?

When a call ends, the agent's detected language and the full conversation record are logged to the CRM via Make.com, MAJ's confirmed automation layer, in under 30 seconds. Supported destinations include Dynamics 365, Zoho, HubSpot, Salesforce, Bitrix24 and Pipedrive, as well as Google Sheets. The language field on the lead record is populated automatically, which is useful for assigning the right human follow-up agent, personalising subsequent outreach, and analysing which languages are most common across a client's caller base.
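A sketch of what the end-of-call record could look like before it is handed to an automation webhook. The field names, the `build_lead_payload` helper and the placeholder phone number are illustrative assumptions, not MAJ's or Make.com's actual schema; delivery to a webhook URL is deliberately left out.

```python
import json
from datetime import datetime, timezone


def build_lead_payload(caller_number: str, language: str,
                       transcript: list[dict], ended_at: datetime) -> str:
    """Serialise a call record as JSON; field names are illustrative."""
    payload = {
        "caller_number": caller_number,
        "detected_language": language,  # populates the CRM language field
        "ended_at": ended_at.astimezone(timezone.utc).isoformat(),
        "transcript": transcript,
    }
    # ensure_ascii=False keeps Arabic, Hindi and Malayalam text readable.
    return json.dumps(payload, ensure_ascii=False)


body = build_lead_payload(
    "+971500000000",  # placeholder number
    "hi",
    [{"speaker": "caller", "language": "hi", "text": "Appointment chahiye"}],
    datetime(2026, 6, 2, 10, 30, tzinfo=timezone.utc),
)
```

Keeping a per-turn `language` in the transcript alongside the single `detected_language` field lets the CRM show both the language to follow up in and where the caller switched.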

For clinics using Cal.com or Healthsite for appointment scheduling, the booking is created in the same language session — the agent collects the patient's preferred time slot and writes it to the booking system without requiring a language switch for the confirmation step. The patient receives confirmation in the language the appointment was booked in.

Does handling multilingual inbound calls create any TDRA compliance considerations?

Inbound calls — calls initiated by the customer dialling your business number — are exempt from the UAE's outbound telemarketing obligations under Cabinet Resolution 56 of 2024. There is no DNCR screening requirement, no 09:00–18:00 calling window, and no prior TDRA approval needed to answer an inbound call — regardless of the language used. The language of the call does not change its compliance classification.

What does apply across all call types, inbound and outbound, is the requirement for call recording with caller notification, along with data handling obligations under the UAE's Personal Data Protection Law (Federal Decree-Law No. 45 of 2021). MAJ-deployed agents notify callers that the call is being recorded. Call recordings are retained for the period required under UAE telecom law. We do not state a fixed statutory duration because it may vary by context; confirm the applicable retention period with your own legal adviser.

Legal caveat

Compliance note: If your business uses multilingual AI callers for outbound follow-up (re-engaging leads who did not initiate contact), those calls are subject to the full TDRA telemarketing framework: DNCR screening before every dial, calls only within 09:00–18:00, prior TDRA approval for campaigns, and a registered caller ID. See our full guide at TDRA-compliant AI cold calling in the UAE. Verify penalty amounts under Cabinet Resolution 57 of 2024 against the official text and your own legal counsel.
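For outbound follow-up, the two mechanical checks named above (DNCR screening and the 09:00–18:00 window) can be sketched as a pre-dial gate. The `may_dial` helper, the in-memory `dncr` set and the choice to treat 18:00 itself as outside the window are assumptions for illustration; campaign approval and caller ID registration are not covered by this code.

```python
from datetime import datetime, time, timedelta, timezone

UAE_TZ = timezone(timedelta(hours=4))  # UAE is UTC+4 with no daylight saving
WINDOW_START, WINDOW_END = time(9, 0), time(18, 0)


def may_dial(number: str, now: datetime, dncr: set[str]) -> bool:
    """True only if the number is not in the DNCR snapshot and the
    local UAE time falls inside the 09:00-18:00 window."""
    local = now.astimezone(UAE_TZ).time()
    # Assumption: the window is half-open, so 18:00 exactly is refused.
    in_window = WINDOW_START <= local < WINDOW_END
    return in_window and number not in dncr
```

`now` must be a timezone-aware datetime so the conversion to UAE local time is unambiguous.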

How long does it take to configure a multilingual agent?

Standard onboarding for a MAJ Leads deployment is 14 business days; a rush configuration runs 5–7 business days. Language configuration is part of the initial setup — the agent is built from the start to handle the specific language mix relevant to the client's caller base. A clinic in Deira with a predominantly Hindi and Malayalam patient mix will have a different default language order than a commercial real estate brokerage in DIFC where English and Arabic are primary.
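A per-client default language order could be expressed as a small configuration table. The profile names and the orders below are invented for illustration; a real deployment would derive them from the client's own caller data.

```python
SUPPORTED = ("en", "ar", "hi", "ml")

# Illustrative orders only, loosely following the two examples in the text.
PROFILES = {
    "deira_clinic": ["hi", "ml", "en", "ar"],
    "difc_brokerage": ["en", "ar", "hi", "ml"],
}


def language_order(profile: str) -> list[str]:
    """Return the profile's order, then any supported languages it left out."""
    preferred = [code for code in PROFILES.get(profile, []) if code in SUPPORTED]
    return preferred + [code for code in SUPPORTED if code not in preferred]
```

An unknown profile falls back to the order in `SUPPORTED`, so the agent always has a complete list to work from.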

Pricing scales with call volume, languages, use cases and integration depth: AED 1,500 – 25,000+ per month. Language count alone does not determine the price point; integration scope, call handling logic and call volume are the primary drivers. See our comparison post on AI voice agent cost in the UAE for a breakdown of what drives pricing at different tiers.

Frequently asked questions

What is code-switching in an AI voice agent?
Code-switching is the ability of an AI voice agent to detect when a caller shifts from one language to another mid-conversation, and continue the conversation in the new language without requiring a restart or a menu selection. The agent maintains full conversation context across the language switch.

Which languages does MAJ Leads' voice agent support?
MAJ Leads' confirmed, deployed languages are English, Arabic (Khaleeji-neutral MSA), Hindi and Malayalam, with mid-call switching between them. Additional languages may be configurable on a per-deployment basis via the Vapi platform, but Tagalog, Urdu and other languages are not currently confirmed as live production languages.

Does the voice agent handle calls where the caller mixes two languages in one sentence?
Yes. The language detection layer identifies the dominant or most recent language in the caller's speech and the speech-to-text model handles mixed utterances. The agent responds in the language the caller has most recently used, without requiring the caller to stay in a single language throughout.

Is the Arabic dialect Emirati or MSA?
MAJ's Arabic is Khaleeji-neutral Modern Standard Arabic (MSA), which is broadly understood across Gulf Arabic speakers. We do not claim perfect replication of every Emirati dialectal variation. MSA is well understood by Emirati and wider Arab expatriate callers in a business context.

Does handling multilingual inbound calls trigger TDRA telemarketing obligations?
No. Inbound calls, where the customer dials your number, are exempt from the UAE's outbound telemarketing obligations under Cabinet Resolution 56 of 2024, regardless of the language used. Outbound follow-up calls to leads who did not initiate contact are subject to the full TDRA framework.

How is the caller's language stored after the call?
The detected language is logged to the CRM via Make.com in under 30 seconds, alongside the full call record. This enables language-based routing of human follow-up, personalised outreach and caller-base analysis.

Anam Jalal is the founder of MAJ Leads, a Dubai-based AI voice agent company deploying TDRA-compliant AI receptionists and callers for UAE clinics, brokerages and SMEs — working hands-on across UAE telephony and CRM integrations, from SIP provisioning to TDRA compliance configuration.
