AI and VoIP Guide

SIP Trunking for AI Voice Agents

9 min read  ·  Updated April 2026

Every AI voice agent that makes or receives phone calls does it through a SIP trunk. Whether you are building on Vapi, Bland.ai, Retell, or a custom stack, the PSTN connectivity layer is SIP. Here is how to architect it correctly and fix the most common failures.

SIPSymposium is an independent platform not affiliated with or endorsed by any product or company mentioned in this guide.

1. How AI voice agents connect to PSTN via SIP

An AI voice agent that answers or makes phone calls sits in the media path of a SIP call. The architecture has three layers:

  1. PSTN layer: A SIP trunk from a carrier (Twilio, Bandwidth, Telnyx, VoIP.ms) provides phone numbers and PSTN routing
  2. SIP media layer: A media server (built into the AI platform or your own FreeSWITCH/Asterisk) terminates the SIP call and feeds audio to the AI pipeline
  3. AI processing layer: STT (speech-to-text) converts incoming audio to text, an LLM generates responses, TTS (text-to-speech) converts responses to audio sent back via RTP

The SIP trunk connects layer 1 to layer 2. The AI platform handles the connection between layers 2 and 3. Your choice of SIP trunk provider, codec, and media architecture directly affects both call quality and AI response latency.

; Typical AI voice agent call flow
Caller dials your number
  -> Carrier (SIP trunk provider)
  -> INVITE to your AI platform SBC/media server
  -> RTP audio to AI media server
  -> Audio frames to STT engine
  -> Transcript to LLM
  -> LLM response to TTS
  -> Audio from TTS back to media server
  -> RTP back to carrier and caller
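
The AI processing part of the call flow above can be sketched as three chained stages. This is a minimal illustration only: `transcribe`, `generate_reply`, and `synthesize` are hypothetical placeholders standing in for whatever STT, LLM, and TTS services your platform uses, and a production pipeline would stream audio between stages rather than process a whole turn at once.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    transcript: str
    reply_text: str
    reply_audio: bytes


def transcribe(audio: bytes) -> str:
    # Placeholder STT (hypothetical): a real engine would decode the audio frames.
    return f"<{len(audio)} bytes of caller audio>"


def generate_reply(transcript: str) -> str:
    # Placeholder LLM call (hypothetical).
    return f"You said: {transcript}"


def synthesize(text: str) -> bytes:
    # Placeholder TTS (hypothetical): returns the text as bytes instead of audio.
    return text.encode("utf-8")


def handle_turn(caller_audio: bytes) -> Turn:
    """Run one caller utterance through STT -> LLM -> TTS."""
    transcript = transcribe(caller_audio)
    reply_text = generate_reply(transcript)
    reply_audio = synthesize(reply_text)
    return Turn(transcript, reply_text, reply_audio)


# 160 bytes is one 20 ms G.711 frame at 8 kHz.
turn = handle_turn(b"\xff" * 160)
print(turn.reply_text)
```

The media server sits on both ends of this function: it supplies `caller_audio` from inbound RTP and packetizes `reply_audio` into outbound RTP.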

2. SIP connectivity on major AI voice platforms

Platform                  | SIP connectivity model                         | BYOC support
--------------------------|------------------------------------------------|-------------------------------
Vapi                      | Twilio or Vonage built-in, or BYOC SIP trunk   | Yes (SIP URI termination)
Bland.ai                  | Built-in telephony + BYOC                      | Yes (custom SIP endpoint)
Retell AI                 | Built-in Twilio telephony + BYOC               | Yes (custom SIP trunk)
ElevenLabs Conversational | Twilio integration or SDK                      | Via Twilio BYOC
Custom stack              | Any SIP trunk + FreeSWITCH/Asterisk            | Full control

BYOC (Bring Your Own Carrier) on AI platforms means you connect your own SIP trunk to the AI platform instead of using their bundled telephony. Benefits include lower per-minute costs, reuse of existing carrier relationships, a custom number inventory, and better geographic coverage.

3. SIP trunk requirements for AI voice agents

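AI voice platforms have specific SIP trunk requirements that differ from traditional PBX deployments. The main ones are native G.711 support so that no transcoding is needed, carrier media servers geographically close to your AI inference infrastructure, and elastic concurrent call capacity.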

Recommended carriers for AI voice

Twilio Elastic SIP Trunking, Bandwidth, and Telnyx are widely used with AI voice platforms. Bandwidth and Telnyx have lower per-minute rates than Twilio and offer competitive SIP trunking for high-volume AI deployments. VoIP.ms works well for testing and lower volume.

4. Latency architecture for sub-second AI response

Perceived conversational latency in AI voice is the time from when the caller stops speaking to when they hear the AI start responding. Target under 1.5 seconds for natural conversation. The SIP/RTP layer contributes to this budget:

Component                        | Latency contribution | Optimization
---------------------------------|----------------------|------------------------------------
RTP network (carrier to AI)      | 10-50 ms             | Colocate AI with carrier PoP
Audio buffering / ptime          | 20-40 ms             | Use 20 ms ptime, avoid buffering
Codec transcoding                | 0-30 ms              | Use G.711 natively, no transcoding
STT end-of-utterance detection   | 100-300 ms           | Aggressive VAD, streaming STT
LLM first token                  | 200-800 ms           | Smaller models, streaming output
TTS first audio chunk            | 50-200 ms            | Streaming TTS, sentence-level

The SIP/RTP layer (first three rows) should contribute under 100 ms total. The AI processing layer dominates the latency budget. To optimize the SIP layer, place your media server in the same data center region as your AI inference, use G.711 to eliminate transcoding, and minimize buffering.
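
To make the budget concrete, the ranges in the table above can be summed per layer. The sketch below only adds up the table's own figures; the dictionary keys are illustrative names, not platform settings.

```python
# (min_ms, max_ms) per component, copied from the latency table above.
SIP_RTP_LAYER = {
    "rtp_network": (10, 50),
    "buffering_ptime": (20, 40),
    "codec_transcoding": (0, 30),
}
AI_LAYER = {
    "stt_end_of_utterance": (100, 300),
    "llm_first_token": (200, 800),
    "tts_first_audio": (50, 200),
}


def total(budget):
    """Return the (best case, worst case) sum of a budget in milliseconds."""
    low = sum(lo for lo, _ in budget.values())
    high = sum(hi for _, hi in budget.values())
    return low, high


sip_low, sip_high = total(SIP_RTP_LAYER)
ai_low, ai_high = total(AI_LAYER)
print(f"SIP/RTP layer: {sip_low}-{sip_high} ms")
print(f"AI layer:      {ai_low}-{ai_high} ms")
print(f"End to end:    {sip_low + ai_low}-{sip_high + ai_high} ms")
```

The SIP/RTP layer sums to 30-120 ms, so the worst case misses the 100 ms goal. Removing transcoding alone brings the worst case down to 90 ms. End to end, the table's worst case of 1420 ms just fits inside the 1.5 second target.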

5. Common SIP issues on AI voice platforms

Issue 01
Calls connect but AI does not respond
RTP is not reaching the AI media server. Check that firewall rules allow UDP on the RTP port range. Verify that the SDP c= line contains the correct public IP. This is common with BYOC, where the AI platform ends up sending RTP to a private IP advertised in the SDP.
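
The c= line check can be automated against an SDP body pulled from a SIP trace. The following is a minimal sketch using only the Python standard library. It flags RFC 1918 addresses only, and the function names and sample SDP are illustrative assumptions.

```python
import ipaddress

# RFC 1918 private ranges; RTP sent to these from the public internet is lost.
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def sdp_connection_addresses(sdp):
    """Return the address from every c= line in an SDP body."""
    addresses = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("c="):
            # Format: c=<nettype> <addrtype> <connection-address>
            parts = line[2:].split()
            if len(parts) == 3:
                # Drop any "/ttl" suffix used with multicast addresses.
                addresses.append(parts[2].split("/")[0])
    return addresses


def private_connection_addresses(sdp):
    """Return c= addresses that fall inside RFC 1918 private ranges."""
    flagged = []
    for addr in sdp_connection_addresses(sdp):
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            continue  # a hostname rather than an IP literal; not checked here
        if any(ip in net for net in RFC1918):
            flagged.append(addr)
    return flagged


sample_sdp = """v=0
o=- 1 1 IN IP4 10.0.4.17
s=-
c=IN IP4 10.0.4.17
t=0 0
m=audio 10000 RTP/AVP 0
"""
print(private_connection_addresses(sample_sdp))
```

A non-empty result means one side is advertising an address the other cannot reach, which usually points to a missing external IP setting or NAT traversal option on the endpoint that generated the SDP.
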
Issue 02
AI hears its own voice (echo loop)
The AI TTS audio is being captured by the STT engine, creating a feedback loop. Implement acoustic echo cancellation or mute STT input while TTS is playing. Most AI voice SDKs have barge-in handling for this — check that it is enabled.
Issue 03
Calls drop after 30-60 seconds
An RTP inactivity timeout fires during AI processing gaps (while the AI is thinking, no audio is sent). Enable RTP keepalives or comfort noise generation to maintain RTP flow during silence. Set the RTP timeout on your PBX or media server to at least 60 seconds.
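
Most media servers expose keepalive or comfort noise as a configuration option, so you would rarely write this yourself. To show what keeping RTP flowing means on the wire, here is a minimal sketch that builds 20 ms G.711 mu-law silence packets with an RFC 3550 header. The function names are illustrative, and sending the packets is left out.

```python
import struct

ULAW_SILENCE = 0xFF        # G.711 mu-law encoding of a zero-amplitude sample
SAMPLES_PER_PACKET = 160   # 20 ms of audio at 8000 Hz


def silence_packet(seq, timestamp, ssrc):
    """Build one RTP packet of mu-law silence (payload type 0, PCMU)."""
    header = struct.pack(
        "!BBHII",
        0x80,                    # version 2, no padding, no extension, no CSRCs
        0,                       # marker bit 0, payload type 0 (PCMU)
        seq & 0xFFFF,            # 16-bit sequence number
        timestamp & 0xFFFFFFFF,  # 32-bit timestamp in sample units
        ssrc & 0xFFFFFFFF,       # 32-bit stream identifier
    )
    return header + bytes([ULAW_SILENCE]) * SAMPLES_PER_PACKET


def silence_stream(count, ssrc, seq=0, timestamp=0):
    """Yield `count` consecutive silence packets, one per 20 ms interval."""
    for i in range(count):
        yield silence_packet(seq + i, timestamp + i * SAMPLES_PER_PACKET, ssrc)


packets = list(silence_stream(3, ssrc=0x1234))
print(len(packets), "packets of", len(packets[0]), "bytes")
```

In a live call the silence would continue the sequence numbers, timestamps, and SSRC of the existing outbound stream, so the far end sees one uninterrupted flow and not a new source.
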
Issue 04
Poor STT accuracy
Audio quality issue in the RTP path. Check for packet loss and jitter between carrier and AI media server — even 1% loss significantly impacts STT accuracy. Verify codec is G.711 and no transcoding is occurring. Check for network congestion on the media path.
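
Packet loss on the media path can be estimated from the RTP sequence numbers the media server actually received. The sketch below assumes you have already extracted those 16-bit sequence numbers for a single stream (for example from a capture). It handles wraparound and reordering, and the function name is illustrative.

```python
def loss_percent(sequence_numbers):
    """Estimate RTP packet loss (percent) from received 16-bit sequence numbers.

    Assumes a single stream and fewer than 32768 packets lost or reordered
    in a row, so a small forward step past 65535 is treated as a wrap.
    """
    if not sequence_numbers:
        return 0.0
    highest = sequence_numbers[0]  # highest extended sequence number so far
    received = set()               # a set, so duplicates count only once
    for seq in sequence_numbers:
        delta = (seq - highest) & 0xFFFF
        if delta < 0x8000:
            extended = highest + delta              # in order, possibly wrapped
        else:
            extended = highest - (0x10000 - delta)  # late or reordered packet
        received.add(extended)
        highest = max(highest, extended)
    expected = highest - min(received) + 1
    lost = expected - len(received)
    return 100.0 * lost / expected


# Sequence wraps past 65535 and packet 2 never arrives.
print(loss_percent([65533, 65534, 65535, 0, 1, 3, 4]))
```

Run this per stream and per direction. Loss measured at the AI media server reflects the carrier-to-AI leg that feeds the STT engine.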

6. Monitoring AI voice call quality

AI voice deployments need monitoring at both the SIP/RTP layer and the AI layer:

SIP/RTP layer metrics

AI layer metrics

# Capture AI voice call PCAP for analysis
tcpdump -i eth0 -w /tmp/ai-call.pcap udp portrange 10000-20000

# Extract RTCP receiver-report stats (Wireshark RTCP field names)
tshark -r /tmp/ai-call.pcap -Y rtcp -T fields -e rtcp.ssrc.identifier -e rtcp.ssrc.fraction -e rtcp.ssrc.jitter
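
The interarrival jitter carried in RTCP reports is defined in RFC 3550 as a running estimate, J = J + (|D| - J) / 16, where D is the change in transit time between consecutive packets. If RTCP is missing from a capture, the same estimate can be computed from RTP arrival times. This is a minimal sketch that assumes an 8 kHz clock (G.711) and ignores timestamp wraparound. The input format is an illustrative choice.

```python
def interarrival_jitter_ms(packets, clock_rate=8000):
    """RFC 3550 interarrival jitter estimate, returned in milliseconds.

    `packets` is an iterable of (arrival_time_seconds, rtp_timestamp) pairs
    for one stream, in arrival order.
    """
    jitter = 0.0          # running estimate, in RTP timestamp units
    prev_transit = None
    for arrival, rtp_ts in packets:
        # Transit time in timestamp units; only its changes matter.
        transit = arrival * clock_rate - rtp_ts
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter / clock_rate * 1000.0


# Ten packets at 20 ms spacing, with the sixth arriving 10 ms late.
stream = [(i * 0.020, i * 160) for i in range(10)]
stream[5] = (stream[5][0] + 0.010, stream[5][1])
print(round(interarrival_jitter_ms(stream), 3), "ms")
```

Because of the 1/16 smoothing, a single late packet moves the estimate only slightly. Sustained jitter is what pushes it up and forces larger jitter buffers, which in turn adds latency.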

Frequently asked questions

How do AI voice agents connect to phone networks?

AI voice agents connect to phone networks (PSTN) via SIP trunks from carriers like Twilio, Bandwidth, or Telnyx. The carrier routes calls to the AI platform via SIP INVITE. The AI platform terminates the SIP call, receives RTP audio, and feeds it through an STT-LLM-TTS pipeline. The synthesized audio is sent back via RTP to the carrier and ultimately to the caller.

What SIP trunk should I use for AI voice agents?

For AI voice agents, use the G.711 codec to avoid transcoding overhead, choose a carrier with media servers geographically close to your AI inference infrastructure, and select a provider with elastic concurrent call capacity. Twilio Elastic SIP Trunking, Bandwidth, and Telnyx are popular choices. For BYOC on platforms like Vapi or Retell, verify that the platform supports your carrier's format for the INVITE Request-URI.

Why do AI voice calls drop after 30 seconds?

AI voice calls drop after 30 seconds when RTP keepalives are not configured. During AI processing gaps (silence while the AI is generating a response), no audio is sent and the carrier or intermediate device times out the RTP stream. Enable RTP keepalives or comfort noise generation to send continuous low-level audio during silence. Set RTP inactivity timeout to at least 60 seconds on your PBX or media server.

Troubleshooting SIP issues in your AI voice deployment?

Capture RTP from your AI media server and upload to SIPSymposium. The analyzer measures packet loss, jitter, codec negotiation, and RTP timing issues that affect AI voice agent performance.
