Transcoding in VoIP

5 min read  ·  Updated April 2026

Transcoding is the operation of decoding RTP audio from one codec and re-encoding it in another. It enables interop between endpoints that don't share a codec but is computationally expensive, adds latency, and degrades audio quality. Most production designs try to minimize transcoding rather than treat it as routine.

1. What transcoding is

Transcoding is converting audio from one codec to another in real time. The transcoder:

  1. Receives RTP packets encoded with codec A
  2. Decodes them to raw PCM audio
  3. Re-encodes the PCM audio with codec B
  4. Packages the result as RTP packets
  5. Forwards them to the destination

This happens in the middle of a call, in real time, for every audio packet. A typical call sends 50 RTP packets per second per direction (20 ms of audio per packet), so a transcoded call requires 100 transcode operations per second (50 per direction).
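The five steps above can be sketched in a few lines. This is a minimal illustration, not a production transcoder: `RtpPacket`, `transcode`, and payload type 96 are hypothetical names and values chosen for the example. The mu-law decoder follows the standard G.711 expansion, and 16-bit linear PCM stands in for codec B so the example stays self-contained. The sketch reuses the incoming sequence number and timestamp, which only holds when both codecs share a clock rate and packetization interval.

```python
import struct
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class RtpPacket:
    """Simplified RTP packet (hypothetical; real headers carry more fields)."""
    payload_type: int
    sequence: int
    timestamp: int
    payload: bytes


def ulaw_decode(payload: bytes) -> List[int]:
    """Step 2: expand G.711 mu-law bytes to 16-bit linear PCM samples."""
    samples = []
    for byte in payload:
        u = ~byte & 0xFF                      # mu-law bytes are stored inverted
        exponent = (u >> 4) & 0x07
        mantissa = u & 0x0F
        magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
        samples.append(-magnitude if u & 0x80 else magnitude)
    return samples


def l16_encode(samples: List[int]) -> bytes:
    """Step 3: pack PCM as big-endian 16-bit linear audio (stand-in for codec B)."""
    return struct.pack(f"!{len(samples)}h", *samples)


def transcode(pkt: RtpPacket,
              decode: Callable[[bytes], List[int]],
              encode: Callable[[List[int]], bytes],
              out_payload_type: int) -> RtpPacket:
    pcm = decode(pkt.payload)                 # step 2: codec A -> PCM
    payload = encode(pcm)                     # step 3: PCM -> codec B
    # Step 4: repackage; forwarding (step 5) is left to the caller.
    return replace(pkt, payload_type=out_payload_type, payload=payload)


# Step 1: one received 20 ms mu-law packet of silence (160 samples at 8 kHz).
incoming = RtpPacket(payload_type=0, sequence=1, timestamp=160,
                     payload=bytes([0xFF]) * 160)
outgoing = transcode(incoming, ulaw_decode, l16_encode, out_payload_type=96)
```

A real transcoder would run this loop for every packet in each direction and would also handle jitter buffering, packet loss, and codec state, none of which is modeled here.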

Transcoding is performed by SBCs, media gateways, PBX media servers, or specialized transcoding farms. It cannot happen at SIP proxies (which do not touch RTP). Endpoints rarely transcode — they speak one codec at a time and rely on the network to bridge to other codecs if needed.

2. When transcoding is needed

Codec mismatch between endpoints

The most common case: one endpoint supports only G.722 and the other supports only G.711. Without transcoding, the call fails with 488 Not Acceptable Here. With transcoding, each side uses its preferred codec and the SBC converts in the middle.
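The decision described here can be sketched as a small function. This is an illustrative model rather than any SBC's actual logic: `call_outcome` and its parameters are hypothetical, only the static RTP payload types from RFC 3551 are recognized, and dynamic payload types (which need `a=rtpmap` lines) are ignored.

```python
from typing import List, Optional

# Static RTP payload types defined in RFC 3551.
STATIC_PAYLOAD_TYPES = {0: "PCMU", 8: "PCMA", 9: "G722", 18: "G729"}


def codecs_from_m_line(m_line: str) -> List[str]:
    """Extract codec names from an SDP audio m= line (static types only)."""
    fields = m_line.split()  # m=audio, port, proto, then payload types
    return [STATIC_PAYLOAD_TYPES[int(pt)]
            for pt in fields[3:]
            if pt.isdigit() and int(pt) in STATIC_PAYLOAD_TYPES]


def first_common_codec(offer: List[str], supported: List[str]) -> Optional[str]:
    """Return the first offered codec that the other side also supports."""
    for codec in offer:
        if codec in supported:
            return codec
    return None


def call_outcome(offer_m_line: str, callee_codecs: List[str],
                 sbc_can_transcode: bool) -> str:
    offer = codecs_from_m_line(offer_m_line)
    common = first_common_codec(offer, callee_codecs)
    if common:
        return f"pass-through {common}"
    if sbc_can_transcode and offer and callee_codecs:
        return f"transcode {offer[0]} <-> {callee_codecs[0]}"
    return "488 Not Acceptable Here"


g722_only_offer = "m=audio 49170 RTP/AVP 9"
no_transcoder = call_outcome(g722_only_offer, ["PCMU"], sbc_can_transcode=False)
with_transcoder = call_outcome(g722_only_offer, ["PCMU"], sbc_can_transcode=True)
```

With a G.722-only offer and a G.711-only callee, the first call yields the 488 rejection and the second yields a transcoded session.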

PSTN bridging

The PSTN (and most carrier interconnects) use G.711. Modern endpoints often prefer G.722 or Opus. The PSTN gateway transcodes between the IP-side codec and G.711 for the TDM side.

Network bandwidth optimization

Endpoints may speak G.711 internally but the WAN link only has bandwidth for G.729. The SBC at the WAN edge transcodes G.711 to G.729 outbound and G.729 to G.711 inbound. Both endpoints think they are using G.711.
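To see why the WAN link matters, the per-call bandwidth of each codec can be estimated. The sketch below assumes 20 ms packetization, IPv4, and no layer-2 overhead or header compression; `ip_bandwidth_kbps` is a hypothetical helper, and the 64 kbps and 8 kbps inputs are the nominal G.711 and G.729 codec bitrates.

```python
# IPv4 (20) + UDP (8) + RTP (12) header bytes per packet; layer-2 overhead excluded.
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def ip_bandwidth_kbps(codec_bitrate_kbps: float, ptime_ms: int = 20) -> float:
    """Per-direction IP bandwidth for one call, including packet headers."""
    packets_per_second = 1000 / ptime_ms
    payload_bytes = codec_bitrate_kbps * 1000 / 8 * ptime_ms / 1000
    bits_per_packet = (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8
    return bits_per_packet * packets_per_second / 1000


g711_kbps = ip_bandwidth_kbps(64)  # 160-byte payload per packet
g729_kbps = ip_bandwidth_kbps(8)   # 20-byte payload per packet
```

Under these assumptions G.711 needs about 80 kbps per direction and G.729 about 24 kbps, so header overhead narrows the 8:1 codec bitrate ratio to roughly 3:1 on the wire.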

Recording at a different codec

Calls may be recorded in a codec different from the live call, often Opus or another wideband codec for archive quality, regardless of the live codec. Transcoding occurs as part of the recording pipeline.

Conference mixing

A conference bridge mixes audio from multiple participants. Each participant may use a different codec. The bridge decodes all to PCM, mixes, and re-encodes per participant's codec. Every participant's stream is transcoded.
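The mixing step in the middle can be sketched on already-decoded PCM frames. This example assumes equal-length 16-bit frames and uses a mix-minus approach, where each participant receives everyone's audio except their own; that is a common design but an assumption here, not something this guide specifies. `mix_minus` and the participant names are hypothetical, and decoding and per-participant re-encoding are left out.

```python
from typing import Dict, List


def clip16(value: int) -> int:
    """Clamp a mixed sample to the signed 16-bit PCM range."""
    return max(-32768, min(32767, value))


def mix_minus(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """For each participant, sum all other participants' PCM frames."""
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    # Sum everyone once, then subtract each participant's own contribution.
    total = [sum(frame[i] for frame in frames.values()) for i in range(length)]
    return {
        name: [clip16(total[i] - frame[i]) for i in range(length)]
        for name, frame in frames.items()
    }


decoded = {"alice": [100, 200], "bob": [10, 20], "carol": [1, 2]}
mixed = mix_minus(decoded)
```

Each entry in `mixed` would then be re-encoded with that participant's negotiated codec, which is why every stream in the conference counts as transcoded.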

3. The costs of transcoding

CPU cost

Transcoding is one of the most CPU-intensive operations in VoIP. Approximate cost per session per direction, by codec pair:

Codec pair                      Approx. CPU per session   Notes
G.711 to G.711 (no transcode)   ~0.1%                     Pass-through baseline
G.711 to G.722                  ~1-2%                     Cheapest transcode
G.711 to G.729                  ~3-5%                     G.729 encoding is the bottleneck
G.711 to Opus                   ~3-5%                     Comparable to G.729
G.729 to Opus (both encode)     ~6-8%                     Both codecs are computationally heavy

These are rough numbers and vary by hardware, software, and audio characteristics. Hardware DSPs can handle roughly an order of magnitude more sessions than CPU-based transcoding.
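The per-session percentages can be turned into a rough density estimate. The sketch below assumes the percentages refer to a single CPU core, which this guide does not state explicitly, and `streams_per_core` is a hypothetical helper. Because the figures are per direction, a two-way call consumes two streams.

```python
from typing import Dict, Tuple

# Approximate CPU percent per session per direction, copied from the table above.
CPU_PERCENT: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("G.711", "G.722"): (1.0, 2.0),
    ("G.711", "G.729"): (3.0, 5.0),
    ("G.711", "Opus"): (3.0, 5.0),
    ("G.729", "Opus"): (6.0, 8.0),
}


def streams_per_core(pair: Tuple[str, str],
                     cpu_budget_percent: float = 100.0) -> Tuple[int, int]:
    """Return (pessimistic, optimistic) one-way streams that fit the CPU budget."""
    low_cost, high_cost = CPU_PERCENT[pair]
    return (int(cpu_budget_percent // high_cost),
            int(cpu_budget_percent // low_cost))


g729_range = streams_per_core(("G.711", "G.729"))
opus_tandem_range = streams_per_core(("G.729", "Opus"))
```

Treat the output as an order-of-magnitude guide only; measured numbers on the target hardware should replace the table values before any real sizing decision.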

Latency

Each transcoding hop adds latency, typically 30-50 ms.

Two transcoding hops (e.g., G.711 to Opus, then Opus to G.711) therefore add 60-100 ms one-way, which is large enough to be perceptible.
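A quick budget check makes the impact concrete. This sketch assumes 30-50 ms per transcoding hop (the figure this guide's FAQ cites) and uses the 150 ms one-way target from ITU-T G.114 as the budget. The function names and the 40 ms baseline in the usage line are hypothetical.

```python
from typing import Tuple

PER_HOP_MS = (30, 50)     # per-hop transcoding latency range cited in this guide
ONE_WAY_BUDGET_MS = 150   # ITU-T G.114 one-way delay target


def transcoding_latency_ms(hops: int) -> Tuple[int, int]:
    """Return the (best, worst) one-way latency added by a number of hops."""
    return (hops * PER_HOP_MS[0], hops * PER_HOP_MS[1])


def remaining_budget_ms(base_one_way_ms: int, hops: int) -> Tuple[int, int]:
    """Return the (worst, best) budget left after network and transcoding delay."""
    best, worst = transcoding_latency_ms(hops)
    return (ONE_WAY_BUDGET_MS - base_one_way_ms - worst,
            ONE_WAY_BUDGET_MS - base_one_way_ms - best)


two_hops = transcoding_latency_ms(2)
headroom = remaining_budget_ms(base_one_way_ms=40, hops=2)
```

With a 40 ms network and jitter-buffer baseline, two hops leave between 10 and 50 ms of the budget, so a third hop would likely exceed it.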

Quality loss

Each lossy codec encode-decode cycle loses information. G.711-to-G.729-to-G.711 produces noticeably worse audio than G.711-to-G.711 (no transcoding). The loss is greatest when both endpoints use compressed codecs and the middle path uses a different one.

Tandem coding — multiple lossy codec hops in series — is the worst case. Quality degrades multiplicatively, not just additively.

4. Avoiding transcoding when possible

Design strategies to minimize transcoding:

Standardize on G.711

If interop and quality are more important than bandwidth, default to G.711 everywhere. PSTN bridging is direct; IP-to-IP works without transcoding; quality is consistently good. Bandwidth cost is the tradeoff.

Match codec lists across the deployment

If your endpoints offer Opus, G.722, G.711 in priority order, and your trunks accept the same set, the negotiated codec is whatever both sides agree on — usually no transcoding needed. The PSTN boundary still requires it but everything else flows through.
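One way to check for mismatched lists is to audit every pair of device classes for a shared codec. The sketch below is a simplified model: `paths_needing_transcoding` and the device names are hypothetical, and it only tests whether any common codec exists, not which one negotiation would actually select.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def paths_needing_transcoding(
        devices: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return device pairs whose codec lists have no codec in common."""
    return [
        (a, b)
        for a, b in combinations(devices, 2)
        if not set(devices[a]) & set(devices[b])
    ]


deployment = {
    "phones": ["Opus", "G.722", "G.711"],
    "pbx": ["Opus", "G.722", "G.711"],
    "trunk": ["G.722", "G.711"],
    "legacy-ata": ["G.729"],
}
mismatches = paths_needing_transcoding(deployment)
```

In this made-up deployment only the G.729-only adapter forces transcoding, which suggests either adding G.711 to it or accepting a transcoded path for that device class.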

Use SBCs strategically

An SBC at the PSTN boundary handles the only required transcoding. Internal calls and trunk-to-trunk paths bypass the SBC media path and avoid transcoding.

Avoid asymmetric codec preferences

If your phones prefer G.722 but your SBC normalizes to G.711, every call transcodes. Either align the phones with the SBC's preference or update the SBC to pass G.722 through.

Use direct media when possible

For calls between two IP endpoints, allow direct media (RTP flows directly between endpoints, not through the SBC). This eliminates SBC transcoding entirely. Direct media has tradeoffs (less topology hiding, harder NAT) but eliminates a major source of transcoding.

5. Sizing for transcoding

If transcoding is necessary, capacity planning matters:

Software-based transcoding

A modern server CPU can transcode 100-500 simultaneous sessions of G.711 to G.729 (or similar pairs). Higher-quality codecs (Opus wideband) reduce capacity. CPU is the binding constraint for software transcoding.

Hardware DSP-based transcoding

SBCs with DSP cards (AudioCodes, Ribbon, Cisco) can transcode 1000-10000 sessions per card. Cost-effective for high-density deployments. Capacity is mostly limited by DSP licenses and channel counts.

Headroom

Plan for 2x peak transcoding capacity. Burst calling patterns (busy hour, conference start times) and codec failover scenarios can spike transcoding load above steady state.

Testing under load

Transcoded audio quality degrades under CPU pressure. Run load tests at 80% of rated capacity to verify that quality holds. 100% is the practical ceiling, and operating there leaves no headroom for spikes.

Frequently asked questions

What is transcoding in VoIP?

Transcoding is the operation of decoding RTP audio from one codec and re-encoding it in another in real time. It is performed by SBCs, media gateways, or PBX media servers when two endpoints don't share a common codec, when bridging between IP and PSTN, or when network bandwidth requires a different codec on the WAN than on the LAN. Each direction of audio is transcoded separately.

Why is VoIP transcoding expensive?

Transcoding requires decoding compressed audio to PCM and re-encoding to a different format, in real time, for every RTP packet. A typical call has 50 packets per second per direction. CPU cost per session ranges from 1-2% (cheap pairs like G.711 to G.722) to 6-8% (expensive pairs like G.729 to Opus). Transcoding also adds 30-50 ms of latency per hop and degrades audio quality through tandem coding losses.

How can I avoid transcoding in VoIP?

Standardize on G.711 across the deployment if quality matters more than bandwidth — G.711 bridges directly to PSTN and works between any IP endpoints. Match codec preference lists across phones, PBXes, and SBCs so negotiation reaches the same codec without conversion. Use direct media for IP-to-IP calls so RTP bypasses the SBC entirely. Concentrate transcoding at PSTN boundaries only, where it is unavoidable.
