Transcoding is the operation of decoding RTP audio from one codec and re-encoding it in another. It enables interop between endpoints that don't share a codec but is computationally expensive, adds latency, and degrades audio quality. Most production designs try to minimize transcoding rather than treat it as routine.
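Transcoding is converting audio from one codec to another in real time. The transcoder receives RTP in one codec, decodes the payload to PCM, re-encodes the PCM in the target codec, and sends the result onward as RTP.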
This happens in the middle of a call, in real time, for every audio packet. A typical call sends 50 RTP packets per second per direction (20 ms of audio per packet). A transcoded call therefore requires 100 transcode operations per second: 50 in each direction.
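The packet arithmetic above can be sketched in a few lines. This is a minimal illustration, assuming the 50 packets per second figure corresponds to 20 ms packetization; the function name and parameters are hypothetical.

```python
def transcode_ops_per_second(ptime_ms: float = 20.0, directions: int = 2) -> float:
    """Transcode operations per second for one call.

    Assumes one transcode operation per RTP packet and a fixed
    packetization interval (ptime) in milliseconds.
    """
    packets_per_direction = 1000.0 / ptime_ms
    return packets_per_direction * directions


# 20 ms packets, both directions of one call
print(transcode_ops_per_second())
# Larger packets mean fewer operations per second
print(transcode_ops_per_second(ptime_ms=30.0))
```

Multiplying the result by the number of concurrent transcoded calls gives a rough per-second operation count for a whole system.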
Transcoding is performed by SBCs, media gateways, PBX media servers, or specialized transcoding farms. It cannot happen at SIP proxies (which do not touch RTP). Endpoints rarely transcode — they speak one codec at a time and rely on the network to bridge to other codecs if needed.
The most common reason to transcode is a codec mismatch between endpoints. One endpoint speaks G.722, the other speaks G.711. Without transcoding, the call fails with 488 Not Acceptable Here. With transcoding, both sides speak their preferred codec and the SBC converts in the middle.
The PSTN (and most carrier interconnects) use G.711. Modern endpoints often prefer G.722 or Opus. The PSTN gateway transcodes between the IP-side codec and G.711 for the TDM side.
Endpoints may speak G.711 internally but the WAN link only has bandwidth for G.729. The SBC at the WAN edge transcodes G.711 to G.729 outbound and G.729 to G.711 inbound. Both endpoints think they are using G.711.
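To see why a WAN link might fit G.729 but not G.711, it helps to estimate per-stream bandwidth. The sketch below assumes IPv4, no header compression, 20 ms packets, and the nominal payload rates of 64 kbit/s for G.711 and 8 kbit/s for G.729; layer-2 overhead is ignored and the function name is hypothetical.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP headers, uncompressed


def stream_kbps(codec_kbps: float, ptime_ms: float = 20.0) -> float:
    """One-direction IP bandwidth of an RTP stream in kbit/s (no layer-2 overhead)."""
    payload_bytes = codec_kbps * ptime_ms / 8   # kbit/s * ms = bits per packet
    packets_per_second = 1000.0 / ptime_ms
    bits_per_packet = (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8
    return bits_per_packet * packets_per_second / 1000.0


g711 = stream_kbps(64)  # G.711 payload rate
g729 = stream_kbps(8)   # G.729 payload rate
print(f"G.711: {g711:.0f} kbit/s, G.729: {g729:.0f} kbit/s per direction")
```

Under these assumptions the header overhead dominates the G.729 stream, so the saving on the wire is smaller than the 8:1 payload ratio suggests.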
Calls may be recorded in a codec different from the live call — often Opus or wideband for archive quality, regardless of the live codec. Transcoding occurs as part of the recording pipeline.
A conference bridge mixes audio from multiple participants. Each participant may use a different codec. The bridge decodes all to PCM, mixes, and re-encodes per participant's codec. Every participant's stream is transcoded.
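The decode, mix, re-encode cycle can be sketched as follows. This is a simplified illustration, not a production mixer: the per-participant decoder and encoder callables are hypothetical placeholders for real codec libraries, frames are assumed to be equal-length 16-bit PCM, and every participant receives the same mix (real bridges commonly exclude each participant's own audio).

```python
from typing import Callable, Dict, List

Pcm = List[int]  # one frame of 16-bit linear PCM samples


def mix(frames: List[Pcm]) -> Pcm:
    """Sum frames sample by sample, clipping to the signed 16-bit range."""
    return [max(-32768, min(32767, sum(samples))) for samples in zip(*frames)]


def bridge_tick(
    payloads: Dict[str, bytes],
    decoders: Dict[str, Callable[[bytes], Pcm]],
    encoders: Dict[str, Callable[[Pcm], bytes]],
) -> Dict[str, bytes]:
    """Process one frame interval: decode every input, mix, re-encode per participant."""
    pcm_frames = [decoders[name](data) for name, data in payloads.items()]
    mixed = mix(pcm_frames)
    return {name: encoders[name](mixed) for name in payloads}
```

Each call to `bridge_tick` performs one decode and one encode per participant, which is why conference load grows with the number of participants rather than the number of calls.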
Transcoding is one of the most CPU-intensive operations in VoIP. Approximate cost per session per direction, by codec pair:
| Codec pair | Approx CPU per session | Notes |
|---|---|---|
| G.711 to G.711 (no transcode) | ~0.1% | Pass-through baseline |
| G.711 to G.722 | ~1-2% | Cheapest transcode |
| G.711 to G.729 | ~3-5% | G.729 encoding is the bottleneck |
| G.711 to Opus | ~3-5% | Comparable to G.729 |
| G.729 to Opus (both encode) | ~6-8% | Both codecs are computationally heavy |
These are rough numbers and vary by hardware, software, and audio characteristics. Hardware DSPs can do orders of magnitude more sessions than CPU-based transcoding.
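The table can be turned into a rough density estimate. The sketch below assumes the percentages refer to a single CPU core, apply per direction, and scale linearly, none of which the figures above guarantee; the function name is hypothetical.

```python
def sessions_per_core(cpu_percent_per_direction: float, directions: int = 2) -> int:
    """Bidirectional sessions one core could carry if cost scaled linearly."""
    cost_per_session = cpu_percent_per_direction * directions
    return int(100 // cost_per_session)


# Using the midpoint of the G.711 to G.729 row (about 4% per direction)
print(sessions_per_core(4.0))
```

Treat the result as an upper bound for planning purposes, since scheduling overhead and audio characteristics reduce real-world density.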
Each transcoding step adds latency, typically 30-50 ms per hop.
Two transcoding hops (e.g., G.711 to Opus, then Opus to G.711) add 60-100 ms one-way, which is large enough to be perceptible.
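The hop arithmetic can be checked against a delay budget. This sketch uses the 30-50 ms per-hop range quoted in this article; the 150 ms default budget is an assumption taken from the one-way delay target commonly cited from ITU-T G.114, and the function names are hypothetical.

```python
from typing import Tuple

PER_HOP_MS = (30, 50)  # per-hop latency range quoted in this article


def added_latency_ms(hops: int, per_hop_ms: Tuple[int, int] = PER_HOP_MS) -> Tuple[int, int]:
    """Best-case and worst-case one-way latency added by transcoding hops."""
    low, high = per_hop_ms
    return hops * low, hops * high


def within_budget(base_ms: float, hops: int, budget_ms: float = 150.0) -> bool:
    """True if network delay plus worst-case transcoding delay fits the budget."""
    _, worst = added_latency_ms(hops)
    return base_ms + worst <= budget_ms


print(added_latency_ms(2))          # two hops, as in the example above
print(within_budget(80.0, hops=2))  # 80 ms of network delay plus two hops
```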
Each lossy codec encode-decode cycle loses information. G.711-to-G.729-to-G.711 produces noticeably worse audio than G.711-to-G.711 (no transcoding). The loss is greatest when both endpoints use compressed codecs and the middle path uses a different one.
Tandem coding — multiple lossy codec hops in series — is the worst case. Quality degrades multiplicatively, not just additively.
Design strategies to minimize transcoding:
If interop and quality are more important than bandwidth, default to G.711 everywhere. PSTN bridging is direct; IP-to-IP works without transcoding; quality is consistently good. Bandwidth cost is the tradeoff.
If your endpoints offer Opus, G.722, G.711 in priority order, and your trunks accept the same set, the negotiated codec is whatever both sides agree on — usually no transcoding needed. The PSTN boundary still requires it but everything else flows through.
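A simplified model of that negotiation is shown below. It assumes the answerer honors the offerer's priority order and that codecs are compared by name only; real SDP offer/answer also involves payload types, clock rates, and format parameters. The function name and codec lists are illustrative.

```python
from typing import List, Optional


def negotiate(offer: List[str], answerer_supported: List[str]) -> Optional[str]:
    """Return the first codec in the offerer's priority order that the answerer supports."""
    supported = set(answerer_supported)
    for codec in offer:
        if codec in supported:
            return codec
    return None  # no common codec: the call needs transcoding or is rejected


phone_offer = ["opus", "G722", "PCMU"]
print(negotiate(phone_offer, ["opus", "G722", "PCMU"]))  # trunk with the same set
print(negotiate(phone_offer, ["PCMU"]))                  # G.711-only peer
print(negotiate(phone_offer, ["G729"]))                  # no overlap
```

When the function returns `None`, a device in the media path has to transcode for the call to complete.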
An SBC at the PSTN boundary handles the only required transcoding. Internal calls and trunk-to-trunk paths bypass the SBC media path and avoid transcoding.
If your phones prefer G.722 but your SBC normalizes to G.711, every call transcodes. Either align the phones with the SBC's preference or update the SBC to pass G.722 through.
For calls between two IP endpoints, allow direct media (RTP flows directly between endpoints, not through the SBC). This eliminates SBC transcoding entirely. Direct media has tradeoffs (less topology hiding, harder NAT) but eliminates a major source of transcoding.
If transcoding is necessary, capacity planning matters:
A modern server CPU can transcode 100-500 simultaneous sessions of G.711 to G.729 (or similar pairs). Higher-quality codecs (Opus wideband) reduce capacity. CPU is the binding constraint for software transcoding.
SBCs with DSP cards (AudioCodes, Ribbon, Cisco) can transcode 1000-10000 sessions per card. Cost-effective for high-density deployments. Capacity is mostly limited by DSP licenses and channel counts.
Plan for 2x peak transcoding capacity. Burst calling patterns (busy hour, conference start times) and codec failover scenarios can spike transcoding load above steady state.
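The 2x rule translates directly into a node count. This is a minimal sketch; the per-node capacity is an input you would measure or take from vendor figures, and the function name is hypothetical.

```python
import math


def transcoders_needed(peak_sessions: int, sessions_per_node: int, headroom: float = 2.0) -> int:
    """Nodes required to carry peak transcoded sessions with a headroom multiplier."""
    return math.ceil(peak_sessions * headroom / sessions_per_node)


# 700 peak transcoded sessions on nodes that each handle 300
print(transcoders_needed(700, 300))
```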
Transcoded audio quality degrades under CPU pressure. Run load tests at 80% capacity to verify that quality holds. 100% capacity is the practical maximum, but operating there leaves no headroom for spikes.
Transcoding is the operation of decoding RTP audio from one codec and re-encoding it in another in real time. It is performed by SBCs, media gateways, or PBX media servers when two endpoints don't share a common codec, when bridging between IP and PSTN, or when network bandwidth requires a different codec on the WAN than on the LAN. Each direction of audio is transcoded separately.
Transcoding requires decoding compressed audio to PCM and re-encoding to a different format, in real time, for every RTP packet. A typical call has 50 packets per second per direction. CPU cost per session ranges from 1-2% (cheap pairs like G.711 to G.722) to 6-8% (expensive pairs like G.729 to Opus). Transcoding also adds 30-50 ms of latency per hop and degrades audio quality through tandem coding losses.
Standardize on G.711 across the deployment if quality matters more than bandwidth — G.711 bridges directly to PSTN and works between any IP endpoints. Match codec preference lists across phones, PBXes, and SBCs so negotiation reaches the same codec without conversion. Use direct media for IP-to-IP calls so RTP bypasses the SBC entirely. Concentrate transcoding at PSTN boundaries only, where it is unavoidable.
Paste your SIP trace into SIPSymposium. The analyzer identifies codec changes between SDP segments, detects transcoding hops, and correlates them with quality metrics like MOS and packet loss.
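For a manual check, the codec on each call leg can be compared directly from the SDP answers. The sketch below is an illustration only and does not describe how any particular analyzer works. It assumes one audio stream per SDP, treats the first listed audio payload type as the selected codec, and knows only a few static payload types from RFC 3551; the function names are hypothetical.

```python
from typing import Dict, List, Optional

# A few static RTP payload types from RFC 3551
STATIC_PT = {"0": "PCMU", "8": "PCMA", "9": "G722", "18": "G729"}


def first_audio_codec(sdp: str) -> Optional[str]:
    """Return the name of the first audio codec listed in an SDP body."""
    payload_types: List[str] = []
    rtpmap: Dict[str, str] = {}
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m=audio "):
            payload_types = line.split()[3:]  # fields after port and profile
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(None, 1)
            rtpmap[pt] = encoding.split("/")[0]
    for pt in payload_types:
        name = rtpmap.get(pt) or STATIC_PT.get(pt)
        if name and name.lower() != "telephone-event":  # skip DTMF events
            return name.upper()
    return None


def is_transcoded(leg_a_sdp: str, leg_b_sdp: str) -> bool:
    """True if both legs expose a codec and the two codecs differ."""
    a, b = first_audio_codec(leg_a_sdp), first_audio_codec(leg_b_sdp)
    return a is not None and b is not None and a != b
```

Comparing the answer SDP from each side of an SBC this way shows whether the device is bridging two different codecs.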