Echo on a VoIP call is when a caller hears their own voice repeated back to them with a delay. The cause is usually one of three things: acoustic feedback, hybrid coupling at a TDM-to-IP boundary, or network jitter that defeats echo cancellation. Each has a different fix.
Echo in voice calls comes in three distinct forms, each with different mechanics:
A user's microphone picks up the far-end audio playing from their own loudspeaker and transmits it back to the far end. The far-end talker hears their own voice delayed by the round trip. Common causes are speakerphone use, poor handset acoustics, or feedback paths in the room.
Wherever a 4-wire path meets a 2-wire analog line, some of the outbound signal leaks into the inbound path due to imperfect impedance matching. The talker on the IP side hears their own voice reflected from the hybrid. The hybrid may sit in the gateway itself (FXO ports) or further into the PSTN (behind PRI gateways and SIP-to-PSTN bridges), but either way the echo comes back across the TDM-to-IP boundary.
Echo cancellation depends on consistent timing. If RTP packets arrive late, out of order, or with high jitter, the canceller's reference signal no longer aligns with the actual echo, and cancellation fails. The network does not generate the echo, which is acoustic or hybrid in origin, but it causes the canceller to fail.
Echo cancellation works by maintaining a model of the echo path. The canceller knows what audio it just sent toward the echo source (the reference signal). It uses the model to predict how a delayed, attenuated copy of that reference will show up in the incoming audio, and subtracts the prediction from the incoming stream. What remains is treated as genuine speech from the other side.
For this to work, three conditions must hold:
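Modern echo cancellers in handsets and SBCs handle echo path delays (the round trip from the canceller to the reflection point and back, known as the tail length) of up to about 128ms with good cancellation. Echo arriving later than that falls outside the canceller's model, and residual echo becomes audible. Above 250ms, cancellation is essentially absent and the user hears full echo.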
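The model-and-subtract loop described above is usually built on an adaptive filter that learns the echo path's impulse response. Below is a minimal sketch using normalized least mean squares (NLMS), one standard choice and not necessarily what any given handset or SBC implements. The function name `nlms_echo_canceller`, its parameters, and the simulated two-reflection echo path are illustrative assumptions. The filter length (`taps`) sets the tail it can cover: 64 taps is only 8ms at 8 kHz, far shorter than a real 128ms canceller, to keep the example fast.

```python
import random


def nlms_echo_canceller(reference, microphone, taps=64, mu=0.5, eps=1e-6):
    """Sketch of an NLMS echo canceller (illustrative, not production DSP).

    reference:  samples sent toward the echo path (the audio being played out)
    microphone: samples coming back (speech plus any echo)
    Returns the residual: the incoming signal minus the predicted echo.
    """
    weights = [0.0] * taps   # the canceller's model of the echo path
    history = [0.0] * taps   # recent reference samples, newest first
    residual = []
    for x, d in zip(reference, microphone):
        history = [x] + history[:-1]
        # Predict the echo by running the reference through the model.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        residual.append(error)
        # NLMS update: move the model toward the true echo path, normalized
        # by reference power so loud and quiet audio adapt at the same rate.
        step = mu * error / (sum(h * h for h in history) + eps)
        weights = [w + step * h for w, h in zip(weights, history)]
    return residual


# Simulated echo path (assumption): the reference returns twice,
# 10 and 20 samples later, attenuated to 0.6 and 0.3.
random.seed(1)
ref = [random.uniform(-1.0, 1.0) for _ in range(4000)]
echo = [0.6 * (ref[n - 10] if n >= 10 else 0.0) + 0.3 * (ref[n - 20] if n >= 20 else 0.0)
        for n in range(len(ref))]
res = nlms_echo_canceller(ref, echo)

echo_energy = sum(v * v for v in echo[-500:])
residual_energy = sum(v * v for v in res[-500:])
print(echo_energy, residual_energy)
```

Once the model converges, the residual energy is a tiny fraction of the echo energy. The simulation contains no speech from the near side, so anything left in the residual is uncancelled echo; real cancellers must also avoid adapting while both parties talk at once.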
Acoustic echo originates in the user's environment. The loudspeaker plays the far-end audio, the microphone picks it up, and the audio travels back to the far end, where the far-end talker hears themselves.
Fixes are physical: switch to a headset, lower volume, move the microphone, treat the room, or use a phone with stronger acoustic echo cancellation (AEC).
Hybrid echo happens at impedance-mismatched boundaries between IP and analog or TDM voice. The classic case is an FXO port connecting a SIP PBX to an analog phone line. The 4-wire IP side has separate transmit and receive paths; the 2-wire analog side has them combined. The transformer that bridges them (the “hybrid”) leaks some transmit signal into the receive direction.
How much leaks depends on impedance match. Perfect match would leak nothing; real-world line impedance varies and the match is always imperfect. The leaked signal travels back to the IP side as audible echo to whoever is on the IP end of the call.
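One common way to put a number on the mismatch is return loss, computed from the line impedance and the impedance the hybrid's balance network is designed for. The sketch below is a first-order model only: real transhybrid loss also depends on the hybrid's design and varies across the voice band. The impedance values and the helper name `return_loss_db` are illustrative assumptions.

```python
import math


def reflection_coefficient(z_line: complex, z_balance: complex) -> complex:
    """Fraction of the signal reflected at a mismatch between two impedances."""
    return (z_line - z_balance) / (z_line + z_balance)


def return_loss_db(z_line: complex, z_balance: complex) -> float:
    """Return loss in dB; higher means less signal leaks back as echo."""
    gamma = abs(reflection_coefficient(z_line, z_balance))
    if gamma == 0:
        return math.inf  # perfect match: nothing is reflected
    return -20 * math.log10(gamma)


# Illustrative values: a 600-ohm balance network on lines of different impedance.
print(return_loss_db(600, 600))   # inf: perfect match
print(return_loss_db(650, 600))   # about 28 dB: close match, faint echo
print(return_loss_db(900, 600))   # about 14 dB: poor match, louder echo
```

Impedances can be passed as complex values (for example `complex(900, -300)`) to model a reactive line.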
Hybrid echo can be cancelled effectively when:
Most modern PRI/FXO gateways and SBCs include hardware or DSP-based echo cancellation specifically for hybrid echo. If echo persists at a known TDM boundary, check the gateway's echo canceller (EC) settings, especially the tail length parameter.
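Choosing a tail length is a trade-off. A time-domain canceller needs one filter coefficient per sample of tail, so a 128ms tail at 8 kHz (narrowband, as with G.711) means 1024 coefficients, each costing a multiply-accumulate on every sample, and adaptive filters with more coefficients also tend to converge more slowly. A reasonable rule of thumb is the shortest setting that covers the measured echo delay with some headroom. The option list, the 8ms margin, and the helper names `taps_for_tail` and `pick_tail_ms` are assumptions for illustration, not any vendor's actual settings.

```python
from typing import Optional

# Assumption: tail-length options as a gateway might expose them.
TAIL_OPTIONS_MS = (32, 64, 128)


def taps_for_tail(tail_ms: float, sample_rate_hz: int = 8000) -> int:
    """Adaptive-filter coefficients needed to span tail_ms of echo path."""
    return round(tail_ms * sample_rate_hz / 1000)


def pick_tail_ms(echo_delay_ms: float, margin_ms: float = 8.0,
                 options=TAIL_OPTIONS_MS) -> Optional[int]:
    """Shortest tail covering the measured echo delay plus a safety margin.

    Returns None when no option is long enough: the echo arrives too late
    for the canceller, and the delay itself needs to be reduced.
    """
    needed = echo_delay_ms + margin_ms
    for tail in sorted(options):
        if tail >= needed:
            return tail
    return None


print(pick_tail_ms(20))    # 32
print(pick_tail_ms(40))    # 64
print(pick_tail_ms(125))   # None: 133ms exceeds every option
print(taps_for_tail(128))  # 1024 coefficients at 8 kHz
```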
Network echo is misnamed — the echo is acoustic or hybrid, but the network defeats the canceller. The mechanism:
Less directly, packet loss and reordering can cause the canceller to drift. The reference signal is lost or arrives out of order, the model becomes incorrect, and echo leaks until the model recovers.
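A toy calculation shows why misalignment is so damaging. It uses a fixed single-tap echo model (a real canceller has many taps and keeps adapting) and measures echo return loss enhancement (ERLE), the ratio of echo energy before cancellation to residual energy after. When the model's delay matches the actual echo, the echo is removed completely; shift the actual echo by two samples, as a timing disturbance might, and subtracting the stale estimate adds energy instead of removing it. The helper names and simulated signals are illustrative assumptions.

```python
import math
import random


def delayed_echo(reference, delay, gain):
    """Simulated echo: the reference returning `delay` samples later, attenuated."""
    return [gain * reference[n - delay] if n >= delay else 0.0
            for n in range(len(reference))]


def erle_db(reference, echo, model_delay, model_gain):
    """ERLE of a fixed single-tap echo model; higher is better, negative is harmful."""
    residual = [e - (model_gain * reference[n - model_delay] if n >= model_delay else 0.0)
                for n, e in enumerate(echo)]
    echo_energy = sum(v * v for v in echo)
    residual_energy = sum(v * v for v in residual)
    if residual_energy == 0:
        return math.inf
    return 10 * math.log10(echo_energy / residual_energy)


random.seed(7)
ref = [random.uniform(-1.0, 1.0) for _ in range(4000)]
model_delay, model_gain = 40, 0.5      # what the canceller has learned

aligned = delayed_echo(ref, 40, 0.5)   # echo where the model expects it
shifted = delayed_echo(ref, 42, 0.5)   # same echo, two samples later

print(erle_db(ref, aligned, model_delay, model_gain))  # inf: fully cancelled
print(erle_db(ref, shifted, model_delay, model_gain))  # negative: worse than no canceller
```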
Echo that worsens during network congestion, varies over a single call, or correlates with high jitter is network-induced. The fix is at the network level — reducing jitter, applying QoS, prioritizing RTP, or upgrading congested links.
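Checking whether echo correlates with jitter requires a jitter figure for the RTP stream. RTP defines one: the interarrival jitter estimator in RFC 3550 (section 6.4.1), which smooths the variation in packet transit time with a gain of 1/16. Below is a minimal sketch of that estimator. The input format (arrival time in seconds paired with the RTP timestamp) and the function name are assumptions, and RTP timestamp wraparound is ignored for brevity.

```python
def rtp_interarrival_jitter_ms(packets, clock_rate=8000):
    """RFC 3550 interarrival jitter estimate, returned in milliseconds.

    packets: (arrival_time_seconds, rtp_timestamp) pairs in arrival order.
    clock_rate: RTP clock in Hz (8000 for G.711 and G.729).
    """
    jitter = 0.0  # kept in RTP timestamp units, as in the RFC
    prev = None
    for arrival_s, rtp_ts in packets:
        arrival = arrival_s * clock_rate  # convert arrival time to timestamp units
        if prev is not None:
            prev_arrival, prev_ts = prev
            # D(i, j): change in transit time between consecutive packets
            d = (arrival - prev_arrival) - (rtp_ts - prev_ts)
            jitter += (abs(d) - jitter) / 16
        prev = (arrival, rtp_ts)
    return jitter * 1000 / clock_rate


# 200 packets of 20ms audio (160 timestamp units each).
steady = [(n * 0.020, n * 160) for n in range(200)]
# Same stream, but every other packet arrives 5ms late.
wobbly = [(n * 0.020 + (0.005 if n % 2 else 0.0), n * 160) for n in range(200)]

print(rtp_interarrival_jitter_ms(steady))  # close to 0
print(rtp_interarrival_jitter_ms(wobbly))  # converges toward 5ms
```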
Codec choice also matters. Codecs that handle packet loss gracefully (Opus with FEC, G.722 with PLC) preserve echo cancellation reference quality better than codecs with weaker loss handling. G.729 is particularly bad in this regard: its heavy compression is nonlinear, so when it sits in the echo path the canceller's model fits the echo less accurately and has a harder time distinguishing signal from echo.
Echo diagnosis depends on identifying which side hears it and what triggers it:
For persistent echo, the classic isolation method is to swap one variable at a time: different phone, different headset, different network path, different codec. Whatever change eliminates echo localizes the source.
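The isolation method is easy to get wrong by changing two things at once. A small bookkeeping sketch can enforce the one-variable rule and report which swaps made the echo go away. The function name `isolate_echo_source`, the variable names, and the configurations are hypothetical examples.

```python
def isolate_echo_source(baseline, trials):
    """Return the variables whose swap eliminated echo.

    baseline: dict describing the setup that reproduces the echo.
    trials: list of (config_dict, echo_heard) pairs, each changing one variable.
    """
    suspects = []
    for config, echo_heard in trials:
        changed = sorted(k for k in set(baseline) | set(config)
                         if baseline.get(k) != config.get(k))
        if len(changed) != 1:
            raise ValueError(f"trial changes {changed}; swap exactly one variable")
        if not echo_heard:
            suspects.append(changed[0])
    return suspects


baseline = {"phone": "desk-A", "headset": "none",
            "network_path": "office-wifi", "codec": "G.729"}
trials = [
    ({**baseline, "headset": "wired"}, True),
    ({**baseline, "network_path": "wired-lan"}, True),
    ({**baseline, "codec": "G.722"}, False),
]
print(isolate_echo_source(baseline, trials))  # ['codec']
```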
Echo on VoIP calls comes from three sources: acoustic feedback (speaker audio picked up by microphone), hybrid coupling at TDM-to-IP boundaries (analog/PRI gateways with imperfect impedance matching), and echo cancellation failures caused by network jitter or packet loss. Each type has a different fix — acoustic is usually a hardware or volume issue, hybrid is a gateway echo canceller setting, and network-induced echo is a QoS or congestion issue.
Landline echo is hybrid echo. Somewhere on the PSTN side of the call, a hybrid converts between 4-wire (separate transmit and receive) and 2-wire (combined) audio paths: in the gateway itself for an FXO port, or in the carrier's network for a PRI. Imperfect impedance matching causes some of the outbound signal to leak back into the inbound path. The fix is on the gateway: enable hardware echo cancellation with an appropriate tail length (32 to 128ms depending on line characteristics).
Yes, indirectly. Network jitter and packet loss do not generate echo themselves, but they prevent the echo canceller from working correctly. The canceller needs stable timing to model the echo path; high jitter makes the model inaccurate and echo passes through uncancelled. Echo that varies during a call or worsens with congestion is usually network-induced. The fix is reducing jitter and applying QoS to RTP traffic.
Paste your SIP trace into SIPSymposium. The analyzer correlates RTP jitter, codec negotiation, and call quality metrics to help identify whether echo is acoustic, hybrid, or network-induced.