SIP itself does not negotiate codecs. SDP (Session Description Protocol) does, using the offer-answer model defined in RFC 3264. The mechanics are simple but the edge cases — codec ordering, payload type collisions, fmtp parameters, multiple m= lines — are where most interop failures live.
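RFC 3264 defines a simple two-step protocol for negotiating media: one party sends an offer describing the media it wants, and the other party replies with an answer.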
In SIP, the offer is usually in the INVITE body, and the answer is in the 200 OK body. But offer-answer is decoupled from the SIP request type: SDP can also travel in a 183 Session Progress or an UPDATE, and an INVITE sent without SDP is answered by a 200 OK that carries the offer, with the answer returning in the ACK (the delayed-offer pattern).
The offer lists what the offerer supports, in preferred order. The answer lists what was actually selected from the offer. In practice the answer is a subset of the offer: RFC 3264 lets an answerer list extra formats, but nothing outside the offer can be used in that exchange. The result is the intersection of capabilities, biased toward the offerer's preferences.
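An SDP offer is a short block of text lines, and a few of them drive the negotiation. The key parts are the c= line (the address media should be sent to), the m= line (media type, port, transport profile, and the payload types in preference order), the a=rtpmap lines (each maps a payload type to a codec name and clock rate), and the a=fmtp lines (codec-specific parameters).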
The offer should list codecs the offerer can actually use, in order of preference. Including codecs that are merely supported but not preferred is a common practice but creates risk: the answerer might pick a codec the offerer would have preferred not to use.
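To make the structure of an offer concrete, here is a minimal Python sketch that holds a hypothetical offer in a string and extracts the parts that matter for negotiation. The addresses, session identifiers, and the choice of port 49170 for the offerer are assumptions made for illustration, and `parse_audio` is a hypothetical helper, not part of any SIP library.

```python
# Hypothetical offer: address, origin line, and port are invented for illustration.
OFFER_SDP = """\
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 8 9 18 101
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:9 G722/8000
a=rtpmap:18 G729/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=sendrecv
"""


def parse_audio(sdp):
    """Collect port, payload types (in preference order), rtpmap and fmtp
    for the first audio m= section of an SDP body."""
    media = {"port": None, "payload_types": [], "rtpmap": {}, "fmtp": {}}
    in_audio = False
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            if in_audio:
                break  # a new media section starts; stop after the first audio one
            fields = line[2:].split()
            if fields[0] == "audio":
                in_audio = True
                media["port"] = int(fields[1])
                # fields[2] is the transport profile; the rest are payload types
                media["payload_types"] = [int(pt) for pt in fields[3:]]
        elif in_audio and line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            media["rtpmap"][int(pt)] = encoding
        elif in_audio and line.startswith("a=fmtp:"):
            pt, params = line[len("a=fmtp:"):].split(" ", 1)
            media["fmtp"][int(pt)] = params
    return media


offer = parse_audio(OFFER_SDP)
print(offer["payload_types"])  # preference order as written on the m= line
```

The order of `payload_types` is the offerer's preference order, which is why it is kept as a list rather than a set.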
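The answerer constructs an SDP that responds to the offer. The main rules from RFC 3264: the answer contains the same number of m= lines as the offer, in the same order; a media stream is rejected by setting its port to 0; the formats used on a stream are chosen from those in the offer; and a codec that the offer mapped to a dynamic payload type should keep that same number in the answer.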
A typical answer to an offer listing PCMU, PCMA, G722, G729, and telephone-event 101 selects PCMU and telephone-event 101 and leaves the other codecs (PCMA, G722, G729) out. After the offer-answer exchange, the agreed media is PCMU plus DTMF via RFC 4733. RTP flows on port 49170 in one direction and 53000 in the other.
If the answerer cannot accept any of the offered codecs, the response is 488 Not Acceptable Here. See the SIP 488 guide for the codec mismatch case in detail.
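The selection step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a full answerer: `select_formats` and `build_answer_media` are hypothetical names, fmtp and direction attributes are left out, and the local port 53000 is assumed to be the answerer's.

```python
def select_formats(offered, supported):
    """Keep the offered formats the answerer supports.

    offered   -- list of (payload_type, encoding) in the offerer's preference order
    supported -- encodings the answerer can handle, e.g. {"PCMU/8000"}
    The offerer's order and payload type numbers are preserved.
    """
    supported_lower = {name.lower() for name in supported}
    return [(pt, enc) for pt, enc in offered if enc.lower() in supported_lower]


def build_answer_media(offered, supported, local_port):
    """Return the answer's audio lines, or None when no audio codec overlaps
    (the caller would then respond 488 Not Acceptable Here)."""
    accepted = select_formats(offered, supported)
    # telephone-event alone is not enough: at least one real audio codec is needed
    has_audio = any(not enc.lower().startswith("telephone-event") for _, enc in accepted)
    if not has_audio:
        return None
    payload_list = " ".join(str(pt) for pt, _ in accepted)
    lines = ["m=audio %d RTP/AVP %s" % (local_port, payload_list)]
    lines += ["a=rtpmap:%d %s" % (pt, enc) for pt, enc in accepted]
    return "\n".join(lines)


OFFERED = [
    (0, "PCMU/8000"),
    (8, "PCMA/8000"),
    (9, "G722/8000"),
    (18, "G729/8000"),
    (101, "telephone-event/8000"),
]

answer = build_answer_media(OFFERED, {"PCMU/8000", "telephone-event/8000"}, 53000)
print(answer)
```

Filtering the offer, rather than starting from the answerer's own list, is what keeps the result biased toward the offerer's preferences.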
RTP payload types are 7-bit values (0-127) that identify the codec used in each RTP packet. They are split into static and dynamic ranges:
Static payload types (0-95) are reserved by IANA for specific codecs. For example: 0=PCMU, 8=PCMA, 9=G722, 18=G729. Both sides know what these mean by default. rtpmap is technically not required for static types but is often included for clarity.
Dynamic payload types (96-127) are free for any codec. The offerer assigns a number in this range and uses rtpmap to declare what codec it represents. Common dynamic codecs include Opus, AMR-WB, and telephone-event (RFC 4733 DTMF).
If side A declares payload type 96 for Opus and side B declares payload type 111 for Opus, each side must send using the number the other side declared, so both must read the peer's rtpmap rather than assume their own numbering. Misreading rtpmap leads to RTP that decodes as garbage: the codec names match, but the payload types in the packets do not match what the receiver declared.
Most modern stacks handle this correctly. Older or simpler stacks sometimes hard-code payload type assumptions and ignore rtpmap, leading to silent decode failure when the other side uses a different number.
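A minimal sketch of rtpmap-driven lookup, as opposed to hard-coded numbers, might look like the following. The static table holds only the four codecs named in this section, and both function names are hypothetical.

```python
# Only the static assignments mentioned in this article; the real table is longer.
STATIC_PAYLOAD_TYPES = {0: "PCMU/8000", 8: "PCMA/8000", 9: "G722/8000", 18: "G729/8000"}


def resolve_payload_type(pt, rtpmap):
    """Map a payload type number to an encoding.

    An explicit rtpmap entry wins; static numbers fall back to the table;
    a dynamic number with no rtpmap entry cannot be interpreted.
    """
    if not 0 <= pt <= 127:
        raise ValueError("payload type must fit in 7 bits")
    if pt in rtpmap:
        return rtpmap[pt]
    if pt < 96:
        return STATIC_PAYLOAD_TYPES.get(pt)
    return None


def payload_type_for(codec, rtpmap):
    """Find the number a given SDP assigned to a codec (case-insensitive)."""
    wanted = codec.lower()
    for pt, encoding in rtpmap.items():
        if encoding.lower() == wanted:
            return pt
    for pt, encoding in STATIC_PAYLOAD_TYPES.items():
        if encoding.lower() == wanted:
            return pt
    return None


side_a_rtpmap = {96: "opus/48000/2"}   # what side A declared in its SDP
side_b_rtpmap = {111: "opus/48000/2"}  # what side B declared in its SDP

# Side A sends Opus to side B using the number side B declared.
print(payload_type_for("opus/48000/2", side_b_rtpmap))
```

A stack that skipped the lookup and always sent Opus as 96 would work against some peers and fail silently against others, which matches the failure mode described above.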
fmtp lines carry codec-specific configuration. They can be a source of subtle interop issues.
a=fmtp:101 0-16 declares supported DTMF events. 0-9 are digits, 10 is *, 11 is #, 12-15 are A-D, 16 is the flash event. Some stacks send 0-15 (no flash) or 0-11 (digits only). If one side sends event 12 (A) and the other only declared support up to 11, the event is dropped silently.
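The event list is easy to check mechanically. The sketch below expands a telephone-event fmtp value into a set of event numbers and tests whether a given event was declared. `parse_event_list` and `EVENT_NAMES` are hypothetical names, and the parser assumes the common comma-and-range syntax.

```python
# Event numbers as described above: digits, *, #, A-D, flash.
EVENT_NAMES = {i: str(i) for i in range(10)}
EVENT_NAMES.update({10: "*", 11: "#", 12: "A", 13: "B", 14: "C", 15: "D", 16: "flash"})


def parse_event_list(fmtp_value):
    """Expand a value such as '0-11,16' into a set of event numbers."""
    events = set()
    for part in fmtp_value.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            events.update(range(int(low), int(high) + 1))
        else:
            events.add(int(part))
    return events


declared = parse_event_list("0-11")  # a peer that declared digits, * and # only
event = 12                           # the sender wants to signal "A"
if event not in declared:
    print("peer did not declare event %d (%s)" % (event, EVENT_NAMES[event]))
```

Running this check before sending an event turns a silent drop into something a stack can log or work around.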
Opus has many fmtp parameters: maxplaybackrate, maxaveragebitrate, stereo, useinbandfec, usedtx, sprop-maxcapturerate. Mismatched fmtp can lead to suboptimal but functional audio. More serious: some implementations refuse calls if the offered fmtp is unacceptable.
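When comparing traces, it helps to reduce each side's Opus fmtp line to a dictionary and look only at the differences. The following is a rough sketch: the parameter values are invented for illustration, the function names are hypothetical, and the parser assumes the usual semicolon-separated key=value form.

```python
def parse_fmtp_params(value):
    """Parse 'key=value;key=value' fmtp parameters into a dict."""
    params = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, val = item.partition("=")
        params[key.strip().lower()] = val.strip()
    return params


def diff_params(offer_params, answer_params):
    """Return {name: (offer_value, answer_value)} for parameters that differ.
    A parameter missing on one side shows up as None."""
    names = sorted(set(offer_params) | set(answer_params))
    return {
        name: (offer_params.get(name), answer_params.get(name))
        for name in names
        if offer_params.get(name) != answer_params.get(name)
    }


# Hypothetical fmtp values taken from the offer and the answer of one call.
offer_fmtp = parse_fmtp_params("maxplaybackrate=16000;useinbandfec=1;usedtx=0")
answer_fmtp = parse_fmtp_params("maxplaybackrate=48000; useinbandfec=1; stereo=1")

for name, (offered, answered) in diff_params(offer_fmtp, answer_fmtp).items():
    print(name, offered, answered)
```

A difference is not automatically an error, since many Opus parameters describe one side's receive preferences, but the diff shows where to look first.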
G.729 has an optional silence suppression annex (annexb). a=fmtp:18 annexb=yes enables it; annexb=no disables. If one side requires annexb=yes and the other declares annexb=no, the codec list intersection succeeds but actual interop fails. Many endpoints handle this poorly.
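The annexb case can be caught with a small check on the two fmtp values for payload type 18. This sketch relies on one fact from the G.729 payload format registration: when annexb is omitted, yes is implied. The function names are hypothetical.

```python
def annexb_enabled(fmtp_value):
    """Return True unless the fmtp value explicitly says annexb=no.

    fmtp_value is the text after 'a=fmtp:18 ', or None when no fmtp line
    was sent. An omitted annexb parameter is treated as 'yes'.
    """
    if not fmtp_value:
        return True
    for item in fmtp_value.split(";"):
        key, _, val = item.strip().partition("=")
        if key.strip().lower() == "annexb":
            return val.strip().lower() != "no"
    return True


def annexb_mismatch(offer_fmtp, answer_fmtp):
    """True when the two sides disagree about G.729 Annex B."""
    return annexb_enabled(offer_fmtp) != annexb_enabled(answer_fmtp)


# One side says nothing (so yes is implied), the other declares annexb=no.
print(annexb_mismatch(None, "annexb=no"))
```

The implied default is the subtle part: a side that sends no fmtp line at all still disagrees with a peer that declares annexb=no.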
SIP itself does not negotiate codecs; SDP does, using the offer-answer model from RFC 3264. The offerer (typically the INVITE sender) lists supported codecs in its SDP body. The answerer (typically the 200 OK sender) responds with the codecs it selected from the offer, keeping the answer's m= lines in the same order as the offer's. The intersection of supported codecs determines what is actually used. If no overlap exists, the answerer rejects the request with 488 Not Acceptable Here.
Static payload types (0-95) are reserved by IANA for specific codecs — for example 0=PCMU, 8=PCMA, 9=G722, 18=G729. Both sides know what they mean without explicit declaration. Dynamic payload types (96-127) are free for any codec; the offerer assigns a number and declares what it represents using a=rtpmap. Both sides must read rtpmap to interpret dynamic types correctly.
Codec name agreement does not guarantee functional audio. Common causes of decode failure despite successful negotiation include payload type mismatches (one side hard-codes assumptions), fmtp parameter disagreements (G.729 annex B is the classic example), rtpmap parsing differences, and clock rate confusion. Check fmtp lines on both sides and verify rtpmap declarations match between offer and answer.