
Phantom calls

Dead calls, also known as zombie calls or phantom calls, are calls that linger in the system with no user present in the call or otherwise benefiting from it. Generally, the endpoints of the call are unavailable or unresponsive.

Dead calls may occur in the system if call peers (user endpoints) become disconnected or terminated without a chance to signal their change of status through normal means (e.g. a SIP BYE message in the ongoing SIP session). In such cases, the system may take some time to realize the endpoint is no longer available in the call. If a single endpoint remains available in a call, the user behind that endpoint may be waiting for the call peer to come back, or for the call to terminate. If no endpoints remain available in the call (e.g. all endpoints were disconnected by a common network issue), then the call's system resources may persist until the system recognizes the change in state of the endpoints.

Various settings may help control the handling of such situations.

  • Asterisk channel timeout
  • SIP transport timeouts
  • SIP global timers
  • SIP endpoint RTP timeouts
  • SIP endpoint session timers

Settings

Asterisk channel timeout

An absolute timeout on maximum call duration can be defined through Asterisk's TIMEOUT(absolute) channel setting. This provides an unconditional maximum duration for all calls. By default, user endpoints are set up with a TIMEOUT(absolute) value of 36000 seconds (10 hours) through a set_var pjsip endpoint option. This setting may be adjusted in the default SIP templates, or by defining a new SIP template and overriding this setting.
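As an illustration, the set_var value for a given maximum call duration can be derived as follows. This is a minimal sketch: the helper name is hypothetical, and the [name, value] pair layout is an assumption modeled on the endpoint_section_options payloads shown in the API examples of this page.

```python
def absolute_timeout_option(hours):
    """Return a hypothetical set_var option pair limiting calls to `hours`."""
    seconds = int(hours * 3600)  # TIMEOUT(absolute) is expressed in seconds
    return ["set_var", f"TIMEOUT(absolute)={seconds}"]


# The default of 10 hours corresponds to 36000 seconds.
print(absolute_timeout_option(10))  # ['set_var', 'TIMEOUT(absolute)=36000']
```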

SIP transport timeouts

SIP transports have timeout configurations depending on the transport type.

For TCP-based transports, including websocket transports for WebRTC, TCP keepalive settings may be useful to detect broken connections.

# curl --header 'Content-Type: application/json' -H 'X-Auth-Token:'$TOKEN -XPUT 'https://wazo.example.com/api/confd/1.1/sip/transports/{transport_uuid}' -d '{
"name": "transport-tcp",
"options": [
["protocol", "tcp"],
["tcp_keepalive_enable", "yes"],
["tcp_keepalive_time", "60"],
["tcp_keepalive_intvl", "10"],
["tcp_keepalive_probes", "3"]
]
}'

This example enables keepalives on a SIP TCP transport. The first keepalive probe is sent once the connection has been idle for 60 seconds. If it goes unanswered, further probes are sent every 10 seconds, and after 3 unanswered probes the connection is considered dead and is terminated.
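The worst-case detection delay implied by these three values can be estimated with a short calculation. This sketch assumes the usual Linux TCP keepalive behavior (an idle period followed by evenly spaced probes); the function name is illustrative only.

```python
def keepalive_detection_delay(keepalive_time, keepalive_intvl, keepalive_probes):
    """Approximate seconds of silence before a dead TCP peer is dropped."""
    # Idle period before the first probe, plus one interval per unanswered probe.
    return keepalive_time + keepalive_intvl * keepalive_probes


# Values from the transport example above: 60 + 3 * 10 seconds.
print(keepalive_detection_delay(60, 10, 3))  # 90
```

Under that assumption, a broken connection would be noticed after roughly a minute and a half of silence.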

For websocket transports (e.g. transport-wss for WebRTC), the pjsip transport setting websocket_write_timeout may also be useful.
Its value is in milliseconds. A lower value means dead websocket connections are terminated faster, but it also means endpoints with slow connections or slow processing may get disconnected.

SIP global timers

The SIP protocol specification relies on configurable timers to define timeouts for SIP transactions. These timer settings are global, affecting all endpoints.

timer_t1 is a reference value used to compute other timers, and should be close to the average network round-trip time of SIP endpoints. This timer is only used directly for UDP transports.

$ ping <sip endpoint IP>
...
8 packets transmitted, 8 received, 0% packet loss, time 7011ms
rtt min/avg/max/mdev = 26.547/27.170/27.623/0.332 ms

Here, an endpoint's average latency is 27ms. The minimal value for timer_t1 is 100ms, but the default value is 500ms. Deployments in such low-latency network conditions should benefit from having timer_t1 set to 100ms.

timer_b is the timer used as the timeout for INVITE transactions. It is conventionally set to 64*timer_t1. Assuming timer_t1 is 100ms, timer_b would be 6400ms. This means a SIP INVITE request (call attempt) that receives no response at all will fail after 6.4 seconds.
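The relationship between the two timers can be sketched as follows. The retransmission pattern (first retransmit after timer_t1, then a doubling interval until timer_b fires) is the one RFC 3261 describes for INVITE requests over unreliable transports such as UDP; whether a given SIP stack follows it exactly is an assumption here, and the function name is hypothetical.

```python
def invite_retransmit_schedule(timer_t1_ms, timer_b_ms=None):
    """Return (retransmit times in ms, timeout in ms) for an unanswered INVITE."""
    if timer_b_ms is None:
        timer_b_ms = 64 * timer_t1_ms  # conventional value of timer_b
    times = []
    interval = timer_t1_ms
    elapsed = interval
    while elapsed < timer_b_ms:
        times.append(elapsed)
        interval *= 2  # the retransmit interval doubles each time
        elapsed += interval
    return times, timer_b_ms


print(invite_retransmit_schedule(100))
# ([100, 300, 700, 1500, 3100, 6300], 6400)
```

With timer_t1 at 100ms, the INVITE would be retransmitted six times before the transaction is abandoned at 6.4 seconds. With the default of 500ms, the same pattern stretches to 32 seconds.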

SIP endpoint RTP timeouts

RTP timeout settings on sip endpoints define expectations on the continuous flow of RTP traffic.

Some endpoints may stop sending RTP packets in some conditions, such as when put on mute. In this case, a low rtp_timeout may cause those endpoints to be disconnected undesirably.

However, if SIP endpoints in use usually keep sending RTP packets in all normal situations, a low rtp_timeout value may make sense to quickly detect unavailable endpoints or otherwise problematic configurations (e.g. networking issues preventing end-to-end RTP flow). rtp_timeout_hold controls the timeout on RTP traffic when an endpoint is in a hold state, to account for the RTP traffic pattern of devices when no audio is to be generated (similar to mute).

# curl --header 'Content-Type: application/json' -H 'X-Auth-Token:'$TOKEN -XPUT "https://wazo.example.com/api/confd/1.1/endpoints/sip/templates/$global_sip_template_uuid" -d '{
...
"endpoint_section_options": [
["rtp_timeout", "60"],
["rtp_timeout_hold", "120"]
]
}'

This example modifies the global SIP template to set rtp_timeout to 60s and rtp_timeout_hold to 120s. An endpoint that generates no RTP traffic for 60s, or for 120s while on hold, is then considered dead and disconnected.
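The decision these two options describe can be modeled in a few lines. This is only an illustrative sketch, not Asterisk's implementation: the function name is hypothetical, and treating a value of 0 as "check disabled" is an assumption.

```python
def rtp_timed_out(silence_s, on_hold, rtp_timeout=60, rtp_timeout_hold=120):
    """Return True if an endpoint silent for `silence_s` seconds counts as dead."""
    # A held call is judged against the (usually longer) hold timeout.
    limit = rtp_timeout_hold if on_hold else rtp_timeout
    # Assumption: a limit of 0 means the check is disabled.
    return limit > 0 and silence_s >= limit


print(rtp_timed_out(75, on_hold=False))  # True: past the 60s limit
print(rtp_timed_out(75, on_hold=True))   # False: still within the 120s hold limit
```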

See documentation on sip templates for general information on how to configure sip templates.

SIP endpoint session timers

RFC 4028 defines the SIP session timer extension, which implements periodic refresh of SIP sessions through re-INVITEs. This ensures long SIP sessions are refreshed periodically, which helps detect unavailable endpoints.

By default, the global sip template sets the endpoint section option timers to yes. This means only endpoints explicitly supporting the session timer extension will see session timers used to refresh the session. The refresh interval and timeout can be negotiated by supporting endpoints, with the wazo server proposing timers_session_expires but accepting values no lower than timers_min_se (both options in seconds).

However, the default WebRTC SIP template sets timers to always. This ensures session timers are used by the Wazo server even if the endpoint has no explicit support for session timers. In this case, the Wazo server periodically sends re-INVITEs in ongoing SIP sessions and expects an acknowledgement from the endpoint. If the endpoint fails to acknowledge the re-INVITE, the endpoint will be considered unavailable after the timeout value in seconds set by timers_session_expires.

The smallest configurable value for timers_session_expires and timers_min_se is 90, meaning an unresponsive endpoint cannot be detected in less than 1m30s using session timers.

# curl --header 'Content-Type: application/json' -H 'X-Auth-Token:'$TOKEN -XPUT "https://wazo.example.com/api/confd/1.1/endpoints/sip/templates/$global_sip_template_uuid" -d '{
...
"endpoint_section_options": [
["timers", "always"],
["timers_session_expires", "300"]
]
}'

This example modifies the global SIP template to ensure session timers are always in use, with a periodic refresh of 5 minutes. Dead calls with unresponsive endpoints should then last no longer than 5 minutes.
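Before applying such values, it may help to check them against the 90-second floor mentioned above. The following is a minimal sketch with a hypothetical helper name; the last check (a proposed expiry lower than the accepted minimum) is an assumed sanity rule, not a documented validation of the API.

```python
MIN_ALLOWED_S = 90  # smallest configurable value, in seconds


def check_session_timers(timers_session_expires, timers_min_se=MIN_ALLOWED_S):
    """Return a list of problems found in the given session timer values."""
    problems = []
    if timers_min_se < MIN_ALLOWED_S:
        problems.append(f"timers_min_se must be at least {MIN_ALLOWED_S}")
    if timers_session_expires < MIN_ALLOWED_S:
        problems.append(f"timers_session_expires must be at least {MIN_ALLOWED_S}")
    # Assumed sanity rule: do not propose less than what would be accepted.
    if timers_session_expires < timers_min_se:
        problems.append("timers_session_expires is lower than timers_min_se")
    return problems


print(check_session_timers(300))  # []
```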

Conclusion

The TIMEOUT(absolute) Asterisk channel setting defines an absolute maximum duration for all calls. It can be set per endpoint through the set_var endpoint section option of SIP templates.

TCP keepalive may be defined on TCP-based SIP transports (protocol tcp, tls, ws, wss) to detect broken TCP connections.

The global SIP timer parameters timer_t1 and timer_b can be adjusted to set timing expectations for SIP transactions, which affects how quickly endpoints are considered unresponsive.

websocket_write_timeout controls how quickly dead or slow websocket connections are terminated.

RTP timeout settings may be adjusted to lower values when endpoints are expected to keep a continuous flow of RTP traffic in all normal circumstances.

SIP session timers should be enabled on all endpoints, and may be forced even if endpoints have no explicit support, but cannot detect unavailable endpoints in less than 1m30s.

A combination of these settings, tested in and adjusted for real or simulated production environments, should help provide optimal behavior when dealing with network instabilities and with resource leaks from so-called "phantom calls".
