The "distress" feature of the Saisei STM is one of the software's most useful tools. It gives a quick summary of the state of the network from the point of view of a given user, application, geolocation or AS, on a scale from 0 (perfect) to 100 (terrible). The score is meant to be understandable to anyone using the STM, without diving into the arcane details of the TCP protocol.
How is it calculated? What does it signify? This article sets out to answer these questions.
First, distress only applies to TCP. Distress is calculated for each TCP flow, based on two kinds of TCP-related event: retransmissions and timeouts. Retransmissions are a normal part of life in TCP. They occur when the network is trying to control the bandwidth that a flow uses, and in small quantities they don't indicate that anything bad is happening. Timeouts are a different story: they indicate that the network is seriously congested, typically that it is dropping several consecutive packets.
The distress scores for higher-level objects (applications, etc.) are computed by combining the distress scores for individual flows. It's a little more than just the average over all flows:
We keep three raw statistics for a TCP flow that are relevant to distress:
The distress score for a flow depends on:
These are combined to give the distress value between 0 and 100.
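To make the idea concrete, here is a toy sketch of how per-flow counters could be folded into a score clamped to the 0 to 100 range. It is not the STM's actual formula, which is not given in this article. The `FlowCounters` type, the `packets_sent` field, the two weights and the `toy_distress` function are all assumptions made up for illustration. The only thing taken from the text above is the principle that a timeout should count far more heavily than a retransmission.

```python
from dataclasses import dataclass


@dataclass
class FlowCounters:
    """Hypothetical per-flow counters; not the STM's real data model."""
    packets_sent: int      # assumed normaliser, not named in the article
    retransmissions: int
    timeouts: int


# Illustrative weights only: a timeout is treated as much worse than a
# retransmission, but the actual ratio used by the STM is not known here.
RETRANSMIT_WEIGHT = 1.0
TIMEOUT_WEIGHT = 20.0


def toy_distress(flow: FlowCounters) -> float:
    """Return an illustrative distress value in the range 0..100."""
    if flow.packets_sent <= 0:
        # No traffic observed, so there is nothing to be distressed about.
        return 0.0
    weighted_events = (
        RETRANSMIT_WEIGHT * flow.retransmissions
        + TIMEOUT_WEIGHT * flow.timeouts
    )
    raw = 100.0 * weighted_events / flow.packets_sent
    # Clamp so the result always stays on the 0 (perfect) to 100 (terrible) scale.
    return max(0.0, min(100.0, raw))
```

With these made-up weights, a flow of 1000 packets with 10 retransmissions scores 1, while the same flow with 10 timeouts scores 20. That mirrors the point that occasional retransmissions are harmless while timeouts signal serious congestion.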
Some things that you might expect to figure in the distress score do not. These are described in detail below.