Latency

Someone Who Understands Latency

latency
In this presentation at the Northwest C++ User’s Group, Herb Sutter explains how your hardware and software jumps through perversely complicated hoops in order to prevent you, the programmer, having to deal with the chasm in performance between CPUs and memory. There are slides in PDF and a video.

The questions he asks are “what is the cost of this programming operation?” and “can I speed it up?”. The answers are much more complex that you think. He takes pains to explain the impact of memory latency on all aspects of programming.
|

Completion Latency

There are different kinds of latency. Each one frustrates in a different way.

The third type of latency is Completion Latency, (Lc). This one is very frustrating. You've gotten there (Transport Latency), and you've waited in line for your ticket (Gate Latency), and now you are ready to go. But you are waiting. And waiting. And waiting. You're waiting for everyone else to get on the bus, because the bus won't leave until all the passengers are on and their luggage is stowed. And it doesn't matter if you are first on or last on (except for the looks from the other passengers anyway), the bus still has to wait and you cannot leave any sooner.

Completion Latency, as it turns out, can be expressed with a very simple equation, and it does not involve calculus, at least in the simple case:
tc
That's it. So if you can load 120 passengers an hour onto buses that seat 30, the completion latency is 1/4 hour.

To reduce Completion Latency, there are exactly two things you can do: decrease the payload (fewer butts on seats), or increase the throughput (figure out how to get butts and seats together faster). For tour buses this is how it works and the friendly-but-firm lady with the watch and the clipboard ushers everyone on in an efficient manner.

This equation also says something very important about payloads. If you are moving something and have a situation where the Completion Latency is fixed, such as a scheduled bus service, then throughput is proportional to payload. In other words, if you buy buses that are twice as big (and can be filled fully), then the throughput will double. For a bus service, this is very important to know, because throughput directly controls income, and cost is driven to a great extent by the number of buses you have, not their size. The same applies to airlines. This is one of the reasons passenger vehicles of all types that run on a schedule come in such a wide variety of sizes.

It also applies to computer networks. Data is always sent across a network in packets, frames, datagrams, or some other chunk that has to be received in its entirety before being verified as being correct. For a statistically fixed end-to-end latency, something very common is real-life networks, the throughput (bytes per second) is controlled almost completely by the number of bytes in each chunk. So to get more throughput, you must use bigger chunks.

If your throughput is fixed (cars coming off a production line, for instance) you can either vary the payload to match a certain latency (pick a train of a certain length to be able to make two deliveries a day), or live with the variable latency according to what payload you can muster (no control over the length of the train, deliver when it is filled).

What about the less simple case? I have assumed so far that the throughput is constant: things (people, bytes, cars) flow regularly. Efficient use of a payload (the need to fill it fully) leads to an unbounded latency if the throughput is variable. The equation turns into this:
tc2
which really means
tc3
In other words, the Completion Latency is the total time to process P things, where P is the payload. So Completion Latency is just an aggregation of other delays -- other latency. The time to process each thing could be due to Transport and/or Gate latencies. It has to be due to something.

There is no integral form of this equation because we are concerned with groups of discrete things. You could argue that processing some continuous thing (spaghetti for instance) that varies in speed and needs something completed (a spaghetti packet filled) is another form of Completion Latency. But it is not -- that is just Transport Latency.

The above sum makes another assumption. It assumes that talking about a thing-to-thing delay, Tn, makes sense. For some things it does, for instance processing things sequentially in order to make a complete group (like packing eggs from a conveyer belt). In other cases there is no sequence: the things you are waiting for are not correlated like they are in any sort of production line. An example of this would be T-shirt give-away. The first ten customers get a free T-shirt. How long does it take to give away all the T-shirts? If there is a line of people waiting for them at opening time, then it is sequential and it will take about ten times the time to give away the first one to give away all of them. But if the store is not that popular, it could take hours, days even, before ten customers show up. So Completion Latency is unbounded if the thing that must be completed is made of uncorrelated events.

There is just one more type of latency to discuss. The wait is almost over.
|

Gate Latency -- Cumulative Probability

The last time I wrote about Gate Latency, I ended with an attempt to diagram the chances of having to wait for a bathroom that is busy:
bathroom3
That's the chance of encountering gate latency (wait) for a bathroom that has one hour-long busy period and two half-hour busy periods per day. The problem with this diagram is that the spike with the arrow has to be infinitely high and infinitely narrow to express that we have 11/12 chance if finding the bathroom free and so have to wait exactly no time at all.

There is a better way to show this kind of information: a cumulative density function (CDF). Let's go back to the single die and look at its probability density function:
die5
Expressing this as a CDF gives:
cdf1
Now the chance accumulates as the die scores are added in left to right. Scoring less than 1 is impossible, so the chance is zero. At 1, the chance rises to 1/6, then at 2 it rises to 2/6, etc. all the way to 6, after which it does not rise past 1. All CDFs start at zero on the left and end at one on the right. For die throws this is not all that useful, since there is nothing special about accumulating the numbers 1, 2, 3.

But for waiting, this does a very useful thing. Here is the CDF for the bathroom wait diagrammed above:
cdf2
The chance of no wait starts off at 44/48 on the left. Moving to the right, the chance accumulates. The chances of having to wait 1/2 hour or less is 47/48. The chances of having to wait 1 hour or less is 1. The advantage with this diagram is that we can show the chances of exactly zero wait by just starting the graph at that point on the left. What were areas on the previous diagram become heights on this one.

The shape of the line as it goes from left to right shows how the probability accumulates. There is no summing to one here because the 1 representing 100% likelihood is always there on the right.
|

Gate Latency -- Visualizing Chance

In the last installment of my project looking into latency, I took a turn into statistics and chance, trying to diagram rolling a die and other things. The topic is gate latency, the time spent waiting for things.

We ended with this:
pdf4
The blue area represents the chance. So this means that we have to integrate the density function between two values to get the likelihood of whatever we are measuring falling between those two values, in this case the time to rewind a video tape.

Now look at the die probability chart from before:
die1
If you view this a a probability density function then you find that we are in trouble. Using the area rule and counting the score as continuous, this says that I have 1/12 chance of scoring between 2.5 and 3. That's 1/2 times 1/6. Look at this diagram:
die4
(Notice that I did move the numbers along the bottom, since I have to label actual locations along that axis now) But this result is nonsense for a die because it only scores whole numbers. So we change the diagram to look like this:
die5
Those arrows represent an infinitely high spike that is infinitesimally thin, but not so thin that the whole spike has no area. In fact its area is given by its height. Never mind that it is too weird to think about -- it actually works for our purposes. Add up all the area that is between 2.75 and 3.25 (blue above) and you get nothing before 3, exactly 1/6 at 3 and nothing after. Add up all the area from 0 to infinity and you get 1 (six spikes times 1/6 area each). Add up the area from 2.25 to 2.75 and you get zero. It works just the way we want a die to work.

It didn't make those spikes up. Those spikes have a name. They are called Dirac Delta functions, shown as ∂(x).

Back to gate latency. Here is the problem. We want to understand and express the chance that the bathroom will be busy and we want to understand and express how long we will have to wait for it if it is busy. If we assume that four other people keep it busy for a total of one two hours a day then we can easily see that there is a 11/12 (that's 22 out of 24 hours) chance of it not being busy and so having a wait of zero. If those two busy hours are consecutive (what a wait!) then the probability density function looks like this:
bathroom1
That looks very ugly. But if you examine it you will see that it is correct. 11/12 chance of there being no delay (purple spike). The area of the rest of the graph (green rectangle) is 2 x 1/24 = 1/12, so the total area is 1. You can never wait more than 2 hours or less than zero. The chance of waiting an hour is the same as waiting two hours, depending on when you start waiting in the two-hour period.

How about a household that keeps it busy for two hours, but in four half-hour chunks instead of one two-hour chunk (a much more pleasant place to live)?
bathroom2
Now you have a four-times more likely chance of waiting a quarter as long. What about one hour-long busy period and two half-hour busy periods?
bathroom3
The graphs are pretty ugly, but they do work.

The key point here is that gate latency, just like transport latency involves measuring the area under non-trivial graphs: ugly integration. Minimizing transport latency involved changing parameters and figuring out which graph to integrate to get the total latency. At least it was deterministic! Gate latency is not, trivially at least, but gate latency does have one redeeming feature: finding the maximum latency is easy, at least for these simple cases.

It turns out that gate latency can, in some situations, be defeated entirely. In other words, in our busy bathroom example you would never, ever have to wait to get in. The trick is synchronization, making sure that the time that the bathroom is busy never coincides with the time it is needed. In everything up until now, I have assumed that there is no correlation between the two competing events (or pairs of die throws). Synchronization involves communication or control that assures correlation and so minimizes gate latency. You have done this yourself many times: driving somewhere at a particular time because the traffic is less busy, for instance.

Next, a better way of diagramming gate latency.
|
The Bagelturf site welcomes Donations of any size