All the algorithms have the same underlying model of the system, which we will now describe. Each machine is assumed to have a timer that causes an interrupt H times a second. When this timer goes off, the interrupt handler adds 1 to a software clock that keeps track of the number of ticks (interrupts) since some agreed-upon time in the past. Let us call the value of this clock C. More specifically, when the UTC time is t, the value of the clock on machine p is C_p(t). In a perfect world, we would have C_p(t) = t for all p and all t. In other words, dC/dt ideally should be 1.
Real timers do not interrupt exactly H times a second. Theoretically, a timer with H = 60 should generate 216,000 ticks per hour. In practice, the relative error obtainable with modern timer chips is about 10⁻⁵, meaning that a particular machine can get a value in the range 215,998 to 216,002 ticks per hour. More precisely, if there exists some constant ρ such that

1 − ρ ≤ dC/dt ≤ 1 + ρ

the timer can be said to be working within its specification. The constant ρ is specified by the manufacturer and is known as the maximum drift rate. Slow, perfect, and fast clocks are shown in Fig. 3-5.
Fig. 3-5. Not all clocks tick precisely at the correct rate.
If two clocks are drifting from UTC in opposite directions, then at a time Δt after they were synchronized they may be as much as 2ρΔt apart. If the operating system designers want to guarantee that no two clocks ever differ by more than δ, clocks must be resynchronized (in software) at least every δ/2ρ seconds. The various algorithms differ in precisely how this resynchronization is done.
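For example, with the drift rate ρ = 10⁻⁵ quoted above and a desired bound of δ = 10 msec, the clocks would have to be resynchronized at least every 0.01/(2 × 10⁻⁵) = 500 seconds.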
Cristian's Algorithm
Let us start with an algorithm that is well suited to systems in which one machine has a WWV receiver and the goal is to have all the other machines stay synchronized with it. Let us call the machine with the WWV receiver a time server. Our algorithm is based on the work of Cristian (1989) and prior work. Periodically, certainly no more than every δ/2ρ seconds, each machine sends a message to the time server asking it for the current time. That machine responds as fast as it can with a message containing its current time, C_UTC, as shown in Fig. 3-6.
As a first approximation, when the sender gets the reply, it can just set its clock to C_UTC. However, this algorithm has two problems, one major and one minor. The major problem is that time must never run backward. If the sender's clock is fast, C_UTC will be smaller than the sender's current value of C. Just taking over C_UTC could cause serious problems, such as an object file compiled just after the clock change having a time earlier than the source, which was modified just before the clock change.
Fig. 3-6. Getting the current time from a time server.
Such a change must be introduced gradually. One way is as follows. Suppose that the timer is set to generate 100 interrupts per second. Normally, each interrupt would add 10 msec to the time. When slowing down, the interrupt routine adds only 9 msec each time, until the correction has been made. Similarly, the clock can be advanced gradually by adding 11 msec at each interrupt instead of jumping it forward all at once.
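As a sketch of this technique, a timer interrupt routine might apply a pending correction 1 msec per tick, as below; all names and the per-tick step are illustrative, not prescribed by the text:

```python
# Sketch of gradual clock adjustment (illustrative names and values).
# The timer fires 100 times a second; each tick normally adds 10 msec.
TICK_MS = 10     # nominal msec added per interrupt
ADJUST_MS = 1    # msec of correction applied per tick while adjusting

clock_ms = 0                 # software clock, in msec
pending_correction_ms = 0    # signed correction still to be applied

def set_clock(new_time_ms):
    """Record the needed correction instead of jumping the clock."""
    global pending_correction_ms
    pending_correction_ms += new_time_ms - clock_ms

def timer_interrupt():
    """Called on every timer tick: add 9, 10, or 11 msec."""
    global clock_ms, pending_correction_ms
    step = TICK_MS
    if pending_correction_ms > 0:     # clock is behind: run fast
        step += ADJUST_MS
        pending_correction_ms -= ADJUST_MS
    elif pending_correction_ms < 0:   # clock is ahead: run slow
        step -= ADJUST_MS
        pending_correction_ms += ADJUST_MS
    clock_ms += step                  # time never runs backward
```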
The minor problem is that it takes a nonzero amount of time for the time server's reply to get back to the sender. Worse yet, this delay may be large and vary with the network load. Cristian's way of dealing with it is to attempt to measure it. It is simple enough for the sender to record accurately the interval between sending the request to the time server and the arrival of the reply. Both the starting time, T_0, and the ending time, T_1, are measured using the same clock, so the interval will be relatively accurate, even if the sender's clock is off from UTC by a substantial amount.
In the absence of any other information, the best estimate of the message propagation time is (T_1 − T_0)/2. When the reply comes in, the value in the message can be increased by this amount to give an estimate of the server's current time. If the theoretical minimum propagation time is known, it can be used to bound the error of this estimate.
This estimate can be improved if it is known approximately how long it takes the time server to handle the interrupt and process the incoming message. Let us call the interrupt handling time I. Then the amount of the interval from T_0 to T_1 that was devoted to message propagation is T_1 − T_0 − I, so the best estimate of the one-way propagation time is half this. Systems do exist in which messages from A to B systematically take a different route than messages from B to A, and thus have a different propagation time, but we will not consider such systems here.
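The arithmetic of the last two paragraphs can be captured in a few lines of Python; the function name and the convention that all times are in seconds are ours, not Cristian's:

```python
def estimate_server_time(t0, t1, c_utc, interrupt_time=0.0):
    """Estimate the server's clock at the moment the reply arrives.

    t0, t1:         local send and receive times, read from the same clock
    c_utc:          the time C_UTC carried in the server's reply
    interrupt_time: the server's interrupt handling time I, if known
    """
    one_way = (t1 - t0 - interrupt_time) / 2  # best guess at reply transit
    return c_utc + one_way
```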
To improve the accuracy, Cristian suggested making not one measurement, but a series of them. Any measurements in which T_1 − T_0 exceeds some threshold value are discarded as being victims of network congestion and thus unreliable. The estimates derived from the remaining probes can then be averaged to get a better value. Alternatively, the message that came back fastest can be taken to be the most accurate, since it presumably encountered the least traffic along the way and thus is most representative of the pure propagation time.
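A sketch of this multi-probe scheme in Python, using the fastest-probe variant; ask_server, local_clock, the probe count, and the congestion threshold are all illustrative assumptions, not part of Cristian's specification:

```python
def cristian_probe(ask_server, local_clock, n=10, threshold=0.05):
    """Probe the time server n times and keep the fastest usable probe.

    ask_server():  hypothetical call returning the server's reply C_UTC
    local_clock(): hypothetical call reading the local clock (seconds)
    threshold:     probes whose round trip exceeds this are discarded
                   as victims of congestion (value is illustrative)
    """
    best = None                        # (round_trip, estimate) so far
    for _ in range(n):
        t0 = local_clock()             # T0: just before sending the request
        c_utc = ask_server()           # server's current time
        t1 = local_clock()             # T1: when the reply arrives
        rtt = t1 - t0
        if rtt > threshold:
            continue                   # congested: unreliable, discard
        estimate = c_utc + rtt / 2     # add one-way propagation estimate
        if best is None or rtt < best[0]:
            best = (rtt, estimate)
    return None if best is None else best[1]
```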
The Berkeley Algorithm
In Cristian's algorithm, the time server is passive. Other machines ask it for the time periodically. All it does is respond to their queries. In Berkeley UNIX, exactly the opposite approach is taken (Gusella and Zatti, 1989). Here the time server (actually, a time daemon) is active, polling every machine periodically to ask what time it is there. Based on the answers, it computes an average time and tells all the other machines to advance their clocks to the new time or slow their clocks down until some specified reduction has been achieved. This method is suitable for a system in which no machine has a WWV receiver. The time daemon's time must be set manually by the operator periodically. The method is illustrated in Fig. 3-7.
Fig. 3-7. (a) The time daemon asks all the other machines for their clock values. (b) The machines answer. (c) The time daemon tells everyone how to adjust their clock.
In Fig. 3-7(a), at 3:00, the time daemon tells the other machines its time and asks for theirs. In Fig. 3-7(b), they respond with how far ahead or behind the time daemon they are. Armed with these numbers, the time daemon computes the average and tells each machine how to adjust its clock [see Fig. 3-7(c)].
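As a sketch of one such round in Python, assuming each machine reports its offset relative to the daemon's clock (the function name and return convention are illustrative, not from Gusella and Zatti):

```python
def berkeley_round(machine_offsets):
    """One round of the Berkeley algorithm (illustrative sketch).

    machine_offsets: each machine's clock minus the daemon's clock,
                     as reported in Fig. 3-7(b); the daemon itself
                     counts as offset 0.
    Returns (daemon_adjustment, per_machine_adjustments), i.e. the
    messages of Fig. 3-7(c).
    """
    offsets = [0.0] + list(machine_offsets)  # include the daemon itself
    average = sum(offsets) / len(offsets)    # the agreed-upon new time
    # Each participant shifts by (average - its own offset).
    return average, [average - off for off in machine_offsets]
```

With illustrative offsets of +25 and −10 minutes, the average is +5, so the daemon adjusts its own clock by +5 and tells the two machines to adjust by −20 and +15, respectively.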
Averaging Algorithms
Both of the methods described above are highly centralized, with the usual disadvantages. Decentralized algorithms are also known. One class of decentralized clock synchronization algorithms works by dividing time into fixed-length resynchronization intervals. The i-th interval starts at T_0 + iR and runs until T_0 + (i+1)R, where T_0 is an agreed-upon moment in the past, and R is a system parameter. At the beginning of each interval, every machine broadcasts the current time according to its clock. Because the clocks on different machines do not run at exactly the same speed, these broadcasts will not happen precisely simultaneously.
After a machine broadcasts its time, it starts a local timer to collect all other broadcasts that arrive during some interval S. When all the broadcasts arrive, an algorithm is run to compute a new time from them. The simplest algorithm is just to average the values from all the other machines. A slight variation on this theme is first to discard the m highest and m lowest values, and average the rest. Discarding the extreme values can be regarded as self-defense against up to m faulty clocks sending out nonsense.
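A minimal sketch of this trimmed average in Python (the function name is ours; the broadcast values are assumed to be numbers on a common scale):

```python
def trimmed_average(broadcast_times, m):
    """Average the broadcast clock values after discarding the m highest
    and the m lowest, as self-defense against up to m faulty clocks."""
    if len(broadcast_times) <= 2 * m:
        raise ValueError("need more than 2*m values to discard 2*m of them")
    trimmed = sorted(broadcast_times)[m:len(broadcast_times) - m]
    return sum(trimmed) / len(trimmed)
```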