What is a vector clock?

(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: lawrence@krubner.com, or follow me on Twitter.

Interesting:

Like dotted version vectors, vector clocks are a means of tracking a events in distributed systems. Unlike normal clocks, vector clocks have no sense of chronological time, i.e. they don’t care if something happened at 6 pm today or back in 1972. They care only about sequences of events. More specifically, they keep track of who—i.e. which actor in the system—has modified an object and how many times they’ve done so.

In a distributed system like Riak, multiple replicas of each object are active in the cluster all the time. Because it’s inevitable that objects will have conflicting values due to events like concurrent updates and healed network partitions, Riak needs a mechanism to keep track of which replica of an object is more current than another. In versions of Riak prior to 2.0, vector clocks were the means employed by Riak to do precisely that.

A number of important aspects of the relationship between object replicas can be determined using vector clocks:

Whether one object is a direct descendant of the other

Whether the objects are direct descendants of a common parent

Whether the objects are unrelated in recent heritage

Behind the scenes, Riak uses vector clocks as an essential element of its active anti-entropy subsystem and of its automatic read repair capabilities.

Vector clocks are non-human-readable metadata attached to all Riak objects. They look something like this:

a85hYGBgzGDKBVIcR4M2cgczH7HPYEpkzGNlsP/VfYYvCwA=

While vector clocks quite often resolve object conflicts without trouble, there are times when they can’t, i.e. when it’s unclear which value of an object is most current. When that happens, Riak, if configured to do so, will create siblings.

Post external references

  1. 1
    http://docs.basho.com/riak/latest/theory/concepts/context/#Vector-Clocks
Source