I’m a Systems/Software Engineer in the San Francisco Bay Area. I moved from Columbus, Ohio in 2007 after getting a B.S. in Physics from the Ohio State University. I'm married, and we have dogs.

On my GitHub account (https://github.com/addumb), I open-sourced python-aliyun in 2014, I have an outdated Python 2 project starter template at python-example, and I have a pretty handy “sshec2” command and some others in tools.

The Cost of 4K 60fps video storage, a dad's perspective (2021-2-20)
Mobile App Operability (2020-7-15)
I don't write (2019-10-30)
I moved addumb.com into GitHub pages (2016-3-7)
Quick Debian Backporting (2014-3-10)
Considering different data systems? (2013-11-15)
I Moved Addumb.com into AWS (2013-3-18)
Truth In Distributed Systems (2012-2-24)
Updated: MySQL 5.0 and 5.1 Side-By-Side (2011-3-2)
MySQL Duplicate Key Error - InnoDB or MyISAM? (2011-1-20)
Linux Tip: awesome and synergy for less mouse/keyboard switching (2011-1-12)
vim and bash (2010-12-7)
Linux Tools (2010-10-10)
Linux tip - stderr skips pipes (2010-7-19)
ndislocate - A distributed service locator, written on top of Node.js (2010-6-17)
I want a tattoo (2010-5-17)
Red Hat Enterprise Linux 5.5 released (2010-3-30)
HP ProLiant Linux repositories (2010-3-24)
Linux tip 4 - bash history timestamps (2010-3-8)
Devops (2009-11-25)
While I wait for the locksmith... (2009-10-24)
Linux tip 3 - rsync gotchas (2009-10-22)
Linux tip 2 - read (2009-10-22)
Linux tip - du versus df (2009-10-12)
MySQL Slave Initialization Do's and Dont's (2009-7-5)
What am I doing? (2009-5-25)

Truth In Distributed Systems

February 24, 2012

I need to write a distributed computer systems post about the concept of "truth."

The gist of it goes like this: Suppose you have a group of servers (A) that need data from another group of servers (B). Each member of A could talk to whichever member of B it likes. Thing is, each member of A could see a member of B fail independently. How should the other members of A know if that member of B failed? The truth of the matter is that a member of B failed... we can get into arguments about "How dead is it?" but suffice it to say you no longer want to consider it a valid member of B.

I can see a few ways to prevent other members of A from talking to the failed member of B:

Send traffic through a bottleneck that determines health of B (a load balancer pair, e.g.)
Have the members of A tell each other about the failure
Don't ;)

So... thing is, truth is relative in this situation. Say servers derp1 and derp2 are members of the client pool A, and both are trying to talk to herp1, a member of B. derp1 and derp2 can independently decide herp1 is dead. The event where derp1 discovers herp1 is dead is separate from the event where derp2 discovers herp1 is dead. It will take a non-zero amount of time for that information to go anywhere. In this situation, the bottleneck or load balancer goes through this same process; it's just been designated Arbiter of Health of Pool B.
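Here's a rough sketch of what "truth is relative" looks like from the clients' side. Everything in it is made up for illustration: the ClientView class, the observe method, a second backend called herp2, and the timestamps. The only point is that each client keeps its own opinion, and those opinions can disagree for a while.

```python
class ClientView:
    """One member of pool A with its own opinion about pool B (hypothetical)."""

    def __init__(self, name, backends):
        self.name = name
        self.backends = list(backends)
        self.dead_since = {}  # backend -> local time this client marked it dead

    def observe(self, backend, ok, now):
        """Record the outcome of one request made at local time `now`."""
        if ok:
            self.dead_since.pop(backend, None)
        else:
            # Keep the first time we noticed the failure.
            self.dead_since.setdefault(backend, now)

    def healthy(self):
        return [b for b in self.backends if b not in self.dead_since]


derp1 = ClientView("derp1", ["herp1", "herp2"])
derp2 = ClientView("derp2", ["herp1", "herp2"])

# Assume herp1 dies at t=10. derp1 happens to call it at t=11,
# derp2 not until t=15.
derp1.observe("herp1", ok=False, now=11)
between = (derp1.healthy(), derp2.healthy())  # the two views disagree here
derp2.observe("herp1", ok=False, now=15)
after = (derp1.healthy(), derp2.healthy())    # now they agree again
```

Between t=11 and t=15 both clients are "right" according to what they have seen so far. A load balancer would just be a third ClientView that everyone agrees to believe.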

It is not an option to have all members of pool A know immediately when herp1 fails, thanks to special relativity. The time it takes to inform all members of A increases with the size of A.
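To put a rough number on that, here is a toy model, not a real protocol. It assumes the friendliest possible gossip: one member of A notices the failure, and in every round each member that already knows tells exactly one member that doesn't. The function name rounds_to_inform is mine.

```python
def rounds_to_inform(pool_size):
    """Rounds until every member of A knows about the failure, assuming
    each informed member tells exactly one uninformed member per round."""
    informed = 1  # the member that saw the failure first
    rounds = 0
    while informed < pool_size:
        informed = min(pool_size, informed * 2)  # best case: knowledge doubles
        rounds += 1
    return rounds


for n in (2, 16, 1000):
    print(n, rounds_to_inform(n))
```

Even in this ideal case the number of rounds grows with the size of A (roughly log2 of it), and every round costs at least one network hop. Real networks with dropped messages and overlapping gossip only do worse.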

I prefer to deal with this sort of truth-y information retrieval (e.g. "What are my options for getting data out of B right now?") as a trade-off between immediate single-point knowledge on one side and eventual distributed quorum on the other side. It's mostly a matter of how long you can wait to get "good" information out of pool B. If you need it RIGHT NOW, then you'll need to go through an aggregation point like a load balancer. If you can deal with some "stale" or "bad" responses, then you should relax your requirements and let the clients work it out amongst themselves.
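The "let the clients work it out" end of that trade-off could look something like this minimal sketch. It assumes a client that is allowed to eat a failed request now and then; the fetch function, the known_dead set, and the fake_call stand-in for a real network call are all hypothetical.

```python
def fetch(backends, known_dead, call, max_attempts=3):
    """Try backends this client still believes are alive, learning from failures."""
    attempts = 0
    for backend in backends:
        if backend in known_dead:
            continue  # local truth says skip it
        if attempts == max_attempts:
            break
        attempts += 1
        try:
            return call(backend)
        except ConnectionError:
            known_dead.add(backend)  # update this client's own view
    raise RuntimeError("no healthy backend found")


def fake_call(backend):
    """Stand-in for a real request; herp1 is assumed to be down."""
    if backend == "herp1":
        raise ConnectionError("herp1 is down")
    return f"data from {backend}"


dead = set()
result = fetch(["herp1", "herp2"], dead, fake_call)
```

The first request pays for the discovery with one failed attempt, and after that the client avoids herp1 on its own. If that one bad attempt is unacceptable, you're back to putting an aggregation point in front of B.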