lucb1e 3 hours ago

Cool, didn't know of it.

In a nutshell, assuming I understood it correctly: to find the storage location of a file, instead of doing the intuitive pattern of hash(filename)%num_servers (which would remap nearly every file each time the number of servers changes), you do something just a tiny bit more complicated:

    sorted(serverList, key=lambda serverID: hash(filename + serverID))[0]

If you add a server to the list, it may become the new top-ranked location for a given filename, and that file would have to move, but that's only the case for about 1 in length(serverList) files, and it leaves the load balanced equally across all servers again. And if the first entry (sortedList[0]) becomes unavailable, you fail over to the second one at [1], etc.
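
For anyone who wants to poke at it, here is a minimal runnable sketch of the idea above. The names `score`, `ranked_servers`, and `locate` are mine, not from any library, and I'm assuming SHA-256 as the hash because Python's built-in `hash()` is salted per process for strings, so it wouldn't give stable placements across restarts.

```python
import hashlib


def score(filename: str, server_id: str) -> int:
    """Stable per-(file, server) score; lower sorts first (hypothetical helper)."""
    # The NUL separator keeps ("ab", "c") and ("a", "bc") from hashing the same.
    digest = hashlib.sha256(f"{filename}\x00{server_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def ranked_servers(filename: str, servers: list[str]) -> list[str]:
    """All servers ordered by preference for this file, best first."""
    return sorted(servers, key=lambda s: score(filename, s))


def locate(filename: str, servers: list[str], is_up=lambda s: True) -> str:
    """First server in the ranking that is_up() accepts (failover to [1], [2], ...)."""
    for server in ranked_servers(filename, servers):
        if is_up(server):
            return server
    raise RuntimeError("no server available")
```

Since each server's score for a file doesn't depend on the other servers, adding one can only move a file onto the new server, never between two old ones.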
  • pkhuong 3 hours ago

    And this naturally extends to data replication.
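
A rough sketch of what that could look like, assuming "replication" here means storing each file on the first few servers of the same ranking. `replica_set` and the `copies` parameter are made-up names for illustration, and SHA-256 is just my choice of a stable hash.

```python
import hashlib


def replica_set(filename: str, servers: list[str], copies: int = 2) -> list[str]:
    """The `copies` best-ranked servers for this file (hypothetical helper)."""

    def score(server_id: str) -> int:
        # Stable hash of (file, server); lower sorts first.
        digest = hashlib.sha256(f"{filename}\x00{server_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return sorted(servers, key=score)[:copies]
```

If the first replica disappears from the list, the old second replica becomes the new first, so it already holds the data.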