In a nutshell, assuming I understood it correctly: to find the storage location of a file, instead of doing the intuitive pattern of hash(filename)%num_servers (which would change completely every time the list of servers changes) you do something just a tiny bit more complicated:
sort([hash(filename+serverID) for serverID in serverList])[0]
If you add a server to the list, it may become the new highest priority location for a given filename and it would have to move, but that's only the case for about 1 in length(serverList) files and makes the load balance equally across all servers again. And if the first entry (sortedList[0]) becomes unavailable, you fail over to the second one at [1] etc.
Cool, didn't know of it.
In a nutshell, assuming I understood it correctly: to find the storage location of a file, instead of doing the intuitive pattern of hash(filename)%num_servers (which would change completely every time the list of servers changes) you do something just a tiny bit more complicated:
If you add a server to the list, it may become the new highest priority location for a given filename and it would have to move, but that's only the case for about 1 in length(serverList) files and makes the load balance equally across all servers again. And if the first entry (sortedList[0]) becomes unavailable, you fail over to the second one at [1] etc.And this naturally extends to data replication.