Fault-tolerance in a distributed management system: a case study | IEEE Conference Publication | IEEE Xplore