Fault tolerance is an approach by which the reliability of a computer system can be increased beyond what traditional methods can achieve. While hardware-supported fault tolerance has been well documented, the newer software-supported fault tolerance techniques have remained scattered throughout the literature. Comprehensive and self-contained, this book organizes that body of knowledge with a focus on fault tolerance in distributed systems. (The uniprocess case is treated as a special case of distributed systems.) KEY TOPICS: Treats fault-tolerant distributed systems as consisting of levels of abstraction, each providing different fault-tolerant services. For researchers and practitioners working in the area of fault tolerance.