Topic: resourceful, redundant systems for reliability


A system can be reliable despite faulty components and unanticipated events. The basic method is a functionally rich system with multiple ways to achieve a result. Problems must be detected soon after their occurrence, and appropriate responses generated.

N-version programming creates multiple, independent versions of a program. The independent versions check each other with majority voting. Independence may be difficult to achieve. (cbb 4/98)
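
As an illustration of the voting step, here is a minimal sketch in Python. The three versions (sum_loop, sum_formula, sum_faulty) and the helper majority_vote are hypothetical names invented for this example; they do not come from the cited work. The sketch assumes results can be compared for exact equality.

```python
from collections import Counter


def majority_vote(results):
    """Return the value reported by a strict majority of versions, or raise."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among versions: %r" % (results,))
    return value


# Three hypothetical, independently written versions of the same function.
def sum_loop(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_formula(n):
    return n * (n + 1) // 2


def sum_faulty(n):
    return n * n // 2  # deliberately wrong for most n, to show masking


def n_version_sum(n):
    """Run every version and accept only the majority answer."""
    return majority_vote([v(n) for v in (sum_loop, sum_formula, sum_faulty)])
```

Here n_version_sum(10) masks the faulty version because the other two agree. The sketch gives no protection when two versions share the same error, which is the independence problem noted above.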

Subtopic: engineering for reliability

Quote: how do engineers create reliable structures from imperfect mechanisms [»demiRA5_1979]

Subtopic: redundancy

Quote: failure of a system without redundancies is massive and uncontrolled; like the Titanic
Quote: if we feel that something is provably right then redundancies are removed [»demiRA5_1979]
Quote: use redundancy of programming language for ease of use and syntactic error checking [»wilkMV_1957]

Subtopic: verification vs. trust

Quote: a verified program is provably correct but not reliable and trustworthy; no information about limits [»demiRA5_1979]

Subtopic: resourceful, functionally rich system

Quote: a resourceful system achieves its goals despite failures of standard methods [»abboRJ3_1990]
Quote: a resourceful system has functional richness, testable goals, and some planning abilities [»abboRJ3_1990]
Quote: a resourceful system can combine basic functions into programs or plans [»abboRJ3_1990]
Quote: a resourceful system will automatically create programs to deal with contingencies [»abboRJ3_1990]
Quote: a functionally rich system is not orthogonal; there are multiple ways to achieve a result
Quote: many existing systems are functionally rich, especially life threatening ones such as airplanes [»abboRJ3_1990]
Quote: an attitude control system can use any three of four reaction wheels to maintain a satellite's orientation [»abboRJ3_1990]
Quote: build internets from flows, packet sequences from source to destination; helps resource management and accountability; gateways keep track of soft, flow state [»clarDD8_1988]
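
The idea of a functionally rich system with testable goals can be sketched as a loop over alternative methods. This is only an illustration under stated assumptions: achieve, goal_met, and the method list are hypothetical names, and the planning abilities described in the quotes are not modeled.

```python
def achieve(goal_met, methods):
    """Try each alternative method until one result passes the goal test.

    Returns (result, failures), where failures records why earlier
    methods were rejected.
    """
    failures = []
    for method in methods:
        try:
            result = method()
        except Exception as exc:  # a failed method is a contingency, not a crash
            failures.append((method.__name__, repr(exc)))
            continue
        if goal_met(result):
            return result, failures
        failures.append((method.__name__, "goal test failed"))
    raise RuntimeError("all methods failed: %r" % (failures,))
```

The goal test matters as much as the alternatives: a method that returns an implausible value is treated the same as one that raises.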

Subtopic: intermittent errors

Quote: intermittent machine errors are exasperating; run tests, check identities, make duplicate runs [»turiA3_1951]
Quote: use burst computations to work around intermittent machine errors; if the two runs of a burst differ, rerun from the saved state
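
One way to read the burst technique is sketched below, assuming the computation can be split into short bursts whose state can be saved and compared. The names run_in_bursts and flaky_increment are hypothetical, and the simulated fault stands in for an intermittent machine error.

```python
import copy


def run_in_bursts(state, step, bursts, max_attempts=5):
    """Run each burst twice from the saved state; commit only if both runs agree."""
    for _ in range(bursts):
        for _attempt in range(max_attempts):
            first = step(copy.deepcopy(state))
            second = step(copy.deepcopy(state))
            if first == second:
                state = first  # the two runs agree, so commit this burst
                break
            # otherwise discard both results and rerun from the saved state
        else:
            raise RuntimeError("burst never produced two matching runs")
    return state


calls = {"n": 0}


def flaky_increment(s):
    """Add 1 to the total; the third call simulates an intermittent error."""
    calls["n"] += 1
    s["total"] += 1
    if calls["n"] == 3:
        s["total"] += 1000
    return s


final = run_in_bursts({"total": 0}, flaky_increment, bursts=4)
```

The corrupted run in the second burst is detected because its duplicate disagrees, and the burst is repeated from the saved state.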

Subtopic: frequent check

Quote: run programmed checks often to detect problems early, avoid disastrous results, and diagnose problems [»turiA3_1951]
Quote: check for mechanical and electrical failures frequently; no more than 20 minutes without checks [»compHU_1946]
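
A programmed check can be as simple as testing a known identity at regular intervals. The sketch below is an assumption-laden illustration, not a method from the cited sources: sum_of_squares_checked, check_every, and corrupt_at are hypothetical names, and corrupt_at exists only to simulate a machine error.

```python
def sum_of_squares_checked(n, check_every=100, corrupt_at=None):
    """Sum i*i for i = 1..n, checking the closed-form identity periodically."""
    total = 0
    for i in range(1, n + 1):
        total += i * i
        if i == corrupt_at:
            total += 1  # simulated machine error
        if i % check_every == 0 or i == n:
            # Identity: 1^2 + 2^2 + ... + i^2 == i(i+1)(2i+1)/6
            expected = i * (i + 1) * (2 * i + 1) // 6
            if total != expected:
                raise ArithmeticError("check failed at step %d" % i)
    return total
```

A smaller check_every localizes a fault to fewer steps at the cost of more checking work.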

Subtopic: backup processor

Quote: for fault-tolerance, use swact (switch of activity) between active and passive forms of an application; passive form only keeps track of state of active form; resynchronizes on recovery [»dereF3_2001]
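
The active/passive arrangement might be sketched as follows. This is a simplified, single-process illustration with hypothetical class and method names (Replica, SwactPair, recover_passive); it is not the design from the cited paper and ignores failure detection and concurrent updates.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.state = {}
        self.alive = True

    def apply(self, key, value):
        self.state[key] = value


class SwactPair:
    """An active replica does the work; a passive one only mirrors its state."""

    def __init__(self):
        self.active = Replica("A")
        self.passive = Replica("B")

    def swact(self):
        """Switch of activity: the passive form becomes the active one."""
        self.active, self.passive = self.passive, self.active

    def handle(self, key, value):
        if not self.active.alive:
            self.swact()
            if not self.active.alive:
                raise RuntimeError("both replicas have failed")
        self.active.apply(key, value)
        if self.passive.alive:
            self.passive.apply(key, value)  # passive form tracks state only

    def recover_passive(self):
        """Resynchronize a repaired replica from the current active state."""
        self.passive.state = dict(self.active.state)
        self.passive.alive = True
```

After a switch, the failed replica misses updates until recover_passive copies the active state back to it.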

Subtopic: hot swap modules

Quote: hot swap modules need to change types while preserving type safety; use reflective mechanism with programmer-defined version adapters [»duggD9_2001]
Quote: a version adapter maps a value from the old to new version of a type; use run-time type tags to identify versioned types [»duggD9_2001]
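
A rough sketch of version adapters keyed by run-time type tags is shown below. It covers only the tag-and-adapt step; Python offers none of the static type-safety guarantees that the cited work is concerned with. The registry ADAPTERS, the decorator adapter, and the "Point" type are hypothetical.

```python
ADAPTERS = {}  # (type name, from_version) -> function producing version + 1


def adapter(type_name, from_version):
    """Register a programmer-defined adapter for one version step."""
    def register(fn):
        ADAPTERS[(type_name, from_version)] = fn
        return fn
    return register


@adapter("Point", 1)
def point_v1_to_v2(value):
    x, y = value  # version 1 stored a pair
    return {"x": x, "y": y}


@adapter("Point", 2)
def point_v2_to_v3(value):
    return dict(value, z=0)  # version 3 adds a z field


def upgrade(tagged, target_version):
    """Map a tagged value (type name, version, value) up to target_version."""
    type_name, version, value = tagged
    while version < target_version:
        try:
            convert = ADAPTERS[(type_name, version)]
        except KeyError:
            raise TypeError("no adapter for %s v%d" % (type_name, version))
        value = convert(value)
        version += 1
    return (type_name, version, value)
```

Chaining single-step adapters means a value created under any old version can reach the current one without a dedicated adapter for every pair of versions.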

Subtopic: checkpoint

Quote: coordinated checkpoints create a global consistent state; simplifies recovery with good performance; recovery of uncoordinated checkpoints can domino to the initial state [»elnoEN9_2002]

Subtopic: roll-back recovery

Quote: survey of automatic rollback-recovery from checkpoints in message-passing systems [»elnoEN9_2002]
Quote: message-passing systems may propagate rollback recovery because each message creates a dependency between sender and receiver; can domino to starting point [»elnoEN9_2002]
Quote: causal logging is as fast as optimistic logging while allowing each process to commit output independently; roll-back to most recent checkpoint; more complex [»elnoEN9_2002]
Quote: because roll-back recovery can be complex, all commercial implementations use pessimistic logging [»elnoEN9_2002]
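
The general shape of pessimistic logging can be sketched for a single process: each incoming message is logged before it is applied, so recovery is a restore of the last checkpoint plus a replay of the log. The class LoggedProcess is hypothetical, and an in-memory list stands in for stable storage, which a real system would write synchronously to disk.

```python
class LoggedProcess:
    def __init__(self):
        self.total = 0        # volatile state
        self.checkpoint = 0   # stand-in for a checkpoint on stable storage
        self.log = []         # stand-in for a message log on stable storage

    def deliver(self, amount):
        self.log.append(amount)  # log first, then let the message take effect
        self.total += amount

    def take_checkpoint(self):
        self.checkpoint = self.total
        self.log.clear()  # messages before the checkpoint need no replay

    def crash(self):
        self.total = None  # volatile state is lost

    def recover(self):
        """Restore the latest checkpoint and replay the logged messages."""
        self.total = self.checkpoint
        for amount in self.log:
            self.total += amount
```

Because every message is logged before it is applied, this process can recover alone, without forcing its senders to roll back.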

Subtopic: retry

Quote: solved problem of occasional process failure in Unix by repeated retries at increasing lengths of time [»doloTA7_1978]
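
Retrying at increasing intervals might look like the sketch below. The function name retry, its parameters, and the choice of OSError as the retryable failure are assumptions for illustration; the cited work does not specify them.

```python
import time


def retry(operation, attempts=5, first_delay=0.01, factor=2.0, sleep=time.sleep):
    """Call operation until it succeeds, waiting longer after each failure."""
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError:
            if attempt == attempts:
                raise  # give up and report the last failure
            sleep(delay)
            delay *= factor  # wait longer before the next try
```

Passing sleep as a parameter lets a test record the delays instead of actually waiting.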

Subtopic: multi-version

Quote: multi-version coding can compare imprecise results with precise ones [»stoyAD7_1993]

Subtopic: multi-version debugging

Quote: Guard performs relative debugging with assertions to compare data structures, permutations to identify subarrays, and plots of error surfaces

Subtopic: oracle

Quote: test by a pseudo-oracle which independently implements a program and compare results; use a very high level language [»daviMD_1981]
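
A pseudo-oracle test can be sketched by comparing a hand-written routine with an independent, higher-level implementation on random inputs. Here Python's built-in sorted plays the pseudo-oracle for a hypothetical insertion_sort; the names and the input ranges are assumptions chosen for the example.

```python
import random


def insertion_sort(items):
    """Program under test: a hand-written insertion sort."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


def pseudo_oracle(items):
    """Independent implementation written at a higher level."""
    return sorted(items)


def compare_with_oracle(program, oracle, trials=200, seed=0):
    """Return the random inputs on which program and oracle disagree."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        if program(data) != oracle(data):
            mismatches.append(data)
    return mismatches
```

A disagreement shows only that one of the two is wrong; deciding which still needs a person or a third source.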

Subtopic: voting

Quote: n-version programming for increasing fault tolerance by voting [»knigJC1_1986]
Quote: n-version systems are self-diagnostic; i.e., log disagreements and debug individual channels [»hattL11_1997]
Quote: independently write two versions of a program; the same errors should not occur in both [»pariG_1980]
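
The self-diagnostic property can be sketched by counting how often each channel disagrees with the majority. The function diagnose and the channel names are hypothetical, and outputs are assumed to be exactly comparable.

```python
from collections import Counter


def diagnose(channels, inputs):
    """Count, per channel, the inputs on which it disagrees with the majority."""
    disagreements = Counter()
    for x in inputs:
        outputs = {name: fn(x) for name, fn in channels.items()}
        majority, votes = Counter(outputs.values()).most_common(1)[0]
        if votes * 2 <= len(outputs):
            disagreements["<no majority>"] += 1
            continue
        for name, out in outputs.items():
            if out != majority:
                disagreements[name] += 1  # a log entry pointing at this channel
    return dict(disagreements)
```

A channel with a high count is the first candidate for debugging, provided the majority itself is not the result of a shared error.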

Subtopic: problems with voting

Quote: testing of n-version programs found dependent errors between versions, which reduces the effectiveness of voting [»knigJC1_1986]
Quote: experimental evidence for the failure of the independence model of multi-version reliability [»butlRW12_1991]
Quote: an asynchronous, triply-redundant voting system was unstable because channels sampled sensors at different times; bad at control points [»rushJ12_1991]

Subtopic: limitations of redundancy

Quote: diversity increases reliability only if systems are redundant, failures are independent, diversity is deep, and alternatives interoperate smoothly [»parnDL8_2007]
Quote: computer systems exhibit little redundancy; for example, redundant tax systems would help only if they implement exactly the same rules [»parnDL8_2007]
Quote: interconnected, computer-based systems are not independent; failures in one system can cause problems for other systems

Related Topics

Group: systems   (17 topics, 530 quotes)

Topic: communication protocols (62 items)
Topic: consistency testing (60 items)
Topic: defensive programming (22 items)
Topic: design for change (76 items)
Topic: error safe systems (76 items)
Topic: exception handling by recovery block or rescue clause (22 items)
Topic: file system reliability (26 items)
Topic: language flexibility (34 items)
Topic: log-structured file system (11 items)
Topic: log-structured rollback-recovery (13 items)
Topic: logging data and events (17 items)
Topic: mobile code (14 items)
Topic: open systems (33 items)
Topic: preventing accidental errors (37 items)
Topic: process migration (3 items)
Topic: reliable communication (29 items)
Topic: reliability of distributed systems (35 items)
Topic: safety critical systems (32 items)
Topic: testing by voting or N-version (10 items)
Topic: Thesa as a database of modules (23 items)

Updated barberCB 3/06
Copyright © 2002-2008 by C. Bradford Barber. All rights reserved.
Thesa is a trademark of C. Bradford Barber.