Topic: resourceful, redundant systems for reliability

Summary

A system can be reliable despite faulty components and unanticipated events. The basic method is a functionally rich system with multiple ways to achieve a result. Problems must be detected soon after their occurrence, and appropriate responses generated.
N-version programming creates multiple, independent versions of a program. The independent versions check each other with majority voting. Independence may be difficult to achieve. (cbb 4/98)

Subtopic: engineering for reliability

Quote: how do engineers create reliable structures from imperfect mechanisms [»demiRA5_1979]

Subtopic: redundancy

Quote: failure of a system without redundancies is massive and uncontrolled; like the Titanic
Quote: if we feel that something is provably right then redundancies are removed [»demiRA5_1979]
Quote: use redundancy of programming language for ease of use and syntactic error checking [»wilkMV_1957]

Subtopic: verification vs. trust

Quote: a verified program is provably correct but not reliable and trustworthy; no information about limits [»demiRA5_1979]

Subtopic: resourceful, functionally rich system

Quote: a resourceful system achieves its goals despite failures of standard methods [»abboRJ3_1990]
Quote: a resourceful system has functional richness, testable goals, and some planning abilities [»abboRJ3_1990]
Quote: a resourceful system can combine basic functions into programs or plans [»abboRJ3_1990]
Quote: a resourceful system will automatically create programs to deal with contingencies [»abboRJ3_1990]
Quote: a functionally rich system is not orthogonal; there are multiple ways to achieve a result
Quote: many existing systems are functionally rich, especially life threatening ones such as airplanes [»abboRJ3_1990]
Quote: an attitude control system can use any three of four reaction wheels to maintain a satellite's orientation [»abboRJ3_1990]
Quote: build internets from flows, packet sequences from source to destination; helps resource management and accountability; gateways keep track of soft, flow state [»clarDD8_1988]
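
The idea of multiple ways to achieve a testable goal, from the quotes above, can be sketched in Python. This is only an illustration under stated assumptions: `achieve`, `read_primary_sensor`, `read_backup_sensor`, and the temperature range are hypothetical names and values, not taken from the cited work.

```python
def achieve(goal_met, methods):
    """Try alternative methods in order until one result passes the goal test.

    Returns the accepted result plus a record of the methods that failed,
    so that failures of the standard method are still visible afterwards.
    """
    failures = []
    for method in methods:
        try:
            result = method()
        except Exception as exc:
            failures.append((method.__name__, repr(exc)))
            continue
        if goal_met(result):
            return result, failures
        failures.append((method.__name__, "goal not met"))
    raise RuntimeError(f"no method achieved the goal: {failures}")


def read_primary_sensor():
    # Hypothetical standard method; here it always fails.
    raise OSError("primary sensor offline")


def read_backup_sensor():
    # Hypothetical alternative method.
    return 21.5


# The goal test (a plausible temperature range) is an assumption for the demo.
reading, failures = achieve(lambda t: -50.0 <= t <= 60.0,
                            [read_primary_sensor, read_backup_sensor])
```

The explicit goal test matters here: a method that returns without raising is still rejected if its result does not satisfy the goal.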

Subtopic: intermittent errors

Quote: intermittent machine errors are exasperating; run tests, check identities, make duplicate runs [»turiA3_1951]
Quote: use burst computations to work around intermittent machine errors; if the two burst runs differ, rerun using saved state
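
A minimal sketch of the duplicate-run idea in the quotes above, written in Python. The names `run_burst` and `flaky_step` are hypothetical, and the retry limit is an assumption; the quotes do not specify either.

```python
import copy


def run_burst(step, saved_state, max_attempts=5):
    """Run one burst of computation twice from the same saved state.

    The result is accepted only when both runs agree; otherwise both runs
    are repeated from the saved state. Agreement does not prove correctness,
    since the same error could in principle occur in both runs.
    """
    for _ in range(max_attempts):
        first = step(copy.deepcopy(saved_state))
        second = step(copy.deepcopy(saved_state))
        if first == second:
            return first
    raise RuntimeError("duplicate runs never agreed")


calls = {"count": 0}


def flaky_step(values):
    # Hypothetical burst: sums the values, with a simulated one-off machine error.
    calls["count"] += 1
    total = sum(values)
    if calls["count"] == 1:
        total += 1  # intermittent error on the very first run only
    return total


result = run_burst(flaky_step, [1, 2, 3])
```

In this demo the first pair of runs disagrees, so the burst is rerun from the saved state and the second pair is accepted.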

Subtopic: frequent check

Quote: run programmed checks often to detect problems early, avoid disastrous results, and diagnose problems [»turiA3_1951]
Quote: check for mechanical and electrical failures frequently; no more than 20 minutes without checks [»compHU_1946]

Subtopic: backup processor

Quote: for fault-tolerance, use swact (switch of activity) between active and passive forms of an application; passive form only keeps track of state of active form; resynchronizes on recovery [»dereF3_2001]
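
The active/passive arrangement in the quote above might look roughly like the following Python sketch. The classes `Replica` and `Pair` and the integer state are hypothetical simplifications; a real system would track state over a network and detect failures itself.

```python
class Replica:
    """One form of the application; its state is a single running total."""

    def __init__(self, name):
        self.name = name
        self.state = 0

    def handle(self, amount):
        self.state += amount
        return self.state

    def sync_from(self, state):
        self.state = state


class Pair:
    """An active replica serving requests and a passive one tracking its state."""

    def __init__(self):
        self.active = Replica("A")
        self.passive = Replica("B")

    def request(self, amount):
        result = self.active.handle(amount)
        # The passive form does no work of its own; it only mirrors state.
        self.passive.sync_from(self.active.state)
        return result

    def swact(self):
        # Switch of activity: the passive form takes over with the mirrored state.
        self.active, self.passive = self.passive, self.active

    def resynchronize(self):
        # Called when the former active form recovers and rejoins as passive.
        self.passive.sync_from(self.active.state)


pair = Pair()
pair.request(5)
pair.request(2)
pair.swact()                    # simulate failure of replica "A"
after_swact = pair.request(1)   # served by replica "B"
pair.resynchronize()
```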

Subtopic: hot swap modules

Quote: hot swap modules need to change types while preserving type safety; use reflective mechanism with programmer-defined version adapters [»duggD9_2001]
Quote: a version adapter maps a value from the old to new version of a type; use run-time type tags to identify versioned types [»duggD9_2001]
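
The version-adapter idea can be illustrated with a small Python sketch. Python is dynamically typed, so this shows only the run-time tags and the old-to-new mapping, not the type-safety guarantees discussed in the cited work. The registry `ADAPTERS`, the decorator `adapter`, and the `Point` type are hypothetical.

```python
ADAPTERS = {}  # (type name, old version) -> function returning the next version


def adapter(type_name, old_version):
    """Register a programmer-defined adapter from old_version to old_version + 1."""
    def register(fn):
        ADAPTERS[(type_name, old_version)] = fn
        return fn
    return register


def upgrade(value, target_version):
    """Apply adapters step by step, using the value's run-time type tag."""
    while value["version"] < target_version:
        step = ADAPTERS[(value["type"], value["version"])]
        value = step(value)
    return value


@adapter("Point", 1)
def point_v1_to_v2(old):
    # Version 2 of the hypothetical Point type adds a z coordinate.
    return {"type": "Point", "version": 2, "x": old["x"], "y": old["y"], "z": 0}


old_point = {"type": "Point", "version": 1, "x": 3, "y": 4}
new_point = upgrade(old_point, 2)
```

A missing adapter surfaces as a `KeyError` at the moment a stale value is used, rather than as silent misinterpretation of old data.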

Subtopic: checkpoint

Quote: coordinated checkpoints create a global consistent state; simplifies recovery with good performance; recovery of uncoordinated checkpoints can domino to the initial state [»elnoEN9_2002]

Subtopic: roll-back recovery

Quote: survey of automatic, rollback-recovery from checkpoints of message-passing systems [»elnoEN9_2002]
Quote: message-passing systems may propagate rollback recovery because each message creates a dependency between sender and receiver; can domino to starting point [»elnoEN9_2002]
Quote: causal logging is as fast as optimistic logging while allowing each process to commit output independently; roll-back to most recent checkpoint; more complex [»elnoEN9_2002]
Quote: because roll-back recovery can be complex, all commercial implementations use pessimistic logging [»elnoEN9_2002]
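
As a rough illustration of pessimistic logging, the Python sketch below records each message before it is processed, so a failed process can restore its last checkpoint and replay the log without forcing other processes to roll back. It is a single-process simplification: the class `LoggedProcess` is hypothetical and an in-memory list stands in for stable storage.

```python
class LoggedProcess:
    """A process whose state is a running sum of the messages it receives."""

    def __init__(self):
        self.state = 0
        self.checkpoint = 0
        self.log = []  # stands in for stable storage (assumption)

    def deliver(self, message):
        # Pessimistic: the message is logged before it can affect the state.
        self.log.append(message)
        self.state += message

    def take_checkpoint(self):
        self.checkpoint = self.state
        self.log.clear()  # messages before the checkpoint are no longer needed

    def recover(self):
        # Roll back to the most recent checkpoint, then replay logged messages.
        self.state = self.checkpoint
        for message in self.log:
            self.state += message


process = LoggedProcess()
process.deliver(1)
process.deliver(2)
process.take_checkpoint()
process.deliver(3)
process.state = None  # simulate a crash that loses volatile state
process.recover()
```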

Subtopic: retry

Quote: solved problem of occasional process failure in Unix by repeated retries at increasing lengths of time [»doloTA7_1978]
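
A minimal Python sketch of retrying at increasing intervals. The doubling factor and the parameter names are assumptions; the quote says only that the delays increase.

```python
import time


def retry(operation, attempts=5, first_delay=0.1, factor=2.0, sleep=time.sleep):
    """Call operation until it succeeds, waiting longer after each failure.

    The last failure is re-raised once the attempts are used up. The sleep
    function is a parameter so that callers and tests can substitute their own.
    """
    delay = first_delay
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(delay)
            delay *= factor
```

For example, `retry(lambda: open("/tmp/example.lock", "x"))` would keep trying to create a hypothetical lock file, pausing 0.1 s, 0.2 s, 0.4 s, and 0.8 s between the five attempts.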

Subtopic: multi-version

Quote: multi-version coding can compare imprecise results with precise ones [»stoyAD7_1993]

Subtopic: multi-version debugging

Quote: Guard performs relative debugging with assertions to compare data structures, permutations to identify subarrays, and plots of error surfaces

Subtopic: oracle

Quote: test by a pseudo-oracle which independently implements a program and compares results; use a very high level language [»daviMD_1981]
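
A small Python sketch of pseudo-oracle testing. The program under test is a hand-written insertion sort, and Python's built-in `sorted` plays the role of the independent, very high level implementation. The function names and the test inputs are illustrative assumptions.

```python
def insertion_sort(items):
    """Program under test: a hand-written insertion sort."""
    result = list(items)
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result


def oracle_sort(items):
    """Pseudo-oracle: an independent implementation at a higher level."""
    return sorted(items)


def check_against_oracle(program, oracle, inputs):
    """Return the inputs on which the program and the pseudo-oracle disagree."""
    return [data for data in inputs if program(list(data)) != oracle(list(data))]


inputs = [[], [1], [3, 1, 2], [2, 2, 1], [5, 4, 3, 2, 1]]
disagreements = check_against_oracle(insertion_sort, oracle_sort, inputs)
```

A disagreement shows only that one of the two implementations is wrong; deciding which one still takes inspection.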

Subtopic: voting

Quote: n-version programming for increasing fault tolerance by voting [»knigJC1_1986]
Quote: n-version systems are self-diagnostic; i.e., log disagreements and debug individual channels [»hattL11_1997]
Quote: independently write two versions of a program; the same errors should not occur in both [»pariG_1980]
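
Majority voting among versions, together with the self-diagnostic logging of disagreements mentioned above, can be sketched in Python as follows. The function `vote` and the three toy versions are hypothetical, and outputs are assumed to be hashable so that they can be counted.

```python
from collections import Counter


def vote(versions, x):
    """Run every version on x and return (majority output, dissenting indices).

    A version that raises casts no vote. The majority is taken over all
    versions, so crashed versions count against reaching it.
    """
    outputs = {}
    for index, version in enumerate(versions):
        try:
            outputs[index] = version(x)
        except Exception:
            pass  # a crashed version casts no vote
    if not outputs:
        raise RuntimeError("no version produced a result")
    winner, count = Counter(outputs.values()).most_common(1)[0]
    if 2 * count <= len(versions):
        raise RuntimeError("no majority among versions")
    dissenters = [i for i in range(len(versions))
                  if i not in outputs or outputs[i] != winner]
    return winner, dissenters


# Three toy versions of "square"; the third contains a deliberate bug.
versions = [lambda n: n * n, lambda n: n ** 2, lambda n: n + n]
```

With these versions, `vote(versions, 3)` returns `(9, [2])`, flagging the third version for debugging, while `vote(versions, 2)` returns `(4, [])` because the buggy version happens to agree on that input.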

Subtopic: problems with voting

Quote: testing of n-version programs found dependent errors between versions; this reduces the effectiveness of voting [»knigJC1_1986]
Quote: experimental evidence for the failure of the independence model of multi-version reliability [»butlRW12_1991]
Quote: an asynchronous, triply-redundant voting system was unstable because channels sampled sensors at different times; bad at control points [»rushJ12_1991]

Subtopic: limitations of redundancy

Quote: diversity increases reliability only if systems are redundant, failures are independent, diversity is deep, and alternatives interoperate smoothly [»parnDL8_2007]
Quote: computer systems exhibit little redundancy; for example, redundant tax systems would help only if they implement exactly the same rules [»parnDL8_2007]
Quote: interconnected, computer-based systems are not independent; failures in one system can cause problems for other systems [»parnDL8_2007]

Related Topics

Group: systems (17 topics, 530 quotes)
Topic: communication protocols (62 items)
Topic: consistency testing (60 items)
Topic: defensive programming (22 items)
Topic: design for change (76 items)
Topic: error safe systems (76 items)
Topic: exception handling by recovery block or rescue clause (22 items)
Topic: file system reliability (26 items)
Topic: language flexibility (34 items)
Topic: log-structured file system (11 items)
Topic: log-structured rollback-recovery (13 items)
Topic: logging data and events (17 items)
Topic: mobile code (14 items)
Topic: open systems (33 items)
Topic: preventing accidental errors (37 items)
Topic: process migration (3 items)
Topic: reliable communication (29 items)
Topic: reliability of distributed systems (35 items)
Topic: safety critical systems (32 items)
Topic: testing by voting or N-version (10 items)
Topic: Thesa as a database of modules (23 items)