Summary
A system can be reliable despite faulty components and unanticipated events. The basic method is a functionally rich system with multiple ways to achieve a result. Problems must be detected soon after their occurrence, and appropriate responses generated.
N-version programming creates multiple, independent versions of a program. The independent versions check each other with majority voting. Independence may be difficult to achieve. (cbb 4/98)
Subtopic: engineering for reliability
Quote: how do engineers create reliable structures from imperfect mechanisms [»demiRA5_1979]
| Subtopic: redundancy
Quote: failure of a system without redundancies is massive and uncontrolled; like the Titanic
| Quote: if we feel that something is provably right then redundancies are removed [»demiRA5_1979]
| Quote: use redundancy of programming language for ease of use and syntactic error checking [»wilkMV_1957]
| Subtopic: verification vs. trust
Quote: a verified program is provably correct but not necessarily reliable and trustworthy; the proof gives no information about limits [»demiRA5_1979]
| Subtopic: resourceful, functionally rich system
Quote: a resourceful system achieves its goals despite failures of standard methods [»abboRJ3_1990]
| Quote: a resourceful system has functional richness, testable goals, and some planning abilities [»abboRJ3_1990]
| Quote: a resourceful system can combine basic functions into programs or plans [»abboRJ3_1990]
| Quote: a resourceful system will automatically create programs to deal with contingencies [»abboRJ3_1990]
| Quote: a functionally rich system is not orthogonal; there are multiple ways to achieve a result
| Quote: many existing systems are functionally rich, especially life threatening ones such as airplanes [»abboRJ3_1990]
| Quote: an attitude control system can use any three of four reaction wheels to maintain a satellite's orientation [»abboRJ3_1990]
| Quote: build internets from flows, packet sequences from source to destination; helps resource management and accountability; gateways keep track of soft, flow state [»clarDD8_1988]
| Subtopic: intermittent errors
Quote: intermittent machine errors are exasperating; run tests, check identities, make duplicate runs [»turiA3_1951]
| Quote: use burst computations to work around intermittent machine errors; if the two burst runs differ, rerun using saved state
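A minimal sketch of the duplicate-run idea, assuming a burst is a pure function of a saved state. The names `run_burst_checked`, `step`, and `max_attempts` are hypothetical and do not come from the cited sources.

```python
import copy


def run_burst_checked(step, state, max_attempts=3):
    """Run `step` twice from the same saved state; accept only matching results.

    `step` is a hypothetical burst of computation. Each run gets its own
    deep copy of `state`, so a faulty run cannot corrupt the saved state.
    """
    for _ in range(max_attempts):
        first = step(copy.deepcopy(state))
        second = step(copy.deepcopy(state))
        if first == second:
            return first
        # The runs differ: assume an intermittent error and rerun from saved state.
    raise RuntimeError("burst runs kept disagreeing; the fault may not be intermittent")
```

Two agreeing runs can still share a systematic error; the duplicate run only guards against faults that do not repeat.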
| Subtopic: frequent check
Quote: run programmed checks often to detect problems early, avoid disastrous results, and diagnose problems [»turiA3_1951]
| Quote: check for mechanical and electrical failures frequently; no more than 20 minutes without checks [»compHU_1946]
| Subtopic: backup processor
Quote: for fault-tolerance, use swact (switch of activity) between active and passive forms of an application; passive form only keeps track of state of active form; resynchronizes on recovery [»dereF3_2001]
| Subtopic: hot swap modules
Quote: hot swap modules need to change types while preserving type safety; use reflective mechanism with programmer-defined version adapters [»duggD9_2001]
| Quote: a version adapter maps a value from the old to new version of a type; use run-time type tags to identify versioned types [»duggD9_2001]
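A rough sketch of version adapters keyed by run-time type tags. It only illustrates the mapping from an old version of a type to a new one; it does not reproduce the reflective, type-safe mechanism of the cited work. `PointV1`, `PointV2`, `ADAPTERS`, and `upgrade` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PointV1:
    x: float
    y: float


@dataclass
class PointV2:
    x: float
    y: float
    z: float


# Run-time type tag (type name, version) -> (adapter, tag of the adapted value).
# Each adapter is programmer-defined; this one assumes z defaults to 0.0.
ADAPTERS = {
    ("Point", 1): (lambda old: PointV2(old.x, old.y, 0.0), ("Point", 2)),
}


def upgrade(tag, value, target_tag):
    """Apply adapters one version at a time until `value` carries `target_tag`."""
    while tag != target_tag:
        if tag not in ADAPTERS:
            raise LookupError(f"no version adapter registered for {tag}")
        adapter, tag = ADAPTERS[tag]
        value = adapter(value)
    return tag, value
```

For example, `upgrade(("Point", 1), PointV1(1.0, 2.0), ("Point", 2))` yields the tag `("Point", 2)` and a `PointV2` value.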
| Subtopic: checkpoint
Quote: coordinated checkpoints create a global consistent state; simplifies recovery with good performance; recovery of uncoordinated checkpoints can domino to the initial state [»elnoEN9_2002]
| Subtopic: roll-back recovery
Quote: survey of automatic rollback-recovery from checkpoints in message-passing systems [»elnoEN9_2002]
| Quote: message-passing systems may propagate rollback recovery because each message creates a dependency between sender and receiver; can domino to starting point [»elnoEN9_2002]
| Quote: causal logging is as fast as optimistic logging while allowing each process to commit output independently; roll-back to most recent checkpoint; more complex [»elnoEN9_2002]
| Quote: because roll-back recovery can be complex, all commercial implementations use pessimistic logging [»elnoEN9_2002]
| Subtopic: retry
Quote: solved problem of occasional process failure in Unix by repeated retries at increasing lengths of time [»doloTA7_1978]
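A minimal sketch of retrying at increasing intervals. The doubling schedule, the choice of `OSError` as the transient failure, and the names `retry_with_backoff`, `attempts`, `first_delay`, and `factor` are assumptions for illustration, not details from the cited paper.

```python
import time


def retry_with_backoff(operation, attempts=5, first_delay=0.01, factor=2.0,
                       sleep=time.sleep):
    """Call `operation` until it succeeds, waiting longer after each failure.

    `sleep` is injectable so the waiting can be replaced in tests.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError:
            if attempt == attempts:
                raise  # out of retries: report the last failure to the caller
            sleep(delay)
            delay *= factor  # wait longer before the next attempt
```

Capping the number of attempts keeps a persistent failure from being retried forever.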
| Subtopic: multi-version
Quote: multi-version coding can compare imprecise results with precise ones [»stoyAD7_1993]
| Subtopic: multi-version debugging
Quote: Guard performs relative debugging with assertions to compare data structures, permutations to identify subarrays, and plots of error surfaces
| Subtopic: oracle
Quote: test by a pseudo-oracle which independently implements a program and compare results; use a very high level language [»daviMD_1981]
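A small sketch of testing against a pseudo-oracle. The program under test is a hand-written insertion sort, and the oracle is an independent one-line implementation in a high-level style. Both functions, and the name `find_disagreements`, are hypothetical examples rather than anything from the cited source.

```python
import random


def insertion_sort(items):
    """Program under test: a hand-written insertion sort."""
    result = list(items)
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result


def pseudo_oracle(items):
    """Independent implementation written at a much higher level."""
    return sorted(items)


def find_disagreements(program, oracle, trials=200, seed=0):
    """Run both implementations on random inputs and collect inputs where they differ."""
    rng = random.Random(seed)
    disagreements = []
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 12))]
        if program(data) != oracle(data):
            disagreements.append(data)
    return disagreements
```

A disagreement shows that at least one implementation is wrong, not which one; agreement gives evidence of correctness only to the extent that the two were written independently.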
| Subtopic: voting
Quote: n-version programming for increasing fault tolerance by voting [»knigJC1_1986]
| Quote: n-version systems are self-diagnostic; i.e., log disagreements and debug individual channels [»hattL11_1997]
| Quote: independently write two versions of a program; the same errors should not occur in both [»pariG_1980]
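A minimal sketch of majority voting over independently written versions, assuming each version returns a hashable result for the same input. The names `vote` and `run_n_versions` are hypothetical. The returned list of dissenting indices is one way to log disagreements so that individual channels can be debugged.

```python
from collections import Counter


def vote(outputs):
    """Return (majority value, indices of dissenting versions).

    Raises ValueError when no value is produced by a strict majority.
    """
    if not outputs:
        raise ValueError("no outputs to vote on")
    winner, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no majority among versions")
    dissenters = [i for i, out in enumerate(outputs) if out != winner]
    return winner, dissenters


def run_n_versions(versions, x):
    """Run every version on the same input and vote on the results."""
    return vote([version(x) for version in versions])
```

For example, with versions `lambda x: x * x`, `lambda x: x ** 2`, and a faulty `lambda x: x + x`, the input 3 yields the value 9 and flags version index 2.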
| Subtopic: problems with voting
Quote: when testing n-version programs found dependent errors between versions; reduces its effectiveness [»knigJC1_1986]
| Quote: experimental evidence for the failure of the independence model of multi-version reliability [»butlRW12_1991]
| Quote: an asynchronous, triply-redundant voting system was unstable because channels sampled sensors at different times; bad at control points [»rushJ12_1991]
| Subtopic: limitations of redundancy
Quote: diversity increases reliability only if systems are redundant, failures are independent, diversity is deep, and alternatives interoperate smoothly [»parnDL8_2007]
| Quote: computer systems exhibit little redundancy; for example, redundant tax systems would help only if they implement exactly the same rules [»parnDL8_2007]
| Quote: interconnected, computer-based systems are not independent; failures in one system can cause problems for other systems [»parnDL8_2007]
|
Related Topics
Group: systems (17 topics, 530 quotes)
Topic: communication protocols (62 items)
Topic: consistency testing (60 items)
Topic: defensive programming (22 items)
Topic: design for change (76 items)
Topic: error safe systems (76 items)
Topic: exception handling by recovery block or rescue clause (22 items)
Topic: file system reliability (26 items)
Topic: language flexibility (34 items)
Topic: log-structured file system (11 items)
Topic: log-structured rollback-recovery (13 items)
Topic: logging data and events (17 items)
Topic: mobile code (14 items)
Topic: open systems (33 items)
Topic: preventing accidental errors (37 items)
Topic: process migration (3 items)
Topic: reliable communication (29 items)
Topic: reliability of distributed systems (35 items)
Topic: safety critical systems (32 items)
Topic: testing by voting or N-version (10 items)
Topic: Thesa as a database of modules (23 items)