Topic: error safe systems


exception handling
program proving
software maintenance
type checking

children vs. adults
consistency testing
database consistency and reliability
database security
defensive programming
deletion of information
ease of learning
ease of use
error messages
exception handling by recovery block or rescue clause
exceptions from invalid input
incremental testing
log-structured rollback-recovery
minimal manuals and guided exploration
open systems
operating system security
optimistic update for concurrency control
power fail recovery
preventing accidental errors
program proving is infeasible
program web
programming without errors
proof-carrying code
reliability of distributed systems
resourceful, redundant systems for reliability
run-time assertions
safe use of pointers
safety critical systems
self-identifying data structures
sensitivity of software to change
testing by voting or N-version
training wheels for the user interface
type-safe and secure languages
undoing actions in a UserInterface
usability errors


Error safe systems are needed because not every combination of logical errors can be tested. Only through reliability is a system predictable and hence useful.

Both users and programmers will make errors. These errors should be easy to detect and correct, and they should not cause irreversible damage. For instance, destroyed text or data should be at least temporarily recoverable. Typographical errors should be easily detected, and all errors should be detected as early as possible.

Robust or fault-tolerant software performs despite errors and unexpected events. For instance, file creation can occur in stages to maintain database consistency despite system failure. Redundant information should be stored with critical structures. For instance, disks should be reconstructible without a file directory. All inputs should have an output. (cbb 5/80)
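
The staged file creation mentioned above can be sketched as follows. This is a minimal illustration, assuming a file system where a rename within one directory is atomic; the function name write_file_in_stages is hypothetical. The new contents go to a temporary file, are forced to disk, and only then replace the target, so a failure at any stage should leave either the old file or the new one, not a partial file.

```python
import os
import tempfile


def write_file_in_stages(path, data):
    """Replace `path` with `data` (bytes) so a crash leaves old or new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    # Stage 1: write the new contents to a temporary file in the same directory.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # Stage 2: force the bytes to stable storage.
        # Stage 3: swap the finished file into place in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, discard the partial file; the original is untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A caller would use write_file_in_stages("record.dat", b"...") wherever it would otherwise open the file for writing directly.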

Subtopic: trustworthy

Quote: a reliable, error-free system is a predictable one; it does what it was intended to do [»hamiM_1978]
Quote: reliability requires simplicity [»lampBW10_1983]
Quote: all actions should be atomic or restartable; i.e., use transactions [»lampBW10_1983]
Quote: the proper product of programming is arguments that a program is a trustworthy solution [»dijkEW_1982]
Quote: a disk is apparently safer than battery-backed memory because it is much harder to change [»chenPM9_1996]
Quote: the Web has anarchic scalability; works despite load, malicious data, unknown servers, no state, no lists of back references, multiple trust boundaries [»fielRT5_2002]
Quote: software errors caused less downtime of the telephone network, 2%, than any other source of failure except vandalism

Subtopic: reliability despite change

Quote: systems built under Mesa will usually remain reliable after changes to interfaces [»laueHC_1979]

Subtopic: security despite attack

Quote: assurance and authenticated operation are important security goals; assurance is correct behavior despite attacks [»englP7_2003]

Subtopic: robust library

Quote: a robust library expresses constraints as types, reports errors relative to user, and does not cause application errors [»fricA4_2000]
Quote: Cyclone is a safe dialect of C; avoids buffer overflows, format string attacks, and memory management errors; static analysis plus run-time checks and annotations [»jimT6_2002]

Subtopic: always check

Quote: never trust the hardware; compute a checksum with the data; store them separately on disk; recover data if they do not agree [»browD9_2007]
Quote: principle-driven design allows effective checks for errors; in eight years, every EROS kernel bug was caught by an assertion check [»shapJS1_2002]
Quote: it is absurd to make elaborate security checks on debugging runs, and then remove them in production runs, when an erroneous result could be disastrous [»hoarCA_1974]
Quote: do not disable assertions in the field; needed for identifying problems [»shorJ9_2004]
Quote: errors should be checked at multiple levels to ensure that all errors are caught somewhere [»maclBJ_1987]
Quote: half of the software in telephone switches concerns error detection and correction; this may explain the low outage rate due to software [»kuhnDR4_1997]
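
The advice to compute a checksum with the data, store the two separately, and recover on disagreement can be sketched as follows. The dictionaries stand in for separate disk areas, and the replica used for recovery is an assumption; the quote does not say where the good copy comes from. The names store and load are hypothetical.

```python
import zlib


def store(data_store, checksum_store, key, data):
    """Keep the data and its checksum in separate stores."""
    data_store[key] = data
    checksum_store[key] = zlib.crc32(data)


def load(data_store, checksum_store, replica_store, key):
    """Return verified data, falling back to a replica when the checksum disagrees."""
    data = data_store[key]
    if zlib.crc32(data) == checksum_store[key]:
        return data
    # Mismatch: do not trust the primary copy. Try the replica (assumed recovery source).
    replica = replica_store[key]
    if zlib.crc32(replica) != checksum_store[key]:
        raise IOError(f"no valid copy of {key!r}")
    data_store[key] = replica  # repair the primary copy
    return replica
```

Every read goes through load, so a silently corrupted block is either repaired or reported; it is never returned as good data.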

Subtopic: allow for error

Quote: high-quality software should keep working in spite of anticipated glitches [»wirfR7_2006]
Quote: on failure, silently ignore the request, admit failure, resign, retry, alternative action, or appeal to a higher authority [»wirfR7_2006]
Quote: if an error is possible, someone will make it; should be easily detected, minimal consequences, and reversible effects [»normDA_1988]
Quote: design the system to allow for errors: make errors easy to discover and correct [»normDA_1988]
Quote: nothing can prove the absence of bugs; verification cannot prevent failures from unforeseen causes [»abraP12_1986]
Quote: create your ideas first, design for failure, let users complain about errors, make assumptions and intentions explicit, iterate quickly [»frisL1_2006]
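
Several of the failure responses listed above (retry, alternative action, appeal to a higher authority) can be combined in one wrapper. This is a sketch only; call_with_recovery and its parameters are hypothetical, and the right order of responses depends on the application.

```python
def call_with_recovery(action, alternative=None, retries=2):
    """Try `action`; retry on failure, then try `alternative`, then raise to the caller."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return action()
        except Exception as error:  # retry: the fault may be transient
            last_error = error
    if alternative is not None:
        try:
            return alternative()  # alternative action
        except Exception as error:
            last_error = error
    # Admit failure and appeal to a higher authority: the caller decides what to do.
    raise RuntimeError("action failed after retries and alternative") from last_error
```

The wrapper never ignores a failure silently; when every response is exhausted, the original error travels upward as the cause of the RuntimeError.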

Subtopic: expect failure

Quote: services are expected to fail; the computer disappears; mass appeal from new, updated, improved, etc. [»tursWM10_2003]

Subtopic: fault manager

Quote: the fault manager manages the list of problems for human administrators; summarizes the underlying error messages [»shapMW12_2004]

Subtopic: well-defined failure

Quote: it is difficult to design functions that either succeed or fail in a well defined way; but, it is essential to divide a program into subsystems that either succeed or fail in a well defined way [»stroB_1991]
Quote: EROS truncates messages to undefined destinations; otherwise, fault handlers may lead to denial-of-service, buffering creates local state, and timeouts are not repeatable under load [»shapJS1_2002]

Subtopic: handle all inputs

Quote: Hehner's 'if' statement is more robust than Dijkstra's 'do' because it fails if no guard is true, e.g., an unexpected state [»redeDH7_1979]
Quote: there must be an output for every possible input [»hamiM3_1976]
Quote: write functions that, given valid inputs, cannot fail [»maguS_1993]
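
A guarded 'if' that fails when no guard is true can be imitated as follows. This is a simplified sketch: guarded_if is a hypothetical helper, and it takes the first true guard rather than choosing among true guards nondeterministically.

```python
def guarded_if(*clauses):
    """Run the action of the first clause whose guard is true; fail if none is."""
    for guard, action in clauses:
        if guard():
            return action()
    raise AssertionError("no guard is true: unexpected state")


def sign(x):
    """Return 1, -1, or 0; any other state is an error, not a silent default."""
    return guarded_if(
        (lambda: x > 0, lambda: 1),
        (lambda: x < 0, lambda: -1),
        (lambda: x == 0, lambda: 0),
    )
```

Calling sign(float("nan")) makes every guard false, so the unexpected state is reported instead of falling through to an arbitrary branch.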

Subtopic: safe code

Quote: Java guarantees memory and type safety at runtime and compile time; programs cannot forge pointers, overrun arrays, or apply an operator to the wrong type [»hartPH12_2001]
Quote: review of literature on formal models of Java safety; applications to smartcards; need better models [»hartPH12_2001]
Quote: SafeTSA is compact, type-safe mobile code based on static single assignment; safe by construction with referential integrity, type separation, and type check elimination [»ammeW6_2001]
Quote: a safe module includes no unsafe interfaces or operations such as unchecked type transfer, address arithmetic, dispose object, and untraced allocations [»cardL_1991]
Quote: a language feature is unsafe if its misuse can corrupt the runtime system; e.g., array assignment without bounds checking [»nelsG_1991]
Quote: an error in a safe program can only cause a run-time error, a wrong answer, or an infinite loop
Quote: safe programs can share the same address space; no accidental side effects from other programs

Subtopic: isolate unsafe code

Quote: a standard stream package for Modula-3; illustrates partially opaque types and the isolation of unsafe code; based on Topaz [»browMR_1991]
Quote: errors may be reported at compile time or run time; unchecked errors may only occur in unsafe modules [»cardL_1991]
Quote: the lowest level of a system is not safe; e.g., bus addresses for an I/O controller [»nelsG_1991]
Quote: Modula-3 and Cedar distinguish safe modules from unsafe ones; in the latter, programmers must avoid memory corruption

Subtopic: self-repair

Quote: Exterminator automatically corrects heap-based memory errors detected by a probabilistic debugging allocator (DieFast); diffs heap images to identify overflows and dangling pointers; fixed by padding objects and deferring deallocation [»novaG6_2007]
Quote: self-repairing application by specifying the key data structure consistency constraints; automatically detect and repair violations to these constraints [»demsB10_2003]
Quote: self-repair by consistency constraints in disjunctive normal form; repair actions to restore basic propositions; picks the least perturbation and prevents cycles [»demsB10_2003]
Quote: Google stores docID, length, URL and document; with error log, can rebuild everything
Quote: a self-healing system corrects errors, produces telemetry for automated diagnosis, and provides recursive, fine-grained restart; simplified administration [»shapMW12_2004]
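
Repair driven by consistency constraints can be sketched as a list of (name, check, repair action) triples applied until every check holds. This is an illustration only, not the cited system; the buffer structure and the names repair and CONSTRAINTS are hypothetical, and the bounded number of passes is a crude stand-in for cycle prevention.

```python
def repair(state, constraints, max_passes=10):
    """Check each (name, holds, fix) constraint; apply fixes until all hold."""
    repaired = []
    for _ in range(max_passes):  # bounded passes so repairs cannot cycle forever
        violated = [(name, fix) for name, holds, fix in constraints if not holds(state)]
        if not violated:
            return repaired
        for name, fix in violated:
            fix(state)
            repaired.append(name)
    raise RuntimeError("constraints still violated after repair")


# Hypothetical structure: a buffer with a cached length and a cursor into it.
CONSTRAINTS = [
    ("length matches items",
     lambda s: s["length"] == len(s["items"]),
     lambda s: s.update(length=len(s["items"]))),
    ("cursor within bounds",
     lambda s: 0 <= s["cursor"] <= s["length"],
     lambda s: s.update(cursor=min(max(s["cursor"], 0), s["length"]))),
]
```

The returned list of repaired constraint names doubles as telemetry: it records which invariants were found broken.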

Subtopic: system restart

Quote: software must allow restart; define a contract with descendant processes and the recipient for restart events [»shapMW12_2004]
Quote: restarting a telephone switch temporarily fixed a significant number of software-caused outages
Quote: master and chunkservers restart in seconds no matter how they are terminated; no abnormal termination [»gherS10_2003]
Quote: many operating systems will crash and require a complete restart; often due to incorrect coordination of concurrent activity; better now [»dennPJ_1980]

Subtopic: rollback recovery

Quote: survey of automatic, rollback-recovery from checkpoints of message-passing systems [»elnoEN9_2002]
Quote: lookahead-rollback synchronization--execute until a conflict is discovered then roll back the offending processes and reexecute [»jeffDR7_1985]
Quote: message-passing systems may propagate rollback recovery because each message creates a dependency between sender and receiver; can domino to starting point [»elnoEN9_2002]
Quote: signals and interrupts do not work with many log-based, rollback-recovery protocols; they require piecewise determinism between messages [»slyeJH10_1998]
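
The basic checkpoint and rollback step can be sketched for a single process. This is a deliberate simplification: the quotes above concern message-passing systems, where a rollback may propagate between senders and receivers, and none of that is modeled here. The class and function names are hypothetical.

```python
import copy


class Checkpointed:
    """Hold a state that can be saved and restored as a whole."""

    def __init__(self, state):
        self.state = state
        self._saved = copy.deepcopy(state)

    def checkpoint(self):
        self._saved = copy.deepcopy(self.state)

    def rollback(self):
        self.state = copy.deepcopy(self._saved)


def run_steps(process, steps):
    """Apply each step; on an exception, roll back to the last checkpoint and stop."""
    for step in steps:
        try:
            step(process.state)
        except Exception:
            process.rollback()  # discard the partial effects of the failed step
            return False
        process.checkpoint()
    return True
```

A step that fails halfway leaves no trace: the state returns to what it was after the last successful step.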

Subtopic: fault injection

Quote: tested reliability under system crash by injecting faults; random bit flips in kernel; imitate programming errors such as pointer corruption, copy overrun, off-by-one; most crashes happened within 15 seconds [»chenPM9_1996]
Quote: test tolerance to unusual events by fault injections and a test for unsafe or unacceptable outcomes; e.g., change a variable's value [»voasF7_1997]
Quote: software is failure-tolerant if it produces acceptable results despite fault injections or malicious inputs [»voasF7_1997]
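
Fault injection with a test for unacceptable outcomes can be sketched on a small scale. The example below assumes a hypothetical frame format (a 4-byte CRC followed by the payload), flips one random bit per trial, and counts the trials in which damaged data is silently accepted. It is far simpler than the kernel-level injection described above.

```python
import random
import zlib


def encode(payload):
    """Frame a payload with a 4-byte CRC so corruption can be detected."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def decode(frame):
    """Return the payload, or raise ValueError if the frame is damaged."""
    if len(frame) < 4 or zlib.crc32(frame[4:]) != int.from_bytes(frame[:4], "big"):
        raise ValueError("corrupt frame")
    return frame[4:]


def flip_random_bit(data, rng):
    """Inject a fault: return a copy of `data` with one randomly chosen bit inverted."""
    corrupted = bytearray(data)
    position = rng.randrange(len(corrupted) * 8)
    corrupted[position // 8] ^= 1 << (position % 8)
    return bytes(corrupted)


def count_unacceptable(payload, trials=1000, seed=0):
    """Count trials where a corrupted frame is silently accepted."""
    rng = random.Random(seed)
    frame = encode(payload)
    failures = 0
    for _ in range(trials):
        try:
            decode(flip_random_bit(frame, rng))
            failures += 1  # unacceptable: damaged data was accepted
        except ValueError:
            pass  # acceptable: the damage was detected
    return failures
```

The fixed seed keeps the injected faults repeatable, so a failing trial can be replayed while debugging.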

Subtopic: interrupts

Quote: signals and interrupts do not work with many log-based, rollback-recovery protocols; they require piecewise determinism between messages [»slyeJH10_1998]

Subtopic: stateless

Quote: REST interactions are stateless; they transfer representations of identified resources

Subtopic: typestate

Quote: NIL's typestate interfaces meant that system testing did not reveal new errors in unit-testing; locality of errors was assured [»stroRE5_1985]
Quote: NIL limits side-effects by erroneous programs to inappropriate results of the correct type; by typestate checking [»stroRE5_1985]

Subtopic: static analysis

Quote: Cyclone is a safe dialect of C; avoids buffer overflows, format string attacks, and memory management errors; static analysis plus run-time checks and annotations [»jimT6_2002]
Quote: CSSV for static analysis of buffer overflows in C; optional contract per procedure reduces to integer expressions; handles heap allocation, multi-level arrays, function pointers, casting; faster than authors' previous algorithm [»dorN6_2003]

Subtopic: self-stabilizing

Quote: self-stabilizing algorithm for leader election under dynamic topologies [»schnM3_1993]
Quote: self-stabilizing, token-based system with no race conditions; if multiple tokens, guaranteed to remove all but one from system [»browGM6_1989]
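
The idea of a token system that removes all but one token can be illustrated with Dijkstra's classic K-state token ring. This is a swapped-in textbook algorithm, not the algorithm of either cited paper. It assumes a ring of n machines with K >= n states each and a scheduler that moves one privileged machine at a time; the function names are hypothetical.

```python
import random


def privileged(states):
    """Return indices of machines that currently hold a token (privilege)."""
    n = len(states)
    holders = [0] if states[0] == states[n - 1] else []
    holders += [i for i in range(1, n) if states[i] != states[i - 1]]
    return holders


def step(states, k, rng):
    """Let one randomly chosen privileged machine make its move."""
    i = rng.choice(privileged(states))
    if i == 0:
        states[0] = (states[0] + 1) % k  # machine 0 advances its counter
    else:
        states[i] = states[i - 1]  # other machines copy their left neighbor


def stabilize(states, k, steps, seed=0):
    """Run `steps` moves from an arbitrary starting configuration."""
    rng = random.Random(seed)
    for _ in range(steps):
        step(states, k, rng)
    return states
```

Starting from an arbitrary configuration with several tokens, repeated moves reduce the ring to a single circulating token, and it stays that way.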

Subtopic: machine code errors

Quote: machine code programming allows any change; errors may be hard to trace, especially with index registers [»hoarCA_1974]

Subtopic: hardware errors

Quote: intermittent machine errors are exasperating; run tests, check identities, make duplicate runs [»turiA3_1951]
Quote: use burst computations to work around intermittent machine errors; if the two burst runs differ, rerun using saved state
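
Duplicate runs from a saved state can be sketched as follows. The function run_burst_twice and its max_attempts limit are hypothetical; the sketch assumes that a burst is a function from a saved state to a result and that results can be compared for equality.

```python
import copy


def run_burst_twice(burst, saved_state, max_attempts=5):
    """Run `burst` twice from the same saved state; accept only matching results."""
    for _ in range(max_attempts):
        first = burst(copy.deepcopy(saved_state))
        second = burst(copy.deepcopy(saved_state))
        if first == second:
            return first  # becomes the saved state for the next burst
        # The runs disagree, so at least one was hit by an error: rerun from the saved state.
    raise RuntimeError("runs never agreed; the error may not be intermittent")
```

Keeping bursts short limits how much work is repeated when the two runs disagree.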

Subtopic: cost of reliability

Quote: estimate costs when using formal methods; high-integrity systems are intrinsically expensive [»boweJP4_1995]
Quote: full formal development with machine-checked proofs is too expensive except for failure-critical applications

Subtopic: problems with code

Quote: while abstract, vector and string templates are a perilously thin veneer over a mass of complexity; too easy to punch through the veneer [»briaM1_2001]
Quote: pointers weakly support relations; too complex for casual users; only degree 2, directional, complex data structures [»coddEF_1990]
Quote: AI researchers write their programs for other AI researchers; incredibly complex data structures [»coddEF_1990]

Subtopic: problems with hiding errors

Quote: defensive programming is bad because it hides errors that should never happen [»maguS_1993]
Quote: use assertions to fail fast; identifies difficult to detect and diagnose bugs; do not hide problems [»shorJ9_2004]
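
The contrast between hiding an error and failing fast can be shown with two versions of the same function. Both are hypothetical examples; which inputs count as caller bugs is a design decision for each interface.

```python
def average_hiding_errors(values):
    """Defensive style: an empty list is quietly turned into 0, masking the caller's bug."""
    if not values:
        return 0
    return sum(values) / len(values)


def average_fail_fast(values):
    """Fail-fast style: an empty list is a caller bug, so stop at the point of the error."""
    assert len(values) > 0, "average of an empty list is undefined"
    return sum(values) / len(values)
```

Note that Python removes assert statements when run with the -O option, so keeping assertions enabled in the field means not using that option, or raising an explicit exception instead.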

Subtopic: problems with error handling

Quote: error handling should be strictly hierarchical; otherwise get cycles of system dependencies if a function asks its caller for help with recovery or resource acquisition

Related Topics

Group: exception handling   (12 topics, 314 quotes)
Group: program proving   (10 topics, 311 quotes)
Group: security   (23 topics, 874 quotes)
Group: software maintenance   (14 topics, 368 quotes)
Group: testing   (18 topics, 557 quotes)
Group: type checking   (12 topics, 392 quotes)

Topic: children vs. adults (33 items)
Topic: consistency testing (60 items)
Topic: constraints (35 items)
Topic: database consistency and reliability (15 items)
Topic: database security (12 items)
Topic: defensive programming (22 items)
Topic: deletion of information (11 items)
Topic: ease of learning (38 items)
Topic: ease of use (47 items)
Topic: error messages (37 items)
Topic: exception handling by recovery block or rescue clause (22 items)
Topic: exceptions from invalid input (4 items)
Topic: incremental testing (26 items)
Topic: log-structured rollback-recovery (13 items)
Topic: minimal manuals and guided exploration (44 items)
Topic: open systems (33 items)
Topic: operating system security (18 items)
Topic: optimistic update for concurrency control (35 items)
Topic: power fail recovery (6 items)
Topic: preventing accidental errors (37 items)
Topic: program proving is infeasible (47 items)
Topic: program web (8 items)
Topic: programming without errors (28 items)
Topic: proof-carrying code (7 items)
Topic: reliability of distributed systems (35 items)
Topic: resourceful, redundant systems for reliability (38 items)
Topic: run-time assertions (25 items)
Topic: safe use of pointers (102 items)
Topic: safety critical systems (32 items)
Topic: self-identifying data structures (18 items)
Topic: sensitivity of software to change (44 items)
Topic: testing by voting or N-version (10 items)
Topic: training wheels for the user interface (10 items)
Topic: type-safe and secure languages (43 items)
Topic: undoing actions in a UserInterface (23 items)
Topic: usability errors (6 items)

Updated barberCB 8/05
Copyright © 2002-2008 by C. Bradford Barber. All rights reserved.
Thesa is a trademark of C. Bradford Barber.