Summary
Error-safe systems are needed because not every combination of logical errors can be tested. Only through reliability is a system predictable and hence useful.
Both users and programmers will make errors. These errors should be easy to detect and correct, and they should not cause irreversible damage. For instance, destroyed text or data should be at least temporarily recoverable. Typographical errors should be easily detected, and all errors should be detected as early as possible.
Robust or fault-tolerant software performs despite errors and unexpected events. For instance, file creation can occur in stages to maintain database consistency despite system failure. Redundant information should be stored with critical structures. For instance, disks should be reconstructible without a file directory. All inputs should have an output. (cbb 5/80)
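One common way to create a file in stages is sketched below: write the new contents to a temporary file, force them to disk, and only then rename the temporary file over the target. The function name `write_atomically` is hypothetical, and the sketch assumes a file system on which `os.replace` is atomic (as on POSIX systems); it is an illustration, not the method of any system cited here.

```python
import os
import tempfile

def write_atomically(path: str, data: bytes) -> None:
    """Stage data in a temporary file, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the rename
    # stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # stage 1: new contents are on disk
        os.replace(tmp_path, path)   # stage 2: switch to the new version
    except BaseException:
        # A failure before the rename leaves the old file untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A reader of `path` sees either the complete old version or the complete new version, never a partially written file.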
Subtopic: trustworthy
Quote: a reliable, error-free system is a predictable one; it does what it was intended to do [»hamiM_1978]
| Quote: reliability requires simplicity [»lampBW10_1983]
| Quote: all actions should be atomic or restartable; i.e., use transactions [»lampBW10_1983]
| Quote: the proper product of programming is arguments that a program is a trustworthy solution [»dijkEW_1982]
| Quote: a disk is apparently safer than battery-backed memory because it is much harder to change [»chenPM9_1996]
| Quote: the Web has anarchic scalability; works despite load, malicious data, unknown servers, no state, no lists of back references, multiple trust boundaries [»fielRT5_2002]
| Quote: software errors caused less downtime of the telephone network, 2%, than any other source of failure except vandalism
| Subtopic: reliability despite change
Quote: systems built under Mesa will usually remain reliable after changes to interfaces [»laueHC_1979]
| Subtopic: security despite attack
Quote: assurance and authenticated operation are important security goals; assurance is correct behavior despite attacks [»englP7_2003]
| Subtopic: robust library
Quote: a robust library expresses constraints as types, reports errors relative to user, and does not cause application errors [»fricA4_2000]
| Quote: Cyclone is a safe dialect of C; avoids buffer overflows, format string attacks, and memory management errors; static analysis plus run-time checks and annotations [»jimT6_2002]
| Subtopic: always check
Quote: never trust the hardware; compute a checksum with the data; store them separately on disk; recover data if they do not agree [»browD9_2007]
| Quote: principle-driven design allows effective checks for errors; in eight years, every EROS kernel bug was caught by an assertion check [»shapJS1_2002]
| Quote: it is absurd to make elaborate security checks on debugging runs, and then remove them in production runs, when an erroneous result could be disastrous [»hoarCA_1974]
| Quote: do not disable assertions in the field; needed for identifying problems [»shorJ9_2004]
| Quote: errors should be checked at multiple levels to ensure that all errors are caught somewhere [»maclBJ_1987]
| Quote: half of the software in telephone switches concerns error detection and correction; this may explain the low outage rate due to software [»kuhnDR4_1997]
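The checksum advice above can be illustrated with a minimal sketch. Plain dictionaries stand in for separate disk locations, and the names `store`, `load`, and `replica_store` are hypothetical; recovery from a replica is an assumption, since the quote does not say where the good copy comes from.

```python
import hashlib

def store(key, payload, data_store, checksum_store):
    """Keep the payload and its checksum in separate stores."""
    data_store[key] = payload
    checksum_store[key] = hashlib.sha256(payload).hexdigest()

def load(key, data_store, checksum_store, replica_store):
    """Return the payload, repairing it from a replica if the checksum disagrees."""
    expected = checksum_store[key]
    payload = data_store[key]
    if hashlib.sha256(payload).hexdigest() == expected:
        return payload
    # The primary copy is corrupt: fall back to the replica, but check it too.
    replica = replica_store[key]
    if hashlib.sha256(replica).hexdigest() != expected:
        raise IOError(f"no valid copy of {key!r}")
    data_store[key] = replica  # repair the primary copy
    return replica
```

Because the checksum is stored apart from the data, a single bad write is unlikely to damage both in a consistent way.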
| Subtopic: allow for error
Quote: high-quality software should keep working in spite of anticipated glitches [»wirfR7_2006]
| Quote: on failure, silently ignore the request, admit failure, resign, retry, alternative action, or appeal to a higher authority [»wirfR7_2006]
| Quote: if an error is possible, someone will make it; should be easily detected, minimal consequences, and reversible effects [»normDA_1988]
| Quote: design the system to allow for errors: make errors easy to discover and correct [»normDA_1988]
| Quote: nothing can prove the absence of bugs; verification cannot prevent failures from unforeseen causes [»abraP12_1986]
| Quote: create your ideas first, design for failure, let users complain about errors, make assumptions and intentions explicit, iterate quickly [»frisL1_2006]
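Three of the failure responses listed above (retry, alternative action, and appeal to a higher authority) can be combined in one small helper. This is a sketch under assumptions: `call_with_recovery` and its parameters are hypothetical names, and raising an exception stands in for appealing to a higher authority.

```python
import time

def call_with_recovery(primary, fallback=None, retries=2, delay=0.0):
    """Try primary; retry on failure; then try fallback; then escalate."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return primary()
        except Exception as err:      # retry: the fault may be transient
            last_error = err
            if attempt < retries:
                time.sleep(delay)
    if fallback is not None:
        return fallback()             # alternative action
    # Appeal to a higher authority: admit failure to the caller.
    raise RuntimeError("primary failed after retries") from last_error
```

For example, `call_with_recovery(fetch, fallback=read_cache)` would retry a hypothetical `fetch` and then fall back to a hypothetical `read_cache`.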
| Subtopic: expect failure
Quote: services are expected to fail; the computer disappears; mass appeal from new, updated, improved, etc. [»tursWM10_2003]
| Subtopic: fault manager
Quote: the fault manager manages the list of problems for human administrators; summarizes the underlying error messages [»shapMW12_2004]
| Subtopic: well-defined failure
Quote: it is difficult to design functions that either succeed or fail in a well defined way; but, it is essential to divide a program into subsystems that either succeed or fail in a well defined way [»stroB_1991]
| Quote: EROS truncates messages to undefined destinations; otherwise, fault handlers may lead to denial-of-service, buffering creates local state, and timeouts are not repeatable under load [»shapJS1_2002]
| Subtopic: handle all inputs
Quote: Hehner's 'if' statement is more robust than Dijkstra's 'do' because it fails if no guard is true, e.g., an unexpected state [»redeDH7_1979]
| Quote: there must be an output for every possible input [»hamiM3_1976]
| Quote: write functions that, given valid inputs, cannot fail [»maguS_1993]
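A guarded selection that fails when no guard is true can be sketched as follows. The helper `guarded_if` is hypothetical, and it simplifies guarded commands by taking the first true guard rather than choosing nondeterministically.

```python
def guarded_if(state, *clauses):
    """Run the action of the first true guard; fail if no guard is true."""
    for guard, action in clauses:
        if guard(state):
            return action(state)
    # An unexpected state is reported instead of being silently skipped.
    raise ValueError(f"no guard is true for state {state!r}")

def sign(x):
    return guarded_if(
        x,
        (lambda v: v > 0, lambda v: 1),
        (lambda v: v < 0, lambda v: -1),
        (lambda v: v == 0, lambda v: 0),
    )
```

Here `sign(float("nan"))` raises `ValueError`, because a NaN satisfies none of the three guards; a loop that simply exits when no guard holds would let that state pass unnoticed.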
| Subtopic: safe code
Quote: Java guarantees memory and type safety at runtime and compile time; programs cannot forge pointers, overrun arrays, or apply an operator to the wrong type [»hartPH12_2001]
| Quote: review of literature on formal models of Java safety; applications to smartcards; need better models [»hartPH12_2001]
| Quote: SafeTSA is compact, type-safe mobile code based on static single assignment; safe by construction with referential integrity, type separation, and type check elimination [»ammeW6_2001]
| Quote: a safe module includes no unsafe interfaces or operations such as unchecked type transfer, address arithmetic, dispose object, and untraced allocations [»cardL_1991]
| Quote: a language feature is unsafe if its misuse can corrupt the runtime system; e.g., array assignment without bounds checking [»nelsG_1991]
| Quote: an error in a safe program can only cause a run-time error, a wrong answer, or an infinite loop
| Quote: safe programs can share the same address space; no accidental side effects from other programs
| Subtopic: isolate unsafe code
Quote: a standard stream package for Modula-3; illustrates partially opaque types and the isolation of unsafe code; based on Topaz [»browMR_1991]
| Quote: errors may be reported at compile time or run time; unchecked errors may only occur in unsafe modules [»cardL_1991]
| Quote: the lowest level of a system is not safe; e.g., bus addresses for an I/O controller [»nelsG_1991]
| Quote: Modula-3 and Cedar distinguish safe modules from unsafe ones; in the latter, programmers must avoid memory corruption
| Subtopic: self-repair
Quote: Exterminator automatically corrects heap-based memory errors detected by a probabilistic debugging allocator (DieFast); diffs heap images to identify overflows and dangling pointers; fixed by padding objects and deferring allocation [»novaG6_2007]
| Quote: self-repairing application by specifying the key data structure consistency constraints; automatically detect and repair violations to these constraints [»demsB10_2003]
| Quote: self-repair by consistency constraints in disjunctive normal form; repair actions to restore basic propositions; picks the least perturbation and prevents cycles [»demsB10_2003]
| Quote: Google stores docID, length, URL and document; with error log, can rebuild everything
| Quote: a self-healing system corrects errors, produces telemetry for automated diagnosis, and provides recursive, fine-grained restart; simplified administration [»shapMW12_2004]
| Subtopic: system restart
Quote: software must allow restart; define a contract with descendant processes and the recipient for restart events [»shapMW12_2004]
| Quote: restarting a telephone switch temporarily fixed a significant number of software-caused outages
| Quote: master and chunkservers restart in seconds no matter how they are terminated; no abnormal termination [»gherS10_2003]
| Quote: many operating systems will crash and require a complete restart; often due to incorrect coordination of concurrent activity; better now [»dennPJ_1980]
| Subtopic: rollback recovery
Quote: survey of automatic, rollback-recovery from checkpoints of message-passing systems [»elnoEN9_2002]
| Quote: lookahead-rollback synchronization--execute until a conflict is discovered then roll back the offending processes and reexecute [»jeffDR7_1985]
| Quote: message-passing systems may propagate rollback recovery because each message creates a dependency between sender and receiver; can domino to starting point [»elnoEN9_2002]
| Quote: signals and interrupts do not work with many log-based, rollback-recovery protocols; they require piecewise determinism between messages [»slyeJH10_1998]
| Subtopic: fault injection
Quote: tested reliability under system crash by injecting faults; random bit flips in kernel; imitate programming errors such as pointer corruption, copy overrun, off-by-one; most crashes happened within 15 seconds [»chenPM9_1996]
| Quote: test tolerance to unusual events by fault injections and a test for unsafe or unacceptable outcomes; e.g., change a variable's value [»voasF7_1997]
| Quote: software is failure-tolerant if it produces acceptable results despite fault injections or malicious inputs [»voasF7_1997]
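A toy version of fault injection is sketched below: corrupt a variable's value with a random bit flip, then test whether the outcome is still acceptable. Everything here is hypothetical, including the system under test (`clamp_speed`) and its acceptable range of 0 to 120; it does not reproduce the experiments in the cited papers.

```python
import random

def flip_random_bit(value: int, width: int, rng: random.Random) -> int:
    """Imitate a memory fault by flipping one of the low `width` bits."""
    return value ^ (1 << rng.randrange(width))

def clamp_speed(raw: int) -> int:
    """Hypothetical system under test: limit a speed setting to 0..120."""
    return max(0, min(raw, 120))

def survives_fault_injection(trials: int = 1000, seed: int = 0) -> bool:
    """Inject faults into the input and check every outcome is acceptable."""
    rng = random.Random(seed)
    for _ in range(trials):
        corrupted = flip_random_bit(rng.randrange(0, 121), 16, rng)
        if not 0 <= clamp_speed(corrupted) <= 120:
            return False   # unsafe or unacceptable outcome
    return True
```

The test judges only whether the result is acceptable, not whether it is the answer the uncorrupted input would have produced.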
| Subtopic: interrupts
Quote: signals and interrupts do not work with many log-based, rollback-recovery protocols; they require piecewise determinism between messages [»slyeJH10_1998]
| Subtopic: stateless
Quote: REST interactions are stateless; they transfer representations of identified resources
| Subtopic: typestate
Quote: NIL's typestate interfaces meant that system testing did not reveal new errors in unit-testing; locality of errors was assured [»stroRE5_1985]
| Quote: NIL limits side-effects by erroneous programs to inappropriate results of the correct type; by typestate checking [»stroRE5_1985]
| Subtopic: static analysis
Quote: Cyclone is a safe dialect of C; avoids buffer overflows, format string attacks, and memory management errors; static analysis plus run-time checks and annotations [»jimT6_2002]
| Quote: CSSV for static analysis of buffer overflows in C; optional contract per procedure reduces to integer expressions; handles heap allocation, multi-level arrays, function pointers, casting; faster than authors' previous algorithm [»dorN6_2003]
| Subtopic: self-stabilizing
Quote: self-stabilizing algorithm for leader election under dynamic topologies [»schnM3_1993]
| Quote: self-stabilizing, token-based system with no race conditions; if multiple tokens, guaranteed to remove all but one from system [»browGM6_1989]
| Subtopic: machine code errors
Quote: machine code programming allows any change; errors may be hard to trace, especially with index registers [»hoarCA_1974]
| Subtopic: hardware errors
Quote: intermittent machine errors are exasperating; run tests, check identities, make duplicate runs [»turiA3_1951]
| Quote: use burst computations to work around intermittent machine errors; if the two burst runs differ, rerun using saved state
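The duplicate-run idea above can be sketched as follows: split a computation into short bursts, run each burst twice from the same saved state, and accept the result only when both runs agree. The names `run_in_bursts`, `step`, and `max_attempts` are hypothetical, and the sketch assumes that an intermittent error rarely corrupts both runs identically.

```python
import copy

def run_in_bursts(step, state, bursts, max_attempts=5):
    """Advance state by `bursts` steps, double-checking each burst."""
    for _ in range(bursts):
        saved = copy.deepcopy(state)          # state to rerun from
        for _ in range(max_attempts):
            first = step(copy.deepcopy(saved))
            second = step(copy.deepcopy(saved))
            if first == second:               # duplicate runs agree
                state = first
                break
        else:
            raise RuntimeError("duplicate runs never agreed")
    return state
```

Short bursts keep the cost of a rerun small, since only the current burst is repeated rather than the whole computation.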
| Subtopic: cost of reliability
Quote: estimate costs when using formal methods; high-integrity systems are intrinsically expensive [»boweJP4_1995]
| Quote: full formal development with machine-checked proofs is too expensive except for failure-critical applications
| Subtopic: problems with code
Quote: while abstract, vector and string templates are a perilously thin veneer over a mass of complexity; too easy to punch through the veneer [»briaM1_2001]
| Quote: pointers weakly support relations; too complex for casual users; only degree 2, directional, complex data structures [»coddEF_1990]
| Quote: AI researchers write their programs for other AI researchers; incredibly complex data structures [»coddEF_1990]
| Subtopic: problems with hiding errors
Quote: defensive programming is bad because it hides errors that should never happen [»maguS_1993]
| Quote: use assertions to fail fast; identifies bugs that are difficult to detect and diagnose; do not hide problems [»shorJ9_2004]
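The contrast between hiding an error and failing fast can be shown with two hypothetical versions of the same function. The check in the second version is written as an explicit `raise` because Python strips `assert` statements when run with `-O`, which would disable the check in the field.

```python
def average_hiding(values):
    """Defensive style: an empty input is silently turned into 0.0."""
    if not values:
        return 0.0            # masks a bug in the caller
    return sum(values) / len(values)

def average_fail_fast(values):
    """Fail-fast style: an empty input is reported where it occurs."""
    if not values:
        raise AssertionError("average of an empty sequence: caller bug")
    return sum(values) / len(values)
```

With the first version, a caller that loses its data still gets a plausible number; with the second, the failure appears at the point of the mistake.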
| Subtopic: problems with error handling
Quote: error handling should be strictly hierarchical; otherwise get cycles of system dependencies if a function asks its caller for help with recovery or resource acquisition [»stroB_1991]
|
Related Topics
Group: exception handling (12 topics, 314 quotes)
Group: program proving (10 topics, 311 quotes)
Group: security (23 topics, 874 quotes)
Group: software maintenance (14 topics, 368 quotes)
Group: testing (18 topics, 557 quotes)
Group: type checking (12 topics, 392 quotes)
Topic: children vs. adults (33 items)
Topic: consistency testing (60 items)
Topic: constraints (35 items)
Topic: database consistency and reliability (15 items)
Topic: database security (12 items)
Topic: defensive programming (22 items)
Topic: deletion of information (11 items)
Topic: ease of learning (38 items)
Topic: ease of use (47 items)
Topic: error messages (37 items)
Topic: exception handling by recovery block or rescue clause (22 items)
Topic: exceptions from invalid input (4 items)
Topic: incremental testing (26 items)
Topic: log-structured rollback-recovery (13 items)
Topic: minimal manuals and guided exploration (44 items)
Topic: open systems (33 items)
Topic: operating system security (18 items)
Topic: optimistic update for concurrency control (35 items)
Topic: power fail recovery (6 items)
Topic: preventing accidental errors (37 items)
Topic: program proving is infeasible (47 items)
Topic: program web (8 items)
Topic: programming without errors (28 items)
Topic: proof-carrying code (7 items)
Topic: reliability of distributed systems (35 items)
Topic: resourceful, redundant systems for reliability (38 items)
Topic: run-time assertions (25 items)
Topic: safe use of pointers (102 items)
Topic: safety critical systems (32 items)
Topic: self-identifying data structures (18 items)
Topic: sensitivity of software to change (44 items)
Topic: testing by voting or N-version (10 items)
Topic: training wheels for the user interface (10 items)
Topic: type-safe and secure languages (43 items)
Topic: undoing actions in a UserInterface (23 items)
Topic: usability errors (6 items)
|