Topic: approximate string matching and pattern matching with errors
Topic: bugs
Topic: hash filter
Topic: pattern matching
Topic: text editing
Topic: word processing
Topic: words in natural languages
| |
Summary
Most spelling errors are one letter wrong, a missing letter, an extra letter, or two letters transposed. A typist frequently makes the same mistakes. Some systems have automatic correction of spelling errors based on a dictionary of correct spellings. Others correct mistakes previously corrected. (cbb 5/80)
Subtopic: spelling errors
Quote: the largest source of errors in nucleotide sequence data is transcription errors during publication
| QuoteRef: dameF3_1964 ;; damerau 64: 80% spelling errors by one letter wrong or missing, extra letter inserted, two adjacent chars transposed -- Morgan confirmed
| Quote: most misspelled word may be 'accommodate' [»mcilMD1_1982]
| Quote: the number of distinct words in a text is nearly linear in the size of the text; e.g., misspellings [»moffA3_1994]
| Subtopic: correcting errors
QuoteRef: dameF3_1964 ;; correct spelling errors by testing for match against possible correct words by syntax
| QuoteRef: peteJL12_1980 ;;Computer programs for detecting and correcting spelling errors
| Quote: spelling corrector based on the Viterbi algorithm and a tree-structured dictionary [»srihSN_1982, OK]
| Quote: Z attempts to correct a misspelling by testing letter deletions, transpositions, and missing letters [»woodSR6_1981]
| Quote: if a dictionary can't locate a word, should be able to locate similar words [»heckP8_1982]
| Subtopic: data structures for spelling correction
Quote: Z uses a Bloom filter for checking spelling in a dictionary [»woodSR6_1981]
| Quote: compress hash values for spelling list by storing differences; partition into M bins for speed [»mcilMD1_1982]
| Quote: use a length-segmented list for static, spelling dictionaries; e.g., sort by word length and then by collating sequence [»turbTN6_1981]
| Subtopic: trie data structure
Quote: implement a trie with a double array; 3-5x faster retrieval, 5-9x slower insertion, 17% smaller than list [»aoeJI9_1992]
| Quote: Scrabble game uses a 'dawg'; i.e., a trie with merged sub-tries; takes a fifth the space [»appeAW5_1988]
| Quote: trie hash by sort on leading digits with split between buckets by a prefix; better than B-tree, can 100% load for ordered insertion [»litwWA7_1991]
| Quote: implement a trie with a double array; 3-5x faster retrieval, 5-9x slower insertion, 17% smaller than list [»aoeJI9_1992]
| Quote: Bonsai trees compress large trie structures to 1/3; encodes indices with 32 by 32 bit multiplication [»darrJJ3_1993]
| Quote: a pointer-free trie structure for very large sets on secondary storage; 2 bits per node; organized into pages in a tree [»shanH8_1996]
|
Related Topics
Topic: approximate string matching and pattern matching with errors (19 items)
Topic: bugs (66 items)
Topic: hash filter (18 items)
Topic: pattern matching (42 items)
Topic: text editing (34 items)
Topic: word processing (13 items)
Topic: words in natural languages (40 items)
|