Th Topic: spelling errors

Topic: spelling errors

topics > computer science > Group: document preparation

Topic:
approximate string matching and pattern matching with errors
Topic:
bugs
Topic:
hash filter
Topic:
pattern matching
Topic:
text editing
Topic:
word processing
Topic:
words in natural languages

Summary

Most spelling errors are one letter wrong, a missing letter, an extra letter, or two letters transposed. A typist frequently makes the same mistakes. Some systems have automatic correction of spelling errors based on a dictionary of correct spellings. Others correct mistakes previously corrected. (cbb 5/80)

Subtopic: spelling errors

Quote: the largest source of errors in nucleotide sequence data is transcription errors during publication
QuoteRef: dameF3_1964 ;; damerau 64: 80% spelling errors by one letter wrong or missing, extra letter inserted, two adjacent chars transposed -- Morgan confirmed
Quote: most misspelled word may be 'accommodate' [»mcilMD1_1982]
Quote: the number of distinct words in a text is nearly linear in the size of the text; e.g., misspellings [»moffA3_1994]

Subtopic: correcting errors

QuoteRef: dameF3_1964 ;; correct spelling errors by testing for match against possible correct words by syntax
QuoteRef: peteJL12_1980 ;;Computer programs for detecting and correcting spelling errors
Quote: spelling corrector based on the Viterbi algorithm and a tree-structured dictionary [»srihSN_1982, OK]
Quote: Z attempts to correct a misspelling by testing letter deletions, transpositions, and missing letters [»woodSR6_1981]
Quote: if a dictionary can't locate a word, should be able to locate similar words [»heckP8_1982]

Subtopic: data structures for spelling correction

Quote: Z uses a Bloom filter for checking spelling in a dictionary [»woodSR6_1981]
Quote: compress hash values for spelling list by storing differences; partition into M bins for speed [»mcilMD1_1982]
Quote: use a length-segmented list for static, spelling dictionaries; e.g., sort by word length and then by collating sequence [»turbTN6_1981]

Subtopic: trie data structure

Quote: implement a trie with a double array; 3-5x faster retrieval, 5-9x slower insertion, 17% smaller than list [»aoeJI9_1992]
Quote: Scrabble game uses a 'dawg'; i.e., a trie with merged sub-tries; takes a fifth the space [»appeAW5_1988]
Quote: trie hash by sort on leading digits with split between buckets by a prefix; better than B-tree, can 100% load for ordered insertion [»litwWA7_1991]
Quote: implement a trie with a double array; 3-5x faster retrieval, 5-9x slower insertion, 17% smaller than list [»aoeJI9_1992]
Quote: Bonsai trees compress large trie structures to 1/3; encodes indices with 32 by 32 bit multiplication [»darrJJ3_1993]
Quote: a pointer-free trie structure for very large sets on secondary storage; 2 bits per node; organized into pages in a tree [»shanH8_1996]

Related Topics

Topic: approximate string matching and pattern matching with errors (19 items)
Topic: bugs (66 items)
Topic: hash filter (18 items)
Topic: pattern matching (42 items)
Topic: text editing (34 items)
Topic: word processing (13 items)
Topic: words in natural languages
(40 items)