Group: information retrieval
Topic: hash filter
Topic: pattern matching
Topic: search algorithms
Topic: searching compressed data
Topic: signature files
Topic: spelling errors
Topic: string operations
Topic: suffix trie and suffix array
| |
Summary
Text contains spelling errors and users make spelling errors in entering a query. Genetic sequence alignment is another application of approximate string matching. Similar sequences indicate a common ancestor.
Here are several algorithms for approximate string matching. (cbb 4/98)
Subtopic: approximate matching
Quote: simple, fast algorithm for approximate string matching; special cases k-break periodic strings; uses suffix tree; short, overlapping text partitions; pattern at least 5k^3 where k is the maximum edit distance [»coleR9_2002]
| Quote: survey of approximate string matching; focus on edit distance; includes performance tests [»navaG3_2001]
| Quote: Glimpse features a small index and approximate matching; divide data into blocks, index the blocks, full text search of possible matches [»manbU10_1993]
| Quote: about approximate matching of strings, e.g., due to spelling mistakes [»hallPA12_1980]
| Quote: fast algorithms for approximate string matching with hashing [»duMW8_1994]
| Quote: simple, efficient algorithm for pattern matching with k mismatches; based on shift-add and character skip; O(nk) [»elmaN6_1996]
| Quote: use trie methods for k-approximate string matching; 4 times faster than agrep for k=1, otherwise slower [»shanH8_1996]
| Quote: use dynamic suffix arrays to detect, on-line, the occurrences of a string subject to incremental insertions and deletions [»ferrP1_1995]
| Quote: perform approximate matching on words in index, then use inverted index to locate data [»manbU10_1993]
| Quote: approximate matching with k errors by exact matching on k+1 parts and search the resulting matches [»manbU4_1997]
| Quote: use trie methods for k-approximate string matching; 4 times faster than agrep for k=1, otherwise slower [»shanH8_1996]
| Quote: improved filtration algorithm for approximate string matching with moderate error levels; faster than agrep [»navaG1_1999]
| Subtopic: approximate range
Quote: optimal, approximate range query using skip quadtrees; produces a convex or non-convex k-fat region [»eppsD6_2005]
| Subtopic: compressed text
Quote: first practical algorithm for approximate pattern matching over Ziv-Lempel compressed text; multipattern search; faster than decompressing plus searching [»navaG3_2001a]
| Subtopic: n-grams, proper names, membership
Quote: gives approximate membership tester that does not need an independent sequence of hash functions [»cartL5_1978]
| Quote: Synoname matches personal names by heuristics derived from correct matches of personal names [»grosAD4_1991]
| Quote: n-grams of text streams are efficient, robust, and capture statistically significant phrases [»coheJD12_1998]
| Subtopic: misspelling, and transcription errors
Quote: the number of distinct words in a text is nearly linear in the size of the text; e.g., misspellings [»moffA3_1994]
| Quote: the largest source of errors in nucleotide sequence data is transcription errors during publication
|
Related Topics
Group: information retrieval (25 topics, 674 quotes)
Topic: hash filter (18 items)
Topic: pattern matching (42 items)
Topic: search algorithms (40 items)
Topic: searching compressed data (9 items)
Topic: signature files (21 items)
Topic: spelling errors (18 items)
Topic: string operations (20 items)
Topic: suffix trie and suffix array (20 items)
|