Th Topic: approximate string matching and pattern matching with errors

Topic: approximate string matching and pattern matching with errors

topics > computer science > programming > Group: patterns

Group:
information retrieval

Topic:
hash filter
Topic:
pattern matching
Topic:
search algorithms
Topic:
searching compressed data
Topic:
signature files
Topic:
spelling errors
Topic:
string operations
Topic:
suffix trie and suffix array

Summary

Text contains spelling errors and users make spelling errors in entering a query. Genetic sequence alignment is another application of approximate string matching. Similar sequences indicate a common ancestor.
Here are several algorithms for approximate string matching. (cbb 4/98)

Subtopic: approximate matching

Quote: simple, fast algorithm for approximate string matching; special cases k-break periodic strings; uses suffix tree; short, overlapping text partitions; pattern at least 5k^3 where k is the maximum edit distance [»coleR9_2002]
Quote: survey of approximate string matching; focus on edit distance; includes performance tests [»navaG3_2001]
Quote: Glimpse features a small index and approximate matching; divide data into blocks, index the blocks, full text search of possible matches [»manbU10_1993]
Quote: about approximate matching of strings, e.g., due to spelling mistakes [»hallPA12_1980]
Quote: fast algorithms for approximate string matching with hashing [»duMW8_1994]
Quote: simple, efficient algorithm for pattern matching with k mismatches; based on shift-add and character skip; O(nk) [»elmaN6_1996]
Quote: use trie methods for k-approximate string matching; 4 times faster than agrep for k=1, otherwise slower [»shanH8_1996]
Quote: use dynamic suffix arrays to detect, on-line, the occurrences of a string subject to incremental insertions and deletions [»ferrP1_1995]
Quote: perform approximate matching on words in index, then use inverted index to locate data [»manbU10_1993]
Quote: approximate matching with k errors by exact matching on k+1 parts and search the resulting matches [»manbU4_1997]
Quote: use trie methods for k-approximate string matching; 4 times faster than agrep for k=1, otherwise slower [»shanH8_1996]
Quote: improved filtration algorithm for approximate string matching with moderate error levels; faster than agrep [»navaG1_1999]

Subtopic: approximate range

Quote: optimal, approximate range query using skip quadtrees; produces a convex or non-convex k-fat region [»eppsD6_2005]

Subtopic: compressed text

Quote: first practical algorithm for approximate pattern matching over Ziv-Lempel compressed text; multipattern search; faster than decompressing plus searching [»navaG3_2001a]

Subtopic: n-grams, proper names, membership

Quote: gives approximate membership tester that does not need an independent sequence of hash functions [»cartL5_1978]
Quote: Synoname matches personal names by heuristics derived from correct matches of personal names [»grosAD4_1991]
Quote: n-grams of text streams are efficient, robust, and capture statistically significant phrases [»coheJD12_1998]

Subtopic: misspelling, and transcription errors

Quote: the number of distinct words in a text is nearly linear in the size of the text; e.g., misspellings [»moffA3_1994]
Quote: the largest source of errors in nucleotide sequence data is transcription errors during publication

Related Topics

Group: information retrieval (25 topics, 674 quotes)
Topic: hash filter (18 items)
Topic: pattern matching (42 items)
Topic: search algorithms (40 items)
Topic: searching compressed data (9 items)
Topic: signature files (21 items)
Topic: spelling errors (18 items)
Topic: string operations (20 items)
Topic: suffix trie and suffix array
(20 items)