Topic: suffix trie and suffix array

topics > computer science > data > Group: data structures

approximate string matching and pattern matching with errors
compressed data
pattern matching
search algorithms
text compression


A suffix trie is a tree whose nodes represents suffices. The root represents all possible strings in a text.

Suffix tries identify common subsequences. A suffix trie may need compression. (cbb 6/06)

Subtopic: searching suffix tries and suffix arrays up

Quote: with suffix trie, search time independent of text length [»nelsMR8_1996]
Quote: use compressed suffix arrays and trees to search for short strings in O(1) time with O(n) space [»grosR5_2000]
Quote: implement a trie with a double array; 3-5x faster retrieval, 5-9x slower insertion, 17% smaller than list [»aoeJI9_1992]
Quote: simple, fast algorithm for approximate string matching; special cases k-break periodic strings; uses suffix tree; short, overlapping text partitions; pattern at least 5k^3 where k is the maximum edit distance [»coleR9_2002]
Quote: grep pattern matching by suffix array of vocabulary or by inverted index of digrams or trigrams [»zobeJ7_2006]

Subtopic: constructing suffix trees up

Quote: survey of suffix array construction algorithms; more space efficient than suffix trees [»puglSJ6_2007]
Quote: Maniscalco & Puglisi's algorithm was fastest for suffix array construction; best three algorithms had a working memory of 5-6n bytes [»puglSJ6_2007]
Quote: on-line construction of suffix tree in O(N) [»nelsMR8_1996]
Quote: could construct suffix tree for human genome in 46G memory and nine hours [»kurtS11_1999]
Quote: wotdlazy, an efficient algorithm for write-only, top-down construction of suffix trees; less than 12 bytes per input character; faster and less space than McCreight's hash table implementation [»giegR7_1999]

Subtopic: space efficient tries up

Quote: use Kurtz' space efficient suffix trees in 10n space, McCreight's constructor in O(kn) time, and an indexed root [»balkB10_2000]
Quote: space efficient suffix trees; 10 bytes per input character, just as fast; best for binary and text [»kurtS11_1999]
Quote: Bonsai trees compress large trie structures to 1/3; encodes indices with 32 by 32 bit multiplication [»darrJJ3_1993]
Quote: Scrabble game uses a 'dawg'; i.e., a trie with merged sub-tries; takes a fifth the space [»appeAW5_1988]
Quote: simple, space efficient suffix trees using balanced parentheses; O(n lg n) space and O(m) search [»munrJI5_2001]

Subtopic: burst trie up

Quote: use burst trie for sorted data; containers (e.g., binary tree) accessed by a trie; better performance than binary tree, splay tree, ternary tree; only 25% slower than hash table [»heinS4_2002]
Quote: trend heuristic for bursting by using one counter per container; start value with bonuses and penalties; burst container when 0 [»heinS4_2002]

Subtopic: dynamic trie up

Quote: use dynamic suffix arrays to detect, on-line, the occurrences of a string subject to incremental insertions and deletions [»ferrP1_1995]

Subtopic: common substring up

Quote: compress assembly code by identifying common substrings via suffix trees; 7% compression on average [»frasCW6_1984]

Subtopic: problems with suffix arrays up

Quote: while fast for grep, suffix arrays do not scale well; no compression, 1.7x memory-resident data, no ranked queries

Related Topics up

Topic: approximate string matching and pattern matching with errors (19 items)
Topic: compressed data (16 items)
Topic: pattern matching (42 items)
Topic: search algorithms (40 items)
Topic: strings (13 items)
Topic: text compression (17 items)
Topic: trees
(21 items)

Updated barberCB 6/05
Copyright © 2002-2008 by C. Bradford Barber. All rights reserved.
Thesa is a trademark of C. Bradford Barber.