Map
Index
Random
Help
Topics
th

Topic: searching compressed data

topics > computer science > programming > Group: patterns



Group:
information retrieval

Topic:
approximate string matching and pattern matching with errors
Topic:
compressed data
Topic:
data compression algorithms
Topic:
database queries, joins, and relational algebra
Topic:
external search and sort
Topic:
search algorithms
Topic:
text compression

Summary

Searching compressed data can be faster than decompressing the data and then searching.

Simple compression schemes, such as byte-pair or Huffman encoding, are better for searching. (cbb 2/07)

Subtopic: searching block-sorted data up

Quote: use Burrows-Wheeler block-sorted data for dynamic searching; compressed, rapid searches; groups similar substrings together [»willK12_2003]

Subtopic: search compressed text up

Quote: string matching on LZW compressed text; up to 50% faster than decompress and search; LZgrep for gzip files [»navaG10_2005]
Quote: algorithm for pattern matching in UNIX Z-compressed files; O(n log m + m) for short patterns of length m [»amirA1_1994]
Quote: search compressed text with byte-pair compression; faster search, 30% compression, byte-pairs can not overlap [»manbU4_1997]
Quote: fast searching of compressed text using Huffman encoding, shift-or search, and Boyer-Moore filter; 30% compression, 2-8x faster than agrep [»demoES8_1998]
Quote: DCA compression and antidictionaries allow search and parallel algorithms because their encodings do not depend on context except for the block's prefix [»crocM7_1999]

Subtopic: regular expression up

Quote: first solution for regular expression searching in compressed text; faster than decompression plus search [»navaG7_2001]

Subtopic: approximate search up

Quote: first practical algorithm for approximate pattern matching over Ziv-Lempel compressed text; multipattern search; faster than decompressing plus searching [»navaG3_2001a]

Subtopic: search compressed database up

Quote: use a skipped index for fast queries in a compressed database
[»wittIH_1994]

Related Topics up

Group: information retrieval   (25 topics, 674 quotes)

Topic: approximate string matching and pattern matching with errors (19 items)
Topic: compressed data (16 items)
Topic: data compression algorithms (53 items)
Topic: database queries, joins, and relational algebra (34 items)
Topic: external search and sort (23 items)
Topic: search algorithms (40 items)
Topic: text compression
(17 items)


Updated barberCB 10/05
Copyright © 2002-2008 by C. Bradford Barber. All rights reserved.
Thesa is a trademark of C. Bradford Barber.