Topic: approximate string matching and pattern matching with errors
Topic: external search and sort
Topic: hash table and hash functions
Topic: hash filter
Topic: information retrieval by searching
Topic: information retrieval with queries
Topic: one-way hash function
| |
Summary
A signature file stores a compact signature for each file, derived from the file's contents. A query retrieves the files whose signatures match the query's signature, and then scans those files to eliminate false positives. A signature may be a bit string, like a hash filter. (cbb 11/07)
Subtopic: signature file as content filter
Quote: use filter to retrieve messages by contents; visual inspection to find relevant messages [»tsicD1_1983]
| Quote: use a signature file to speed filtered searches [»tsicD1_1983]
| Quote: assign an abstraction signature to each message; a query scans the signature file and rejects nonqualifying messages [»faloC7_1987]
| Quote: signature files good for archives; 10% space overhead, append-only, sequential scans [»faloC_1988]
| Quote: signature files use derived signatures from data objects and queries; must resolve false drops; size is 10-20% of database [»zezuP10_1991]
| Subtopic: superimposed coding
Quote: signature files (superimposed coding) by setting m_i bits of a signature for each string in field i [»faloC_1988]
| Quote: signatures for message filters should be about fifteen bits [»tsicD1_1983]
| Quote: optimal signatures should set half of the bits [»faloC_1988]
| Quote: superimposed coding hashes each word to a sparse bit string; OR all words to form the object signature; weight is the number of ones [»zezuP10_1991]
| Quote: an object signature is the OR of the signatures of its words; a word signature is produced by a hash function that sets m bits
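The superimposed coding described in the quotes above can be sketched as follows. This is a minimal illustration, not the method of any cited paper: the signature width `SIG_BITS`, the bits per word `BITS_PER_WORD`, the use of SHA-256 as the word hash, and all function names are assumptions chosen for the example.

```python
import hashlib

SIG_BITS = 64        # assumed signature width; real files use far more bits
BITS_PER_WORD = 3    # assumed m, the number of bits each word tries to set


def word_signature(word, sig_bits=SIG_BITS, m=BITS_PER_WORD):
    """Hash a word to a sparse bit string with at most m ones."""
    sig = 0
    for i in range(m):
        digest = hashlib.sha256(f"{i}:{word.lower()}".encode("utf-8")).digest()
        sig |= 1 << (int.from_bytes(digest[:8], "big") % sig_bits)
    return sig


def object_signature(words):
    """OR the word signatures together to form the object signature."""
    sig = 0
    for word in words:
        sig |= word_signature(word)
    return sig


def weight(sig):
    """The weight of a signature is its number of one bits."""
    return bin(sig).count("1")


def may_contain(obj_sig, query_sig):
    """True if every query bit is set in the object signature."""
    return obj_sig & query_sig == query_sig


def search(documents, query_words):
    """Filter by signature, then scan the candidates to resolve false drops.

    documents maps a name to its text; returns (candidates, hits).
    """
    query_sig = object_signature(query_words)
    candidates = [
        name for name, text in documents.items()
        if may_contain(object_signature(text.split()), query_sig)
    ]
    wanted = {w.lower() for w in query_words}
    hits = [
        name for name in candidates
        if wanted <= {w.lower() for w in documents[name].split()}
    ]
    return candidates, hits
```

A candidate that is not a hit is a false drop: its signature happens to cover the query's bits even though the words are absent. In practice the object signatures would be computed once and stored in the signature file rather than recomputed per query as in this sketch.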
| Subtopic: hash access to signature file
Quote: quick filter partitions a signature file; uses linear hashing; for large files of dynamic data and multimedia with high weight queries [»zezuP10_1991]
| Subtopic: compressed signature file
Quote: a compressed signature file is like a hash index; 10-30% space overhead, fast retrieval and insertion, suitable for write-only [»faloC9_1988]
| Quote: a bit-sliced signature file stores each bit position separately; a query retrieves only the slices for its 1-bits; want to access one file and improve insertions [»faloC9_1988]
| Quote: a typical signature file has 600-1000 bit positions
| Quote: compress bit-sliced signature files with posting buckets of target files and optional word offsets; may use secondary hash and two levels [»faloC9_1988]
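Bit slicing can be sketched as a transpose of the signature file: one slice per bit position, with one bit per object. The sketch below is uncompressed and in-memory, and the names `build_bit_slices` and `query_slices` are hypothetical; it shows only why a query needs the slices for its 1-bits and nothing else.

```python
def build_bit_slices(signatures, sig_bits):
    """Transpose signatures: slices[j] has bit i set iff signature i has bit j set."""
    slices = [0] * sig_bits
    for i, sig in enumerate(signatures):
        for j in range(sig_bits):
            if (sig >> j) & 1:
                slices[j] |= 1 << i
    return slices


def query_slices(slices, query_sig, num_objects):
    """AND together only the slices for the query's 1-bits.

    Returns the indexes of objects whose signatures cover the query.
    """
    result = (1 << num_objects) - 1  # start with every object qualifying
    for j, bit_slice in enumerate(slices):
        if (query_sig >> j) & 1:
            result &= bit_slice
    return [i for i in range(num_objects) if (result >> i) & 1]
```

For example, with signatures `0b1010`, `0b0110`, and `0b1110`, a query signature of `0b1010` touches two of the four slices and selects objects 0 and 2. The selected objects are still only candidates and must be checked for false drops.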
| Subtopic: optimization
Quote: a signature tree improves signature file scanning by 10x [»chenY5_2002]
| Subtopic: page signature
Quote: optimal distributed algorithm for identifying corrupted files by exchanging page signatures [»abdeKA1_1994]
| Subtopic: block index
Quote: Glimpse features a small index and approximate matching; divide data into blocks, index the blocks, full text search of possible matches [»manbU10_1993]
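The block-index idea in the quote above (divide data into blocks, index the blocks, then search the possible matches in full) can be sketched as below. This is not Glimpse's implementation: it groups whole files into blocks, matches exact words only, and omits approximate matching. The names `build_block_index` and `block_search` and the `block_size` parameter are assumptions.

```python
from collections import defaultdict


def build_block_index(files, block_size=2):
    """Group file names into blocks and map each word to the blocks containing it.

    files maps a file name to its text. The index records block numbers only,
    which keeps it small relative to a per-file or per-position index.
    """
    names = sorted(files)
    blocks = [names[i:i + block_size] for i in range(0, len(names), block_size)]
    index = defaultdict(set)
    for block_no, block in enumerate(blocks):
        for name in block:
            for word in files[name].lower().split():
                index[word].add(block_no)
    return blocks, index


def block_search(files, blocks, index, word):
    """Use the index to pick candidate blocks, then scan every file in them."""
    word = word.lower()
    hits = []
    for block_no in sorted(index.get(word, ())):
        for name in blocks[block_no]:
            if word in files[name].lower().split():
                hits.append(name)
    return hits
```

Larger blocks shrink the index but increase the amount of text scanned per query, which is the same space-versus-scan trade-off that signature files make.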
| Subtopic: comparative studies
Quote: simple evaluation of Quick Filter and Fixed Prefix performance for partitioning of signature files [»ciacP4_1993]
| Subtopic: problems with signature file
Quote: problems with signature files -- false matches are linear in the collection size, large index, more disk accesses for short queries, and no ranked queries [»zobeJ7_2006]
| Quote: besides providing an access path for retrieval, an index describes the term usage of a database; signature files lose this information [»eastCM1_1988]
|
Related Topics
Topic: approximate string matching and pattern matching with errors (19 items)
Topic: external search and sort (23 items)
Topic: hash table and hash functions (41 items)
Topic: hash filter (18 items)
Topic: information retrieval by searching (35 items)
Topic: information retrieval with queries (18 items)
Topic: one-way hash function (24 items)
|