Topic: signature files


A signature file stores, for each file, a compact signature derived from properties that help identify it. A query retrieves the files whose signatures match the query's signature, and then scans those files to eliminate false positives. A signature may be a bit string, as in a hash filter. (cbb 11/07)
Subtopic: signature file as content filter

Quote: use filter to retrieve messages by contents; visual inspection to find relevant messages [»tsicD1_1983]
Quote: use a signature file to speed filtered searches [»tsicD1_1983]
Quote: assign an abstraction signature to each message; a query scans the signature file and rejects nonqualifying messages [»faloC7_1987]
Quote: signature files good for archives; 10% space overhead, append-only, sequential scans [»faloC_1988]
Quote: signature files use derived signatures from data objects and queries; must resolve false drops; size is 10-20% of database [»zezuP10_1991]

Subtopic: superimposed coding

Quote: signature files (superimposed coding) by setting m_i bits of a signature for each string in field i [»faloC_1988]
Quote: signatures for message filters should be about fifteen bits [»tsicD1_1983]
Quote: optimal signatures should set half of the bits [»faloC_1988]
Quote: superimposed coding hashes each word to a sparse bit string; OR all words to form the object signature; weight is the number of ones [»zezuP10_1991]
Quote: an object signature is the OR-ing of the signatures for each word; a word signature comes from a hash function that sets m bits
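
The quotes above can be illustrated with a minimal sketch of superimposed coding. The signature width F, the bits per word M, the SHA-256 based word hash, and all function names are assumptions chosen for illustration; none of them come from the cited papers.

```python
import hashlib

F = 64  # signature width in bits (assumed value)
M = 3   # bits set per word signature (assumed value)

def word_signature(word: str, f: int = F, m: int = M) -> int:
    """Hash a word to an f-bit string with exactly m distinct bits set (needs m <= f)."""
    sig = 0
    counter = 0
    while bin(sig).count("1") < m:
        digest = hashlib.sha256(f"{word.lower()}:{counter}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % f)
        counter += 1
    return sig

def object_signature(words) -> int:
    """OR the word signatures together to form the object signature."""
    sig = 0
    for w in words:
        sig |= word_signature(w)
    return sig

def weight(sig: int) -> int:
    """Weight of a signature: the number of one bits."""
    return bin(sig).count("1")

def build_signature_file(messages):
    """One object signature per message, kept in message order."""
    return [object_signature(msg.split()) for msg in messages]

def filtered_search(query_words, messages, signature_file):
    """Scan the signatures, reject nonqualifying messages, then resolve false drops."""
    q = object_signature(query_words)
    hits = []
    for i, sig in enumerate(signature_file):
        if sig & q != q:
            continue  # some query bit is missing, so the message cannot qualify
        words = {w.lower() for w in messages[i].split()}
        if all(w.lower() in words for w in query_words):  # check the message itself
            hits.append(i)
    return hits

messages = ["signature files speed retrieval", "hash tables store keys"]
signature_file = build_signature_file(messages)
result = filtered_search(["signature"], messages, signature_file)
```

A message passes the filter only if every one bit of the query signature is also set in its object signature. The final scan of the message text is what removes false drops.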

Subtopic: hash access to signature file

Quote: quick filter partitions a signature file; uses linear hashing; for large files of dynamic data and multimedia with high weight queries [»zezuP10_1991]

Subtopic: compressed signature file

Quote: a compressed signature file is like a hash index; 10-30% space overhead, fast retrieval and insertion, suitable for write-only [»faloC9_1988]
Quote: a bit-sliced signature file stores each bit position separately; only retrieve 1-bits; want to access one file and improve insertions [»faloC9_1988]
Quote: a typical signature file has 600-1000 bit positions
Quote: compress bit-sliced signature files with posting buckets of target files and optional word offsets; may use secondary hash and two levels [»faloC9_1988]
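
A bit-sliced layout can be sketched as follows, assuming each slice is held in memory as a Python integer used as a bitmap over document ids. A real file would keep each slice in separate storage. The function names are hypothetical, and the compression with posting buckets is not shown.

```python
def build_slices(signatures, f):
    """Transpose the signatures: slice[pos] has bit d set if document d has bit pos set."""
    slices = [0] * f
    for doc_id, sig in enumerate(signatures):
        for pos in range(f):
            if (sig >> pos) & 1:
                slices[pos] |= 1 << doc_id
    return slices

def candidate_docs(slices, query_sig, n_docs):
    """Read only the slices for the query's 1-bits and AND them together."""
    result = (1 << n_docs) - 1  # start with every document as a candidate
    for pos, bitmap in enumerate(slices):
        if (query_sig >> pos) & 1:
            result &= bitmap
    return [d for d in range(n_docs) if (result >> d) & 1]

signatures = [0b1010, 0b0110, 0b1110]  # toy 4-bit signatures for three documents
slices = build_slices(signatures, f=4)
matches = candidate_docs(slices, 0b1010, n_docs=3)  # documents 0 and 2
```

The candidates may still contain false drops, so the matching documents would be scanned as with an ordinary signature file.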

Subtopic: optimization

Quote: a signature tree improves signature file scanning by 10x [»chenY5_2002]

Subtopic: page signature

Quote: optimal distributed algorithm for identifying corrupted files by exchanging page signatures [»abdeKA1_1994]
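
As a rough illustration of page signatures only, the sketch below computes one signature per page and compares two full lists. This naive exchange of every signature is not the optimal distributed algorithm of the cited paper. The page size, the truncated SHA-256 signature, and the function names are assumptions.

```python
import hashlib

PAGE_SIZE = 4096  # bytes per page (assumed value)

def page_signatures(data: bytes, page_size: int = PAGE_SIZE):
    """One short signature per page: the first 8 bytes of the page's SHA-256 digest."""
    return [
        hashlib.sha256(data[i:i + page_size]).digest()[:8]
        for i in range(0, len(data), page_size)
    ]

def differing_pages(local_sigs, remote_sigs):
    """Page numbers whose signatures differ, including pages present on one side only."""
    diff = []
    for i in range(max(len(local_sigs), len(remote_sigs))):
        a = local_sigs[i] if i < len(local_sigs) else None
        b = remote_sigs[i] if i < len(remote_sigs) else None
        if a != b:
            diff.append(i)
    return diff

good = bytes(3 * PAGE_SIZE)  # three pages of zero bytes
damaged = bytearray(good)
damaged[5000] = 1            # corrupt one byte in the second page (page 1)
suspect = differing_pages(page_signatures(good), page_signatures(bytes(damaged)))
```

Only the signatures cross the network in this scheme, so a copy is checked without transferring its pages.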

Subtopic: block index

Quote: Glimpse features a small index and approximate matching; divide data into blocks, index the blocks, full text search of possible matches [»manbU10_1993]
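
The block index idea can be sketched under simplifying assumptions: a block is a fixed number of files, words are split on whitespace, and matching is exact. The approximate matching that Glimpse supports is not shown, and the function names are hypothetical.

```python
from collections import defaultdict

def build_block_index(files, files_per_block=2):
    """files: list of (name, text). Returns the blocks and a word -> block ids index."""
    blocks = [files[i:i + files_per_block] for i in range(0, len(files), files_per_block)]
    index = defaultdict(set)
    for block_id, block in enumerate(blocks):
        for _, text in block:
            for word in text.lower().split():
                index[word].add(block_id)
    return blocks, index

def block_search(word, blocks, index):
    """Look up the candidate blocks, then do a full text search of only those blocks."""
    word = word.lower()
    hits = []
    for block_id in sorted(index.get(word, ())):
        for name, text in blocks[block_id]:
            if word in text.lower().split():
                hits.append(name)
    return hits

files = [
    ("a.txt", "signature files"),
    ("b.txt", "hash tables"),
    ("c.txt", "bit signature slices"),
]
blocks, index = build_block_index(files)
found = block_search("signature", blocks, index)
```

The index stays small because it records blocks rather than individual files or word positions. The cost is the scan of each candidate block.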

Subtopic: comparative studies

Quote: simple evaluation of Quick Filter and Fixed Prefix performance for partitioning of signature files [»ciacP4_1993]

Subtopic: problems with signature file

Quote: problems with signature files -- false matches are linear in the collection size, large index, more disk accesses for short queries, and no ranked queries [»zobeJ7_2006]
Quote: besides providing an access path for retrieval, an index describes the term usage of a database; signature files lose this information

Related Topics

Topic: approximate string matching and pattern matching with errors (19 items)
Topic: external search and sort (23 items)
Topic: hash table and hash functions (41 items)
Topic: hash filter (18 items)
Topic: information retrieval by searching (35 items)
Topic: information retrieval with queries (18 items)
Topic: one-way hash function (24 items)

Updated barberCB 1/05
Copyright © 2002-2008 by C. Bradford Barber. All rights reserved.
Thesa is a trademark of C. Bradford Barber.