Topic: hash filter

topics > computer science > information > Group: information retrieval

approximate string matching and pattern matching with errors
hash table and hash functions
signature files
information filters
spelling errors


A hash filter removes uninteresting entries from a retrieval set. It is an approximate membership tester.

A bloom filter hashes values to bits. It matches if all bits are set. (cbb 10/07)

Subtopic: Bloom filter up

Quote: Bloom filter (failure hash) by hashing to multiple bits; need at least half 0's and constant time processing [»blooBH7_1970]
Quote: if allow false hits and a sparse message universe, then can use a much smaller hash area to identify messages [»blooBH7_1970]
Quote: explicit-state model checking with bitstate verification and Bloom filters; e.g., Spin, cache coherence, and network protocols [»dillPC4_2004]
Quote: Z uses a Bloom filter for checking spelling in a dictionary [»woodSR6_1981]
Quote: designed Bloom filter for differential file by simulating heavy day and extreme loads [»gremLL9_1982, OK]

Subtopic: approximate membership testing up

Quote: gives approximate membership testers that do better than Bloom filters [»cartL5_1978]
Quote: an approximate membership tester correctly accepts every member but rarely accepts a non-member [»cartL5_1978]
Quote: gives approximate membership tester that does not need an independent sequence of hash functions [»cartL5_1978]

Subtopic: compressing hash filters up

Quote: compress hash values for spelling list by storing differences; partition into M bins for speed [»mcilMD1_1982]
Quote: store hash filter by differences between successive hash values [»bentJ5_1985]

Subtopic: optimizing Bloom filter up

Quote: for open-addressing, triple hashing modifies the increment as well as the hash value; much higher accuracy for Bloom filters [»dillPC4_2004]
Quote: triple hashing with LOOKUP2 generating 96 bits of data; as fast as double hashing with 1/13th as many omissions [»dillPC4_2004]
Quote: avoid MOD operations within the iteration part of a Bloom filter [»dillPC4_2004]
Quote: found the best Bloom filter for 2,300 unique keys was 3K-4K bytes and three or more transformations; <2% filter error rate [»gremLL9_1982]
Quote: the performance of a Bloom filter depends on its loading factor, the number of set bits [»gremLL9_1982]
Quote: instead of using multiple hash values in a Bloom filter, use a partitioned bit vector; easier monitoring and no performance difference [»mullJK8_1983]
Quote: gives probability of unset bits in a Bloom filter [»mullJK8_1983]
Quote: expression for false-positives in a hash filter
[»stanC12_1986, OK]

Related Topics up

Topic: approximate string matching and pattern matching with errors (19 items)
Topic: hash table and hash functions (41 items)
Topic: signature files (21 items)
Topic: information filters (22 items)
Topic: spelling errors
(18 items)

Updated barberCB 6/04
Copyright © 2002-2008 by C. Bradford Barber. All rights reserved.
Thesa is a trademark of C. Bradford Barber.