Group: data structures
Topic: adaptive hash table
Topic: arrays
Topic: associative memory
Topic: data caching
Topic: database implementation
Topic: database keys and indexing
Topic: hash filter
Topic: oneway hash function
Topic: perfect hash table
Topic: search algorithms
Topic: signature files
Topic: symbol table
 
Summary
A hash function generates a random set of nbits from a larger number. In a good hash function, all bits of the original number can effect all bits of the result. A perfect hash function generates unique values for a given set of input values. A oneway hash function generates a value which is cryptographically secure, i.e., an adversary can not locate an input value that hashes to a given value. Jenkins has a theory of good hash functions.
A hash function may be used for many purposes. The primary use is locating a list of possible values in a hash table. With a good hash function, each list of possible values is short. The list may be stored implicitly or explicitly. Since most entries are accessed infrequently, a movetofront heuristic allows long hash chains.
Other uses include information retrieval signatures, authentication, trie hash, and duplicate detection. (cbb 4/98)
Subtopic: selection of hash functions
Quote: universal classes of hash functions with rapid evaluation; allows random choice of a hash function [»cartJL_1979]
 Quote: tested several hash functions: division by primes, shifting, folding, and midsquare [»gremLL9_1982]
 Quote: for hashing, use division by any number without small prime factors; avoids periodicities in the source, e.g., even numbers [»lumVY4_1971]
 Quote: a hash function should avoid funneling, i.e., the map of a set of input bits to a smaller set of output bits [»jenkB9_1997]
 Quote: the fastest code for a hash function to mix three 32bit values reversibly
 Quote: a hash function should be reversible, i.e., a permutation; otherwise it loses information about earlier blocks [»jenkB9_1997]
 Quote: a good mixing function for natural language: a transposition followed by a sequence of alternating substitutions and linear operations
 Quote: Jenkins' LOOKUP2 is a high quality, fast hash function; e.g., used in Spin [»dillPC4_2004]
 Quote: hash function by relatively prime, radix transform; separates data, e.g., high order bits mapped to many digits [»linAD5_1963]
 Subtopic: double and triple hashing
Quote: for openaddressing, triple hashing modifies the increment as well as the hash value; much higher accuracy for Bloom filters [»dillPC4_2004]
 Quote: triple hashing with LOOKUP2 generating 96 bits of data; as fast as double hashing with 1/13th as many omissions [»dillPC4_2004]
 Quote: if hash into the shorter of two buckets, the fraction of buckets with k keys is about (n/km)^2 [»mitzM5_2002]
 Quote: if hash into two groups of buckets, always break ties into the same group; greatly reduces long chains [»mitzM5_2002]
 Subtopic: string hash functions
Quote: hash text strings by xor and a table of randomish bytes for each character [»pearPK6_1990]
 Quote: an initial hash value of 0 is better than using the length of the string [»mckeBJ2_1990]
 Quote: for hashing, use double shifted sum modulo an odd number; or select number by testing
 Quote: a shifted sum works well for hashing but the modulus needs careful selection; primes aren't best [»mckeBJ2_1990]
 Quote: combine string characters for hashing by rotating add, prime multiplication, or summing groups of 4 [»ahoAV_1986]
 Quote: use a recursive hash function to compute hash values for consecutive ngrams; up to 7 times faster than nonrecursive integer division; e.g. for indexing all character sequences in a document [»coheJD7_1997]
 Subtopic: binary hash functions
Quote: Pearson's hash function uses an intermediate table; very uniform [»reicC6_1991]
 Quote: Rabin and Karp's hash function can be computed incrementally; more uniform than Pearson, but slower [»reicC6_1991]
 Quote: use incremental hash function for efficient deltas of binary files; faster than SCCS [»reicC6_1991]
 Subtopic: hash buckets
Quote: map records into hash buckets with overflow into trailer buckets; independent of key length and source language [»linAD5_1963]
 Subtopic: hash chaining
Quote: movetofront works well for hash tables; better than extendible hashing; effective for vocabulary accumulation at 80 strings per slot [»zobeJ12_2001]
 Quote: chained hash tables are best for building vocabularies; considerably faster than splay trees and binary trees [»zobeJ12_2001]
 Quote: compact chaining stores strings in the hash node; faster and less space [»zobeJ11_2005]
 Quote: use a contiguous array for all strings in a hash chain; good spatial locality; reduces overhead for skew distributions to 2 bits per string [»zobeJ11_2005]
 Quote: use 7 or 15bit lengths for hashed strings; string match fails on first character [»zobeJ11_2005]
 Quote: for words, exactfit array hashing most efficient with load average of 20; for URLs, best load average was less than 1 [»zobeJ11_2005]
 Subtopic: hash table sizing
Quote: with twice as many hash buckets as keys, about 1.6 out of every 10,000 buckets will have five or more keys [»mitzM5_2002]
 Subtopic: hashing execution
Quote: oblivious hashing is a hash function over assignments and control flows; verify software protection by hashing securitysensitive code paths [»chenY10_2002]
 Subtopic: hashing not fast
Quote: computing hash values is the most expensive operation in a model checker [»dillPC4_2004]
 Quote: ternary search trees better than binary trees, tries, and hashing; dynamic, fast operations including unsuccessful search and sort [»bentJ4_1998]
 Subtopic: associative array
Quote: script programmers used associative arrays while nonscript programmers used arrays and 10ary trees
 Quote: dictionary by triples ; the functional form 'fetch' can retrieve value given a name [»backJ8_1978a]
  QuoteRef: feldJA8_1969 ;;440 associative memory: accessing by specifying part of the contents
 Subtopic: other uses of hashing
Quote: a compressed signature file is like a hash index; 1030% space overhead, fast retrieval and insertion, suitable for writeonly [»faloC9_1988]
 Quote: crisscross hash joins interleaves zoned partitions configured through page maps; low memory and I/O costs; even better if ordered [»gopaRD7_2001]
 Quote: use ngramhashandskip for matching keywords in a large dictionary of 10,000100,000 entries [»coheJD12_1998]
 Quote: trie hash by sort on leading digits with split between buckets by a prefix; better than Btree, can 100% load for ordered insertion [»litwWA7_1991]
 Quote: delete duplicates by multiple passes using hashing; linear average time; constant extra space [»teuhJ3_1991]

Related Topics
Group: data structures (12 topics, 278 quotes)
Topic: adaptive hash table (19 items)
Topic: arrays (58 items)
Topic: associative memory (5 items)
Topic: data caching (35 items)
Topic: database implementation (18 items)
Topic: database keys and indexing (18 items)
Topic: hash filter (18 items)
Topic: oneway hash function (24 items)
Topic: perfect hash table (9 items)
Topic: search algorithms (40 items)
Topic: signature files (21 items)
Topic: symbol table (4 items)
