Approximately Minwise Independence With Twisted Tabulation
Dahlgaard Søren, Thorup Mikkel. Arxiv 2014
[Paper]
ARXIV
FOCS
Independent
A random hash function is -minwise if for any set ,
, and element , .
Minwise hash functions with low bias have widespread applications
within similarity estimation.
Hashing from a universe , the twisted tabulation hashing of
P\v{a}tra\c{s}cu and Thorup [SODA’13] makes lookups in tables of size
. Twisted tabulation was invented to get good concentration for
hashing based sampling. Here we show that twisted tabulation yields -minwise hashing.
In the classic independence paradigm of Wegman and Carter [FOCS’79] -minwise hashing requires -independence [Indyk
SODA’99]. P\v{a}tra\c{s}cu and Thorup [STOC’11] had shown that simple
tabulation, using same space and lookups yields -minwise
independence, which is good for large sets, but useless for small sets. Our
analysis uses some of the same methods, but is much cleaner bypassing a
complicated induction argument.
Similar Work