Approximately Minwise Independence With Twisted Tabulation | Awesome Learning to Hash Add your paper to Learning2Hash

Approximately Minwise Independence With Twisted Tabulation

Dahlgaard Søren, Thorup Mikkel. Arxiv 2014

[Paper]    
ARXIV FOCS Independent

A random hash function h is ϵ-minwise if for any set S, |S|=n, and element xS, Pr[h(x)=minh(S)]=(1±ϵ)/n. Minwise hash functions with low bias ϵ have widespread applications within similarity estimation. Hashing from a universe [u], the twisted tabulation hashing of P\v{a}tra\c{s}cu and Thorup [SODA’13] makes c=O(1) lookups in tables of size u1/c. Twisted tabulation was invented to get good concentration for hashing based sampling. Here we show that twisted tabulation yields O~(1/u1/c)-minwise hashing. In the classic independence paradigm of Wegman and Carter [FOCS’79] O~(1/u1/c)-minwise hashing requires Ω(logu)-independence [Indyk SODA’99]. P\v{a}tra\c{s}cu and Thorup [STOC’11] had shown that simple tabulation, using same space and lookups yields O~(1/n1/c)-minwise independence, which is good for large sets, but useless for small sets. Our analysis uses some of the same methods, but is much cleaner bypassing a complicated induction argument.

Similar Work