Approximately Minwise Independence With Twisted Tabulation

Dahlgaard Søren, Thorup Mikkel. Arxiv 2014

A random hash function $h$ is $ϵ$ -minwise if for any set $S$ , $| S | = n$ , and element $x \in S$ , $Pr [h (x) = min h (S)] = (1 \pm ϵ) / n$ . Minwise hash functions with low bias $ϵ$ have widespread applications within similarity estimation. Hashing from a universe $[u]$ , the twisted tabulation hashing of P\v{a}tra\c{s}cu and Thorup [SODA’13] makes $c = O (1)$ lookups in tables of size $u^{1 / c}$ . Twisted tabulation was invented to get good concentration for hashing based sampling. Here we show that twisted tabulation yields $\tilde{O} (1 / u^{1 / c})$ -minwise hashing. In the classic independence paradigm of Wegman and Carter [FOCS’79] $\tilde{O} (1 / u^{1 / c})$ -minwise hashing requires $Ω (l o g u)$ -independence [Indyk SODA’99]. P\v{a}tra\c{s}cu and Thorup [STOC’11] had shown that simple tabulation, using same space and lookups yields $\tilde{O} (1 / n^{1 / c})$ -minwise independence, which is good for large sets, but useless for small sets. Our analysis uses some of the same methods, but is much cleaner bypassing a complicated induction argument.

Awesome Learning to Hash

Approximately Minwise Independence With Twisted Tabulation

Dahlgaard Søren, Thorup Mikkel. Arxiv 2014

Similar Work