Understanding The Moments Of Tabulation Hashing Via Chaoses
Houen Jakob Bæk Tejs, Thorup Mikkel. Arxiv 2022
[Paper]
ARXIV
FOCS
Independent
Simple tabulation hashing dates back to Zobrist in 1970 and is defined as
follows: Each key is viewed as characters from some alphabet , we
have fully random hash functions , and a key is hashed to
where is
the bitwise XOR operation. The previous results on tabulation hashing by P{\v
a}tra{\c s}cu and Thorup~[J.ACM’11] and by Aamand et al.~[STOC’20] focused on
proving Chernoff-style tail bounds on hash-based sums, e.g., the number keys
hashing to a given value, for simple tabulation hashing, but their bounds do
not cover the entire tail.
Chaoses are random variables of the form where are independent random
variables. Chaoses are a well-studied concept from probability theory, and
tight analysis has been proven in several instances, e.g., when the independent
random variables are standard Gaussian variables and when the independent
random variables have logarithmically convex tails. We notice that hash-based
sums of simple tabulation hashing can be seen as a sum of chaoses that are not
independent. This motivates us to use techniques from the theory of chaoses to
analyze hash-based sums of simple tabulation hashing.
In this paper, we obtain bounds for all the moments of hash-based sums for
simple tabulation hashing which are tight up to constants depending only on
. In contrast with the previous attempts, our approach will mostly be
analytical and does not employ intricate combinatorial arguments. The improved
analysis of simple tabulation hashing allows us to obtain bounds for the
moments of hash-based sums for the mixed tabulation hashing introduced by
Dahlgaard et al.~[FOCS’15].
Similar Work