Pthash Revisiting FCH Minimal Perfect Hashing
Pibiri Giulio Ermanno, Trani Roberto. SIGIR 2021
[Paper]
Independent
SIGIR
Given a set of distinct keys, a function that bijectively maps
the keys of into the range is called a minimal perfect
hash function for . Algorithms that find such functions when is large
and retain constant evaluation time are of practical interest; for instance,
search engines and databases typically use minimal perfect hash functions to
quickly assign identifiers to static sets of variable-length keys such as
strings. The challenge is to design an algorithm which is efficient in three
different aspects: time to find (construction time), time to evaluate
on a key of (lookup time), and space of representation for . Several
algorithms have been proposed to trade-off between these aspects. In 1992, Fox,
Chen, and Heath (FCH) presented an algorithm at SIGIR providing very fast
lookup evaluation. However, the approach received little attention because of
its large construction time and higher space consumption compared to other
subsequent techniques. Almost thirty years later we revisit their framework and
present an improved algorithm that scales well to large sets and reduces space
consumption altogether, without compromising the lookup time. We conduct an
extensive experimental assessment and show that the algorithm finds functions
that are competitive in space with state-of-the art techniques and provide
better lookup time.
Similar Work