Locality-preserving Minimal Perfect Hashing Of K-mers
Pibiri Giulio Ermanno, Shibuya Yoshihiro, Limasset Antoine. Arxiv 2022
[Paper]
ARXIV
Independent
Minimal perfect hashing is the problem of mapping a static set of
distinct keys into the address space bijectively. It is
well-known that bits are necessary to specify a minimal perfect
hash function (MPHF) , when no additional knowledge of the input keys is to
be used. However, it is often the case in practice that the input keys have
intrinsic relationships that we can exploit to lower the bit complexity of .
For example, consider a string and the set of all its distinct -mers as
input keys: since two consecutive -mers share an overlap of symbols,
it seems possible to beat the classic bits/key barrier in this
case. Moreover, we would like to map consecutive -mers to consecutive
addresses, as to also preserve as much as possible their relationship in the
codomain. This is a useful feature in practice as it guarantees a certain
degree of locality of reference for , resulting in a better evaluation time
when querying consecutive -mers. Motivated by these premises, we initiate
the study of a new type of locality-preserving MPHF designed for -mers
extracted consecutively from a collection of strings. We design a construction
whose space usage decreases for growing and discuss experiments with a
practical implementation of the method: in practice, the functions built with
our method can be several times smaller and even faster to query than the most
efficient MPHFs in the literature.
Similar Work