
Fully Understanding The Hashing Trick

Freksen Casper Benjamin, Kamma Lior, Larsen Kasper Green. arXiv 2018

[Paper]    
ARXIV Independent

Feature hashing, also known as *the hashing trick*, introduced by Weinberger et al. (2009), is one of the key techniques used in scaling-up machine learning algorithms. Loosely speaking, feature hashing uses a random sparse projection matrix \(A : \mathbb{R}^n \to \mathbb{R}^m\) (where \(m \ll n\)) in order to reduce the dimension of the data from \(n\) to \(m\) while approximately preserving the Euclidean norm. Every column of \(A\) contains exactly one non-zero entry, equal to either \(-1\) or \(1\). Weinberger et al. showed tail bounds on \(\|Ax\|_2^2\). Specifically, they showed that for every \(\epsilon, \delta\), if \(\|x\|_\infty / \|x\|_2\) is sufficiently small, and \(m\) is sufficiently large, then \(\Pr\left[\,\left|\,\|Ax\|_2^2 - \|x\|_2^2\,\right| < \epsilon \|x\|_2^2\,\right] \ge 1 - \delta\). These bounds were later extended by Dasgupta et al. (2010) and most recently refined by Dahlgaard et al. (2017); however, the true nature of the performance of this key technique, and specifically the correct tradeoff between the pivotal parameters \(\|x\|_\infty / \|x\|_2\), \(m\), \(\epsilon\), \(\delta\), remained an open question. We settle this question by giving tight asymptotic bounds on the exact tradeoff between the central parameters, thus providing a complete understanding of the performance of feature hashing. We complement the asymptotic bound with empirical data, which shows that the constants hiding in the asymptotic notation are, in fact, very close to \(1\), thus further illustrating the tightness of the presented bounds in practice.
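The projection the abstract describes can be applied without ever materializing the matrix \(A\): each input coordinate \(i\) is assigned a random bucket \(h(i)\) and a random sign \(s(i)\), which together define the single \(\pm 1\) entry in column \(i\). Below is a minimal NumPy sketch of this idea (the function name and use of a seeded generator in place of the hash functions are illustrative choices, not part of the paper):

```python
import numpy as np

def feature_hash(x, m, seed=0):
    """Sketch of the hashing trick: project x from dimension n down to m.

    Equivalent to y = A @ x where A is an m x n matrix in which every
    column has exactly one non-zero entry, equal to -1 or +1. Here the
    random hash functions are simulated with a seeded generator.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    h = rng.integers(0, m, size=n)       # bucket h(i) for each coordinate
    s = rng.choice([-1.0, 1.0], size=n)  # sign s(i) for each coordinate
    y = np.zeros(m)
    np.add.at(y, h, s * x)               # y[h[i]] += s[i] * x[i], with collisions summed
    return y
```

Because \(\mathbb{E}\left[\|Ax\|_2^2\right] = \|x\|_2^2\), averaging the squared norm of the output over many independent draws of the hash functions (here, seeds) recovers \(\|x\|_2^2\); the paper's bounds quantify how tightly a single draw concentrates around it.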

Similar Work