One Permutation Hashing | Awesome Learning to Hash Add your paper to Learning2Hash

One Permutation Hashing

Ping Li, Art Owen, Cun-hui Zhang. Neural Information Processing Systems 2012

[Paper]    
NEURIPS Supervised

While minwise hashing is promising for large-scale learning in massive binary data, the preprocessing cost is prohibitive as it requires applying (e.g.,) \(k=500\) permutations on the data. The testing time is also expensive if a new data point (e.g., a new document or a new image) has not been processed. In this paper, we develop a simple \textbf{one permutation hashing} scheme to address this important issue. While it is true that the preprocessing step can be parallelized, it comes at the cost of additional hardware and implementation. Also, reducing \(k\) permutations to just one would be much more \textbf{energy-efficient}, which might be an important perspective as minwise hashing is commonly deployed in the search industry. While the theoretical probability analysis is interesting, our experiments on similarity estimation and SVM \& logistic regression also confirm the theoretical results.

Similar Work