Scaling Up HBM Efficiency Of Top-k Spmv For Approximate Embedding Similarity On Fpgas | Awesome Learning to Hash Add your paper to Learning2Hash

Scaling Up HBM Efficiency Of Top-k Spmv For Approximate Embedding Similarity On Fpgas

Alberto Parravicini, Luca Giuseppe Cellamare, Marco Siracusa, Marco Domenico Santambrogio . 2021 58th ACM/IEEE Design Automation Conference (DAC) 2021 – 4 citations

[Paper]   Search on Google Scholar   Search on Semantic Scholar
Efficiency Evaluation Similarity Search

Top-K SpMV is a key component of similarity-search on sparse embeddings. This sparse workload does not perform well on general-purpose NUMA systems that employ traditional caching strategies. Instead, modern FPGA accelerator cards have a few tricks up their sleeve. We introduce a Top-K SpMV FPGA design that leverages reduced precision and a novel packet-wise CSR matrix compression, enabling custom data layouts and delivering bandwidth efficiency often unreachable even in architectures with higher peak bandwidth. With HBM-based boards, we are 100x faster than a multi-threaded CPU implementation and 2x faster than a GPU with 20% higher bandwidth, with 14.2x higher power-efficiency.

Similar Work