Scalable K-means Clustering For Large K Via Seeded Approximate Nearest-neighbor Search | Awesome Learning to Hash Add your paper to Learning2Hash

Scalable K-means Clustering For Large K Via Seeded Approximate Nearest-neighbor Search

Jack Spalding-Jamieson, Eliot Wong Robson, da Wei Zheng . Arxiv 2025 – 0 citations

[Paper]   Search on Google Scholar   Search on Semantic Scholar
Datasets

For very large values of (k), we consider methods for fast (k)-means clustering of massive datasets with (10^7\sim10^9) points in high-dimensions ((d\geq100)). All current practical methods for this problem have runtimes at least (Ω(k^2)). We find that initialization routines are not a bottleneck for this case. Instead, it is critical to improve the speed of Lloyd’s local-search algorithm, particularly the step that reassigns points to their closest center. Attempting to improve this step naturally leads us to leverage approximate nearest-neighbor search methods, although this alone is not enough to be practical. Instead, we propose a family of problems we call “Seeded Approximate Nearest-Neighbor Search”, for which we propose “Seeded Search-Graph” methods as a solution.

Similar Work