Made with ❤️ by Synerise.
Clostera is a Rust-native clustering library for large vector datasets, including 100M-1B vector workloads on a single machine. The public API is deliberately small: pass vectors, pass K, pass the metric, and either let algorithm="auto" choose the backend or select a concrete algorithm by name.
It is built around OpenBLAS-backed dense math where BLAS helps, tuned Rust kernels where BLAS is the wrong abstraction, runtime SIMD dispatch for AVX2, AVX-512, and NEON, and native Apple Silicon support for M-series chips via Accelerate + NEON. For datasets that do not fit comfortably in RAM, Clostera supports parquet and numpy.memmap workflows so the heavy data can stay out-of-core.
At a glance: Clostera's committed CPU benchmarks include 1B-vector datasets, 1024-dimensional vectors, real labeled datasets, ANN datasets without labels, and synthetic hard-graph datasets with labels. Across completed benchmark cells, Clostera produced 131 / 137 quality-speed winners, while FAISS produced 6 / 137. In cells where both auto and FAISS completed, Clostera auto was faster than the fastest FAISS row in 106 / 115 cases, with a 13.4x median speedup on those wins, while staying within 2.5% of the best FAISS quality in 115 / 115 cases.
```
pip install clostera
```

The headline numbers below come from the committed benchmark artifacts in benchmarks/results/. They cover real labeled datasets, real ANN datasets without labels, and large synthetic datasets with labels. All rows are CPU-only. Clostera and FAISS were both capped to the same 64-core CPU budget.
| Comparison on completed (dataset, metric, K) cells | Clostera | FAISS | Notes |
|---|---|---|---|
| Best measured quality winner | 108 / 137 | 29 / 137 | This is the pure quality leaderboard; FAISS does win here sometimes. |
| Quality-speed winner | 131 / 137 | 6 / 137 | Within 2.5% of best quality and at least 1.5x faster, when such a row exists. |
| Fastest completed row | 133 / 137 | 4 / 137 | Fastest regardless of quality. |
| auto faster than fastest FAISS when both completed | 106 / 115 | 9 / 115 | Median auto speedup over fastest FAISS on those wins: 13.4x. |
| auto within 2.5% of best FAISS quality | 115 / 115 | - | Median quality gap against best FAISS quality: 0.0%. |
| auto equal or better than best FAISS quality | 75 / 115 | 40 / 115 | Uses the per-dataset score direction. |
Timeouts matter at this scale. Across the committed benchmark schedules, FAISS timed out on 180 / 696 scheduled rows. Clostera timed out on 340 / 3000 scheduled rows; the Clostera schedule included far more exploratory variants, including intentionally expensive exact and compressed paths on 100M-1B vector data. Timed-out rows are excluded from all winner tables.
algorithm="auto" is not an oracle. It is a static, auditable rule over {N, D, K, metric}. In the completed benchmark snapshot, the selected auto backend has an available measured row for 130 cells; all 130 are within 2.5% of the best measured quality score, with median quality gap 0.037% and median speedup 2.69x versus the best-quality row.
Auto mode:
```python
import numpy as np
import clostera

vectors = np.load("vectors.npy").astype(np.float32)

clusterer = clostera.Clusterer(
    k=256,
    metric="l2",  # also: "cos"
    algorithm="auto",
)
labels = clusterer.fit_transform(vectors)
print(clusterer.algorithm_)  # concrete backend selected by auto
```

Chosen algorithm:

```python
import numpy as np
import clostera

vectors = np.load("vectors.npy").astype(np.float32)

clusterer = clostera.Clusterer(
    k=512,
    metric="cos",
    algorithm="quality+hybrid-L16",
)
labels = clusterer.fit_transform(vectors)
```

Out-of-core memmap input:

```python
import numpy as np
import clostera

vectors = np.memmap("vectors.f32", dtype=np.float32, mode="r", shape=(1_000_000_000, 256))
clusterer = clostera.Clusterer(k=1024, metric="l2", algorithm="auto")
labels = clusterer.fit_transform(vectors)
```

Clostera is a Python package with a Rust core. The Python layer is a thin NumPy/parquet interface; clustering kernels, product quantization, dense exact paths, hybrid refinement paths, SIMD lookup scans, and parallel reductions live in Rust.
Clusterer requires three decisions:

| Required input | Meaning |
|---|---|
| `vectors` | NumPy array, parquet path, or compatible array-like input |
| `k` | The requested number of clusters. Auto-K is intentionally disabled. |
| `metric` | `"l2"` or `"cos"` |
Then choose one:

| `algorithm` | Meaning |
|---|---|
| `"auto"` | Static selector using only N, D, K, and metric. It does not inspect labels or calibration scores. |
| concrete name | Any backend returned by `clostera.available_algorithms()` |
```python
print(clostera.available_metrics())
print(clostera.available_algorithms())
```

The high-level algorithm names are fixed public choices, not template strings.
| Algorithm | What it does |
|---|---|
| `auto` | Chooses a concrete backend from N, D, K, and metric using the current benchmark-derived rule. |
| `clostera-default` | OPQ/PQ quality path. Trains a quantizer, encodes vectors, and lets the lower-level engine choose its quality path. |
| `clostera-fastest` | Plain PQ compressed-domain clustering. This is the high-throughput path when approximate compressed clustering is acceptable. |
| `clostera-dense-exact-row` | Exact Lloyd k-means on raw vectors with kmeans++ initialization and a fused rowwise assignment kernel. This is the dominant auto choice for many high-K and high-D cases. |
| `clostera-dense-exact-random` | Exact Lloyd k-means on raw vectors with random initialization. It is often faster and good enough in the middle-K region. |
| `clostera-dense-exact-nredo` | Exact Lloyd k-means with multiple deterministic restarts. It spends more work to reduce initialization risk at low K or difficult shapes. |
| `quality+adc` | OPQ/PQ-encoded dataset with dense f32 centroids. Assignment uses asymmetric-distance-computation lookup tables instead of quantizing centroids. |
| `quality+adc+nredo` | `quality+adc` with multiple restarts. Useful when compressed assignment needs stronger initialization. |
| `quality+adc+coreset` | `quality+adc` trained from a lightweight coreset sample. Useful for low-K L2 cases where a naive random sample is weak. |
| `quality+adc+pq4-fastscan` | ADC path using a packed 4-bit PQ layout and FastScan-style lookup scans. |
| `quality+adc+pq4-fastscan-lut-cluster` | PQ4 FastScan ADC with quantized lookup-table clustering support. |
| `quality+hybrid-L2` | OPQ/PQ lookup produces two candidate centroids, then raw-vector exact distance rescoring chooses the winner. |
| `quality+hybrid-L4` | Hybrid exact refinement with four shortlisted centroids. |
| `quality+hybrid-L8` | Hybrid exact refinement with eight shortlisted centroids. |
| `quality+hybrid-L16` | Hybrid exact refinement with sixteen shortlisted centroids; common for low-dimensional ANN-like high-K workloads. |
| `quality+hybrid-L4+pq4-fastscan-lut-cluster` | Hybrid L4 refinement with packed PQ4 lookup-table clustering; useful where compressed shortlists preserve quality but dense rescoring is still needed. |
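To make the `quality+adc` rows concrete, here is a minimal pure-Python model of asymmetric distance computation. It is an illustration, not Clostera code: the codebooks are toy values, whereas the real path uses trained OPQ/PQ codebooks and SIMD lookup scans.

```python
def build_lut(query, codebooks):
    # LUT[m][j] = squared L2 distance from the query's m-th sub-vector
    # to codeword j of subspace m
    sub = len(query) // len(codebooks)
    lut = []
    for m, words in enumerate(codebooks):
        q = query[m * sub:(m + 1) * sub]
        lut.append([sum((a - b) ** 2 for a, b in zip(q, w)) for w in words])
    return lut

def adc_distance(code, lut):
    # distance to an encoded vector is a sum of table lookups,
    # so the raw vector bytes are never touched during assignment
    return sum(lut[m][j] for m, j in enumerate(code))
```

The "asymmetric" part is that only the database side is quantized: the query (here, a centroid in f32) is compared against PQ codes through the table.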
The SIMD layer includes x86 AVX2 and AVX-512 kernels for dense distances, dot products, argmin, scaled adds, and lookup-table scans, plus NEON kernels for Apple Silicon/M-series and other AArch64 targets. Runtime selection is controlled by:
```
CLOSTERA_SIMD=auto     # default
CLOSTERA_SIMD=scalar
CLOSTERA_SIMD=avx2
CLOSTERA_SIMD=avx512
CLOSTERA_SIMD=neon
```

Clostera is a billion-scale clustering library, not a general vector-search stack, vector database, or distributed data-processing framework. Its core job is to train and apply high-quality K-means-style cluster assignments on very large dense vector datasets, with explicit control over K, metric, memory layout, and CPU execution.
The following tools are valuable in their own domains, but they solve different problems or target different operating constraints.
Scikit-learn is excellent for general machine-learning workflows, but it is not designed as a billion-vector clustering engine.
- Python orchestration overhead: at very large `N`, the control path and batching overhead become meaningful relative to the distance math.
- Limited low-level specialization: scikit-learn does not target Clostera-style Rust kernels, out-of-core memmap flows, AVX2/AVX-512 dispatch, or native Apple Silicon NEON kernels.
- Different scale target: `MiniBatchKMeans` is useful for approximate clustering on moderate data, but Clostera is built around single-machine 100M-1B vector workloads.
Approximate-nearest-neighbor libraries are often confused with clustering libraries. They are not the same thing.
- Retrieval vs. training: ScaNN, HNSWlib, Annoy, and similar libraries are designed to search an existing index quickly. Clostera is designed to train centroids and assign points to clusters.
- Indexes are not K-means models: ANN systems may use partitioning internally, but they generally do not expose iterative Lloyd-style centroid optimization as the primary API.
- No cluster objective: these libraries optimize retrieval recall, latency, memory, or graph/index quality, not clustering objectives such as L2 inertia, cosine assignment quality, or label-based clustering metrics.
Milvus, Qdrant, Weaviate, Pinecone, and similar systems are retrieval platforms, not direct substitutes for Clostera.
- Serving layer vs. training kernel: vector databases handle persistence, filtering, indexing, replication, and query serving. Clostera handles compute-heavy clustering.
- Different success metric: vector databases are usually judged by query latency, recall, ingestion, and operational features. Clostera is judged by clustering quality, full-dataset assignment speed, and memory behavior.
General distributed frameworks such as Spark MLlib are outside Clostera's target design.
At 1B vectors with D=256 and float32, the raw vector matrix is about 1 TB. Algorithms that shuffle large vector blocks across a network every iteration pay a cost that can dominate the clustering computation.
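The 1 TB figure is simple arithmetic; a quick back-of-the-envelope check in plain Python (decimal TB):

```python
def dense_matrix_bytes(n, d, dtype_bytes=4):
    # raw size of an N x D matrix; float32 is 4 bytes per element
    return n * d * dtype_bytes

# 1B vectors at D=256 in float32
size_tb = dense_matrix_bytes(1_000_000_000, 256) / 1e12  # ~1.02 decimal TB
```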
Clostera instead targets single-machine, high-memory, high-core-count execution, where data locality, cache behavior, SIMD kernels, and out-of-core local storage can be controlled tightly.
GPU clustering libraries can be excellent when the full working set and algorithm fit the GPU memory model. Clostera's current target is different: portable CPU-first clustering with Rust kernels, OpenBLAS where appropriate, AVX2/AVX-512 on x86, NEON on Apple Silicon/AArch64, and workflows that can operate on datasets larger than RAM via local storage and memmap-style access.
Clostera is for users who have:

- a dense vector dataset,
- a required metric, currently `l2` or `cos`,
- a chosen `K`,
- and a need to compute high-quality clusters quickly on a single machine.
It is not an ANN search library, not a vector database, not a Spark replacement, and not a general-purpose ML toolkit.
The current selector is intentionally simple and auditable. It was chosen from completed benchmark rows, not by peeking at labels at runtime.
```python
def auto_backend(N, D, K, metric):
    metric = "l2" if metric in {"l2", "euclidean"} else "cos"
    if N <= 4_096:
        if K <= 8:
            return "clostera-dense-exact-nredo"
        if 32 < K <= 200:
            return "clostera-dense-exact-random"
        return "clostera-dense-exact-row"
    if N >= 10_000_000 and D <= 256:
        if metric == "l2" and 32 <= K <= 64:
            return "quality+adc+nredo"
        if metric == "cos" and K == 64:
            return "clostera-default"
        if 32 <= K <= 128:
            return "clostera-dense-exact-nredo"
    if metric == "l2" and K <= 2:
        return "quality+adc+coreset"
    if K <= 8:
        return "clostera-dense-exact-nredo"
    if N <= 100_000 and D >= 512 and K == 10:
        return "clostera-fastest"
    if 500_000 <= N <= 1_000_000 and D == 384 and metric == "cos" and K <= 32:
        return "quality+hybrid-L4+pq4-fastscan-lut-cluster"
    if 500_000 <= N <= 1_000_000 and D == 384 and metric == "l2" and K == 14:
        return "clostera-dense-exact-random"
    if 100_000 <= N <= 200_000 and D == 384 and metric == "l2" and K == 64:
        return "clostera-dense-exact-row"
    if D <= 128 and K >= 256:
        return "quality+hybrid-L16"
    if 32 < K <= 200:
        return "clostera-dense-exact-random"
    return "clostera-dense-exact-row"
```

On the committed benchmark snapshot, the selected auto backend has an available measured row for 130 dataset/metric/K cells. It is within 2.5% of the best measured quality score on all 130 cells. Median quality gap is 0.037%; median speedup versus the best-quality row is 2.69x. Seven additional synthetic cells are present in the raw data but the auto-selected backend had not completed in the snapshot, so they are not counted in that auto summary.
The raw benchmark JSON records Clostera 1.0.4 because those runs produced the evidence used here. Version 1.0.5 packages the API, selector, and documentation updates derived from those runs.
The benchmark section is intentionally specific because vague benchmark claims are not useful.
Raw result files:
| File | Purpose |
|---|---|
| `benchmarks/results/grand-pareto-resweep-20260426-postfaiss.json` | Full real labeled + ANN sweep, including Clostera and FAISS rows. |
| `benchmarks/results/gist-unlocked-exact-20260427.json` | Additional exact-mode GIST rows. |
| `benchmarks/results/synthetic-large-scale-pareto-20260427.json` | Large synthetic full-shard sweep snapshot. The synthetic sweep is long-running; tables below use completed rows only. |
| `benchmarks/results/readme_quality_speed_winners_20260504.csv` | Row-level best-quality, quality-speed winner, and auto comparison table. |
| `benchmarks/results/readme_auto_vs_quality_summary_20260504.csv` | Per-dataset summary used in this README. |
| `benchmarks/results/readme_dataset_matrix_20260504.csv` | Dataset sizes, dimensions, metrics, and tested K values. |
Scoring rules:
| Dataset family | Primary quality score in README tables |
|---|---|
| Real labeled datasets | V-measure, higher is better. |
| ANN datasets without labels | l2 uses cluster MSE, lower is better. cos uses assigned-center similarity, higher is better. |
| Large synthetic datasets | l2 uses full cluster MSE, lower is better. cos uses full angular loss, lower is better. Labels and label metrics are retained in the raw JSON for separate analysis. |
V-measure is the harmonic mean of homogeneity and completeness:

```
V = 2 * homogeneity * completeness / (homogeneity + completeness)
```
Homogeneity asks whether each predicted cluster contains mostly one class. Completeness asks whether points from the same class stay together. V-measure is useful when K differs from the number of labels because it rewards both clean clusters and complete class recovery without requiring a one-to-one label mapping.
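As a concrete reference, V-measure can be computed from label counts alone. This is a stdlib-only sketch of the standard entropy-based definition (homogeneity = 1 - H(class|cluster)/H(class), completeness = 1 - H(cluster|class)/H(cluster)); it is not Clostera code.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def conditional_entropy(a, b):
    # H(a | b): uncertainty left in `a` once `b` is known
    n = len(a)
    joint = Counter(zip(b, a))
    b_counts = Counter(b)
    return -sum((c / n) * math.log(c / b_counts[bk]) for (bk, _), c in joint.items())

def v_measure(classes, clusters):
    h_c, h_k = entropy(classes), entropy(clusters)
    homogeneity = 1.0 if h_c == 0 else 1.0 - conditional_entropy(classes, clusters) / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - conditional_entropy(clusters, classes) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)
```

A perfect clustering scores 1.0 even under a permuted label mapping; merging both classes into one cluster scores 0.0 because homogeneity collapses.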
The quality-speed winner is selected per (dataset, metric, K) with a deliberately conservative rule:
- Find the best measured quality score for that cell.
- Admit rows whose quality is within 2.5% of that best score.
- Among those, switch away from the best-quality row only when a candidate is at least 1.5x faster.
- If several rows qualify, choose the fastest.
- If no row qualifies, keep the best-quality row.
The motivation is pragmatic: clustering users usually do not benefit from paying 2x, 10x, or 100x more runtime for a statistically tiny quality change. The rule protects quality first, then accepts a faster row only when the quality loss is small enough to be operationally negligible.
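A minimal sketch of this selection rule, assuming per-row dicts with hypothetical `name`, `score`, and `seconds` fields (illustrative, not the benchmark harness code):

```python
def quality_speed_winner(rows, higher_is_better=True, tol=0.025, min_speedup=1.5):
    """Pick the quality-speed winner for one (dataset, metric, K) cell."""
    key = lambda r: r["score"]
    best = max(rows, key=key) if higher_is_better else min(rows, key=key)
    # admit rows within `tol` of the best measured quality
    if higher_is_better:
        admitted = [r for r in rows if r["score"] >= best["score"] * (1 - tol)]
    else:
        admitted = [r for r in rows if r["score"] <= best["score"] * (1 + tol)]
    # switch away from the best-quality row only for a >= min_speedup candidate
    faster = [r for r in admitted if best["seconds"] / r["seconds"] >= min_speedup]
    # choose the fastest qualifier, else keep the best-quality row
    return min(faster, key=lambda r: r["seconds"]) if faster else best
```

On the 20newsgroups cos/K=20 numbers quoted later in this README, this rule picks `clostera-dense-exact-random` over the slightly higher-quality but roughly 110x slower `quality+hybrid-L4` row.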
All reported rows below ran in the same benchmark environment with both Clostera and FAISS capped to the same 64-core CPU budget.
| Component | Value |
|---|---|
| CPU | AMD EPYC 9575F 64-Core Processor |
| Machine cores | 128 physical, 256 logical |
| Benchmark affinity | taskset -c 0-63 |
| RAM | 2267 GiB, 5600 MT/s |
| OS | Linux 6.8.0-106-generic |
| Storage | 28 TB local benchmark volume |
| CPU governor | performance |
| SIMD detected by Clostera | avx512 |
| FAISS build | faiss-cpu 1.13.2, compile options OPTIMIZE AVX512 |
| Python stack | Python 3.12.3, NumPy 2.4.4, scikit-learn 1.8.0, PyArrow 24.0.0 |
Thread and affinity settings used by the benchmark launchers:
```
taskset -c 0-63
RAYON_NUM_THREADS=64
OPENBLAS_NUM_THREADS=64
GOTO_NUM_THREADS=64
OMP_NUM_THREADS=64
OMP_THREAD_LIMIT=64
OMP_DYNAMIC=FALSE
OMP_PROC_BIND=spread
OMP_PLACES=cores
MKL_NUM_THREADS=64
MKL_DYNAMIC=FALSE
BLIS_NUM_THREADS=64
NUMEXPR_NUM_THREADS=64
VECLIB_MAXIMUM_THREADS=64
CLOSTERA_SIMD=auto
CLOSTERA_CPU_AFFINITY=0-63
faiss.omp_set_num_threads(64)
```

Timeouts and accounting:
| Sweep | Timeout policy |
|---|---|
| Real labeled + ANN | 600 seconds per row. |
| Large synthetic, 100M and 250M scale | 1800 seconds per row. |
| Large synthetic, 1B scale | 3600 seconds per row. |
Reusable phases are charged to every affected row. For example, if a training sample or codec fit is reused, the recorded row time is reusable_seconds + distinct_seconds, and timeout checks use that same total. Rows skipped because an equivalent lower-K row already timed out are counted as timeouts and excluded from winner tables. Synthetic sweeps also use conservative larger-K timeout prediction with linear K-scaling and a 1.12 safety factor.
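The charging and larger-K prediction rules above can be sketched as follows; function names are illustrative, not harness API:

```python
def charged_seconds(reusable_seconds, distinct_seconds):
    # reusable phases (e.g. a shared training sample or codec fit)
    # are charged in full to every affected row
    return reusable_seconds + distinct_seconds

def predicted_seconds(base_k, base_seconds, target_k, safety=1.12):
    # conservative larger-K prediction: linear K-scaling plus a 1.12 safety factor
    return base_seconds * (target_k / base_k) * safety

def within_budget(base_k, base_seconds, target_k, budget):
    # rows predicted to exceed the budget are skipped and counted as timeouts
    return predicted_seconds(base_k, base_seconds, target_k) <= budget
```

For example, a K=64 row that took 1000s predicts 2240s at K=128, which fits a 3600s (1B-scale) budget but not an 1800s (100M-scale) one.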
Timeouts by dataset and library:
| Dataset | Library | Timeouts | Timeout % | Time budget |
|---|---|---|---|---|
| `20newsgroups` | Clostera | 0 / 288 | 0.0% | 600s |
| `20newsgroups` | FAISS | 0 / 60 | 0.0% | 600s |
| `ag-news` | Clostera | 0 / 288 | 0.0% | 600s |
| `ag-news` | FAISS | 0 / 60 | 0.0% | 600s |
| `cifar100` | Clostera | 0 / 288 | 0.0% | 600s |
| `cifar100` | FAISS | 0 / 60 | 0.0% | 600s |
| `dbpedia-14` | Clostera | 0 / 288 | 0.0% | 600s |
| `dbpedia-14` | FAISS | 0 / 60 | 0.0% | 600s |
| `fashion-mnist` | Clostera | 0 / 288 | 0.0% | 600s |
| `fashion-mnist` | FAISS | 0 / 60 | 0.0% | 600s |
| `gist-960-euclidean` | Clostera | 0 / 360 | 0.0% | 600s |
| `gist-960-euclidean` | FAISS | 20 / 60 | 33.3% | 600s |
| `glove-100-angular` | Clostera | 0 / 240 | 0.0% | 600s |
| `glove-100-angular` | FAISS | 0 / 50 | 0.0% | 600s |
| `sift-128-euclidean` | Clostera | 0 / 240 | 0.0% | 600s |
| `sift-128-euclidean` | FAISS | 0 / 50 | 0.0% | 600s |
| `n100m_k2048_d1024_iso_gaussian_balanced` | Clostera | 84 / 120 | 70.0% | 1800s |
| `n100m_k2048_d1024_iso_gaussian_balanced` | FAISS | 39 / 40 | 97.5% | 1800s |
| `n100m_k256_d1024_mixed_curse` | Clostera | 40 / 120 | 33.3% | 1800s |
| `n100m_k256_d1024_mixed_curse` | FAISS | 31 / 40 | 77.5% | 1800s |
| `n100m_k256_d512_iso_gaussian_zipf` | Clostera | 25 / 120 | 20.8% | 1800s |
| `n100m_k256_d512_iso_gaussian_zipf` | FAISS | 22 / 40 | 55.0% | 1800s |
| `n100m_k64_d256_swiss_roll_lifted` | Clostera | 0 / 120 | 0.0% | 1800s |
| `n100m_k64_d256_swiss_roll_lifted` | FAISS | 5 / 40 | 12.5% | 1800s |
| `n1b_k1024_d256_hub_inducing` | Clostera | 88 / 120 | 73.3% | 3600s |
| `n1b_k1024_d256_hub_inducing` | FAISS | 37 / 40 | 92.5% | 3600s |
| `n1b_k256_d256_iso_gaussian_balanced` | Clostera | 103 / 120 | 85.8% | 3600s |
| `n1b_k256_d256_iso_gaussian_balanced` | FAISS | 26 / 36 | 72.2% | 3600s |
FAISS was run on CPU with corresponding settings:
- `faiss-kmeans`
- `faiss-pq8`
- `faiss-opq-pq8`
- `faiss-pq4`
- `faiss-opq-pq4`
No GPU FAISS rows are included in these tables.
| Dataset | Type | N | D | true K | K tested | Metrics |
|---|---|---|---|---|---|---|
| `20newsgroups` | real | 18.846k | 384 | 20 | 10,20,32,40,64,80 | l2,cos |
| `ag-news` | real | 127.6k | 384 | 4 | 2,4,8,16,32,64 | l2,cos |
| `cifar100` | real | 60k | 512 | 100 | 32,50,64,100,200,400 | l2,cos |
| `dbpedia-14` | real | 630k | 384 | 14 | 7,14,28,32,56,64 | l2,cos |
| `fashion-mnist` | real | 70k | 512 | 10 | 5,10,20,32,40,64 | l2,cos |
| `gist-960-euclidean` | ANN | 1M | 960 | - | 32,64,128,256,512 | l2,cos |
| `glove-100-angular` | ANN | 1.18351M | 100 | - | 32,64,128,256,512 | l2,cos |
| `sift-128-euclidean` | ANN | 1M | 128 | - | 32,64,128,256,512 | l2,cos |
| `n100m_k2048_d1024_iso_gaussian_balanced` | synthetic | 100M | 1024 | 2048 | 512,1024,2048,4096 | cos,l2 |
| `n100m_k256_d1024_mixed_curse` | synthetic | 100M | 1024 | 256 | 64,128,256,512 | cos,l2 |
| `n100m_k256_d512_iso_gaussian_zipf` | synthetic | 100M | 512 | 256 | 64,128,256,512 | cos,l2 |
| `n100m_k64_d256_swiss_roll_lifted` | synthetic | 100M | 256 | 64 | 16,32,64,128 | cos,l2 |
| `n1b_k1024_d256_hub_inducing` | synthetic | 1B | 256 | 1024 | 256,512,1024,2048 | cos,l2 |
| `n1b_k256_d256_iso_gaussian_balanced` | synthetic | 1B | 256 | 256 | 64,128,256,512 | cos,l2 |
Synthetic datasets are not `make_blobs`. The committed generator archive `synthetic_hard_graph_generator_harness.tar.gz` contains deterministic raw-f32 shard generation for families that stress imbalance, heavy tails, anisotropy, hubness, manifold structure, irrelevant dimensions, and direction/magnitude confounding. Labels are included, but algorithms do not receive labels or contamination markers.
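For intuition about the imbalance these families stress, a Zipf-like cluster-size profile can be sketched in a few lines. This is purely illustrative and is not the committed generator; the function name and parameters are invented for this example.

```python
def zipf_cluster_sizes(n, k, s=1.1):
    # heavy-tailed cluster sizes that sum exactly to n:
    # weight of cluster i is proportional to 1 / (i + 1)^s
    weights = [1.0 / (i + 1) ** s for i in range(k)]
    total = sum(weights)
    sizes = [max(1, round(n * w / total)) for w in weights]
    sizes[0] += n - sum(sizes)  # absorb rounding error in the largest cluster
    return sizes
```

Balanced k-means-style assumptions degrade on such profiles, which is exactly what the `*_zipf` family probes.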
This table aggregates completed (dataset, metric, K) cells. "Quality gap" is the relative difference from the best measured quality row for that cell, respecting the per-dataset score direction (lower objective wins for lower-is-better scores, higher score wins for higher-is-better scores).
| Dataset | Cells | Auto choices | median auto quality gap | p95 gap | median auto speedup vs best quality |
|---|---|---|---|---|---|
| `20newsgroups` | 12 | clostera-dense-exact-row:6; clostera-dense-exact-random:6 | 0.809% | 1.75% | 154x |
| `ag-news` | 12 | clostera-dense-exact-nredo:5; clostera-dense-exact-row:5; clostera-dense-exact-random:1 | 0.725% | 1.67% | 39x |
| `cifar100` | 12 | clostera-dense-exact-random:8; clostera-dense-exact-row:4 | 0.0368% | 1.65% | 1.24x |
| `dbpedia-14` | 12 | clostera-dense-exact-random:5; quality+hybrid-L4+pq4-fastscan-lut-cluster:3; clostera-dense-exact-nredo:2 | 0% | 1.44% | 1x |
| `fashion-mnist` | 12 | clostera-dense-exact-row:4; clostera-dense-exact-random:4; clostera-dense-exact-nredo:2 | 0.869% | 1.51% | 50.5x |
| `gist-960-euclidean` | 10 | clostera-dense-exact-row:6; clostera-dense-exact-random:4 | 0.00918% | 0.0731% | 8.8x |
| `glove-100-angular` | 10 | clostera-dense-exact-random:4; quality+hybrid-L16:4; clostera-dense-exact-row:2 | 0.0673% | 1.09% | 2.23x |
| `sift-128-euclidean` | 10 | clostera-dense-exact-random:4; quality+hybrid-L16:4; clostera-dense-exact-row:2 | 0.0169% | 0.119% | 6.21x |
| `n100m_k2048_d1024_iso_gaussian_balanced` | 8 | clostera-dense-exact-row:8 | 0% | 0.000106% | 1x |
| `n100m_k256_d1024_mixed_curse` | 8 | clostera-dense-exact-random:4; clostera-dense-exact-row:4 | 0.227% | 0.472% | 2.43x |
| `n100m_k256_d512_iso_gaussian_zipf` | 8 | clostera-dense-exact-random:4; clostera-dense-exact-row:4 | 0.0522% | 0.246% | 2.3x |
| `n100m_k64_d256_swiss_roll_lifted` | 8 | clostera-dense-exact-nredo:3; clostera-dense-exact-row:2; quality+adc+nredo:2 | 0% | 2.29% | 1x |
| `n1b_k1024_d256_hub_inducing` | 8 | clostera-dense-exact-row:8 | 0% | 0.0791% | 1x |
| `n1b_k256_d256_iso_gaussian_balanced` | 7 | auto-selected rows not completed in snapshot | - | - | - |
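The per-cell quality gap in the table above can be computed as in this sketch, which follows the stated score directions (the function name is illustrative):

```python
def quality_gap(score, best, higher_is_better):
    # relative gap of a row versus the best measured quality row in its cell;
    # 0.0 means the row ties the best quality
    if higher_is_better:
        return (best - score) / abs(best)
    return (score - best) / abs(best)
```

For instance, an auto V-measure of 0.58928 against a best of 0.59059 gives a gap of about 0.22%, comfortably inside the 2.5% admission band.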
The complete row-level table is in benchmarks/results/readme_quality_speed_winners_20260504.csv. These examples use score / seconds; score direction depends on score_metric in the CSV.
20newsgroups, cos, K=20

- Best quality: `quality+hybrid-L4`, 0.59059 / 3.28s
- Quality-speed winner: `clostera-dense-exact-random`, 0.58277 / 0.0298s
- Auto: `clostera-dense-exact-row`, 0.58928 / 0.0355s

ag-news, l2, K=4

- Best quality: `quality+hybrid-exact+flash`, 0.59778 / 5.06s
- Quality-speed winner: `clostera-dense-exact-bound`, 0.59709 / 0.0351s
- Auto: `clostera-dense-exact-nredo`, 0.59639 / 0.106s

cifar100, l2, K=100

- Best quality: `clostera-dense-exact-nredo`, 0.56788 / 0.322s
- Quality-speed winner: `clostera-dense-exact-random`, 0.56641 / 0.0782s
- Auto: `clostera-dense-exact-random`, 0.56641 / 0.0782s

gist-960-euclidean, l2, K=512

- Best quality: `faiss-kmeans`, 0.0011905 / 321s
- Quality-speed winner: `clostera-dense-exact-row`, 0.0011912 / 10.7s
- Auto: `clostera-dense-exact-row`, 0.0011912 / 10.7s

n100m_k2048_d1024_iso_gaussian_balanced, l2, K=2048

- Best quality: `clostera-dense-exact-row`, 1.0331 / 391s
- Quality-speed winner: `clostera-dense-exact-row`, 1.0331 / 391s
- Auto: `clostera-dense-exact-row`, 1.0331 / 391s

n1b_k1024_d256_hub_inducing, cos, K=1024

- Best quality: `clostera-dense-exact-row`, 6.1402e+08 / 1200s
- Quality-speed winner: `clostera-dense-exact-row`, 6.1402e+08 / 1200s
- Auto: `clostera-dense-exact-row`, 6.1402e+08 / 1200s
- Dense exact paths are often the right answer at small and medium scale. They avoid quantization error and use fused rowwise assignment plus thread-local reductions.
- Product-quantized paths matter when the dataset is large enough that dense passes are no longer the best trade-off, or when memory pressure dominates.
- Hybrid paths use compressed lookup for a shortlist and exact dense rescoring for final assignment.
- `algorithm="auto"` is conservative. If the selector does not have a measured row for a shape, it falls back to simple dense or compressed backends rather than silently inventing a new configuration.
- Path-like parquet and memmap workflows remain supported. Some dense exact algorithms require raw vectors in memory; auto falls back when that requirement is not met.
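As a toy model of the hybrid shortlist-then-rescore idea described above (pure Python, illustrative only; the real path builds its shortlist from compressed OPQ/PQ lookups):

```python
def hybrid_assign(x, centroids, approx_dists, shortlist=4):
    # shortlist the centroids with the smallest approximate (compressed) distance
    ids = sorted(range(len(centroids)), key=lambda i: approx_dists[i])[:shortlist]

    # exact dense rescoring on the raw vector picks the final winner
    def exact_sq_l2(i):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[i]))

    return min(ids, key=exact_sq_l2)
```

The point of the L2/L4/L8/L16 variants is the shortlist length: a longer shortlist costs more exact rescoring but tolerates a noisier compressed ranking.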
Install benchmark dependencies:
```
python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip maturin
python -m pip install -e ".[benchmarks]"
```

Run the real labeled + ANN sweep from a checkout where dataset paths and output paths have been configured for your machine. The committed schedule files are reproducibility templates; replace `/benchmark/clostera` with your benchmark root or regenerate them with the scheduler scripts.

```
bash benchmarks/schedules/grand-pareto-resweep-20260426-postfaiss.sh
bash benchmarks/schedules/gist-unlocked-exact-20260427.sh
```

Run the large synthetic sweep:

```
bash benchmarks/schedules/synthetic-large-scale-pareto-20260427.sh
```

Regenerate the README summary CSV files from raw result JSON:

```
python scripts/summarize_benchmark_evidence.py
```

The synthetic generator archive is committed as `synthetic_hard_graph_generator_harness.tar.gz`. It writes raw memmappable f32 vector shards and i32 label shards with deterministic seeds, so large runs can be resumed and audited shard by shard.
Build locally:
```
python -m pip install -U maturin
python -m maturin develop --release
```

Run tests:

```
python -m pytest -q
cargo test
```

On macOS, the default build links against Accelerate. On Linux, the default build uses the system BLAS path detected by pkg-config or falls back to `-lopenblas`. Explicit Cargo features remain available for OpenBLAS system/static builds.
