feat(seekdb): add SeekDB backend and HNSW benchmark support #770
Hi @XuanYang-cn, I have fixed the lint error. Could you approve the workflow and review the PR? Thank you very much!
Sorry, I made a typo; I have fixed it. Both checks now pass: make lint ✅ (black + ruff all pass).
XuanYang-cn left a comment:
Requesting changes for the label-filter correctness issue and the missing optional dependency.
Add a new vector database backend for SeekDB, connecting via mysql-connector-python over the MySQL wire protocol.

Key components:
- seekdb.py: VectorDB implementation with a heap-organized table, an HNSW vector index, and a version-aware optimize() that calls dbms_index_manager.refresh() on SeekDB >= 1.3.0
- config.py: DBConfig with host/port/user/password/database, and SeekDBHNSWConfig with m/ef_construction/ef_search parameters
- cli.py: Click command `SeekDBHNSW` for command-line benchmarks

Registration:
- Add SeekDB to the DB enum in backend/clients/__init__.py with lazy imports for init_cls, config_cls, and case_config_cls
- Register the SeekDBHNSW CLI command in cli/vectordbbench.py
- Add a seekdb optional dependency in pyproject.toml (pip install vectordb-bench[seekdb])

Filter support:
- NonFilter and NumGE (id >= N) filters are supported
- StrEqual (label filter) is intentionally excluded, since the table schema has only id and embedding columns

Thread safety:
- mysql.connector is not thread-safe (thread_safe = False), so ConcurrentInsertRunner uses max_workers=1
- rate_runner.py handles SeekDB specially: it copies the db object, resets the connection, and calls init() per worker thread

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
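The filter rules in the commit message can be illustrated with a minimal sketch. It assumes a hypothetical `FilterKind` enum and a hypothetical `build_where_clause` helper, and neither is the PR's actual code. No filter maps to an empty clause, and `NumGE` maps to a parameterized `id >= N` predicate. A label filter is rejected outright instead of being silently ignored, because the table has no label column to evaluate it against.

```python
from enum import Enum
from typing import Optional


class FilterKind(Enum):
    # Hypothetical stand-in for the benchmark's filter types; names mirror the PR text.
    NON_FILTER = "NonFilter"
    NUM_GE = "NumGE"
    STR_EQUAL = "StrEqual"


def build_where_clause(kind: FilterKind, min_id: Optional[int] = None) -> tuple[str, tuple]:
    """Return (sql_fragment, params) for a table that only has id and embedding columns."""
    if kind is FilterKind.NON_FILTER:
        return "", ()
    if kind is FilterKind.NUM_GE:
        if min_id is None:
            raise ValueError("NumGE filter requires min_id")
        # Parameterized so the driver handles quoting of the bound value.
        return "WHERE id >= %s", (min_id,)
    # No label column exists, so a label filter cannot be evaluated correctly.
    raise NotImplementedError(f"{kind.value} is not supported by this schema")
```

Failing loudly on `StrEqual` keeps a benchmark from reporting unfiltered results under a filtered case name.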
Hi @XuanYang-cn, I have fixed the two issues. Please review it again. Thanks very much!
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: liuhao6741, XuanYang-cn.
Summary
Introduce SeekDB as a first-class benchmark target: a MySQL-protocol vector database (OceanBase-style SQL and session variables). The integration supports standard performance workflows and StreamingPerformanceCase-style fixed-rate inserts by combining an upfront HNSW vector index with per-thread connection handling in the rate-based insert runner.
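The per-thread connection handling can be sketched with a stand-in client. The class `FakeSeekDB` and the helpers `make_worker_client` and `run_workers` are hypothetical and exist only for illustration. The pattern mirrors the commit message: shallow-copy the client, drop the inherited connection, and call `init()` so each worker opens its own connection. In the real client, `init()` would open a mysql.connector connection. Here it only creates a placeholder object.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class FakeSeekDB:
    """Hypothetical stand-in for a client whose connection must not be shared across threads."""

    def __init__(self) -> None:
        self.conn = None

    def init(self) -> None:
        # A real client would open a new mysql.connector connection here.
        self.conn = object()


def make_worker_client(db: FakeSeekDB) -> FakeSeekDB:
    """Shallow-copy the client, drop the inherited connection, and open a fresh one."""
    local_db = copy.copy(db)
    local_db.conn = None
    local_db.init()
    return local_db


def run_workers(db: FakeSeekDB, n_workers: int) -> list[FakeSeekDB]:
    # Each task gets its own client copy, so no connection is shared between threads.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(lambda _: make_worker_client(db), range(n_workers)))
```

A shallow copy is enough here because only the connection attribute is replaced. Any other state stays shared and read-only.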
On each benchmark session, SeekDB.init() applies OceanBase-style tenant system parameters so that the engine does not cap workload memory or CPU (ALTER SYSTEM SET memory_limit = "0M" and cpu_count = 0). This matches operator guidance for unconstrained-resource tests. Because these are ALTER SYSTEM parameters rather than session variables, the change is not confined to the benchmark connection. memory_limit must be given in the engine's string size form (e.g. "0M"), not as a bare integer.
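A minimal sketch of the init-time tuning follows. It assumes a DB-API style cursor and a hypothetical helper named `apply_unbounded_resources`. The sketch uses single-quoted string literals, while the text above writes "0M". Both forms are string literals in MySQL mode unless the ANSI_QUOTES SQL mode is enabled. The key detail is that memory_limit receives a quoted size string while cpu_count receives a plain integer.

```python
from typing import Protocol


class Cursor(Protocol):
    def execute(self, operation: str) -> None: ...


# Statements described in the PR: memory_limit takes a size string, cpu_count an integer.
UNBOUNDED_RESOURCE_STATEMENTS = (
    "ALTER SYSTEM SET memory_limit = '0M'",
    "ALTER SYSTEM SET cpu_count = 0",
)


def apply_unbounded_resources(cursor: Cursor) -> None:
    """Issue the tuning statements one by one; the user needs ALTER SYSTEM privileges."""
    for stmt in UNBOUNDED_RESOURCE_STATEMENTS:
        cursor.execute(stmt)
```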
After bulk load, optimize() runs SELECT VERSION(), parses the embedded seekdb-vX.Y.Z… token (e.g. "5.7.25-OceanBase seekdb-v1.3.0.0"), and when the parsed version is >= 1.3.0 executes CALL dbms_index_manager.refresh() to align index metadata; older or unrecognized version strings skip the call.
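The version gate can be sketched as a regex over the VERSION() string followed by a tuple comparison. The helpers `parse_seekdb_version` and `should_refresh_index` are hypothetical names. The sketch reads only the first three numeric components after `seekdb-v`, so "seekdb-v1.3.0.0" parses as (1, 3, 0). A string without that token yields no version, which means the refresh is skipped.

```python
import re
from typing import Optional

_SEEKDB_VERSION = re.compile(r"seekdb-v(\d+)\.(\d+)\.(\d+)")
MIN_REFRESH_VERSION = (1, 3, 0)


def parse_seekdb_version(version_string: str) -> Optional[tuple[int, int, int]]:
    """Extract (major, minor, patch) from a VERSION() string, or None if no seekdb token is found."""
    match = _SEEKDB_VERSION.search(version_string)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def should_refresh_index(version_string: str) -> bool:
    """True when CALL dbms_index_manager.refresh() should run after bulk load."""
    parsed = parse_seekdb_version(version_string)
    # Unrecognized version strings skip the refresh.
    return parsed is not None and parsed >= MIN_REFRESH_VERSION
```

Comparing integer tuples rather than strings keeps versions like 1.10.0 correctly ordered above 1.3.0.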
New files (vectordb_bench/backend/clients/seekdb/)
seekdb.py — SeekDB VectorDB implementation
config.py — SeekDBConfig (DBConfig), SeekDBHNSWConfig (DBCaseConfig)
cli.py — Click command SeekDBHNSW (registered as seekdbhnsw)
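The split between connection settings and index parameters in config.py can be pictured with plain dataclasses. These are hypothetical mirrors and not the PR's SeekDBConfig or SeekDBHNSWConfig. The defaults are copied from the example invocation further down and are not necessarily the PR's actual defaults. The connection fields use keyword names that mysql.connector.connect() accepts.

```python
from dataclasses import asdict, dataclass


@dataclass
class SeekDBConnectionSketch:
    """Hypothetical mirror of SeekDBConfig's fields; defaults taken from the example command."""

    host: str
    port: int = 2881
    user: str = "root"
    password: str = ""
    database: str = "vectordbbench"

    def connect_kwargs(self) -> dict:
        # Keyword names accepted by mysql.connector.connect().
        return asdict(self)


@dataclass
class HNSWParamsSketch:
    """Hypothetical mirror of SeekDBHNSWConfig: build-time m/ef_construction, query-time ef_search."""

    m: int = 16
    ef_construction: int = 200
    ef_search: int = 64

    def __post_init__(self) -> None:
        if self.m <= 0 or self.ef_construction <= 0 or self.ef_search <= 0:
            raise ValueError("HNSW parameters must be positive")
```

With this split, m and ef_construction affect only index build, while ef_search can vary between search runs on the same index.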
Plumbing
vectordb_bench/backend/clients/__init__.py
vectordb_bench/cli/vectordbbench.py
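The commit message says the DB enum registers init_cls, config_cls and case_config_cls through lazy imports. The sketch below shows that pattern with a plain dict instead of the project's actual enum. The registry `BACKENDS` and the helpers `load_symbol` and `resolve` are hypothetical, and the dotted paths are assumed from the file names listed above. Because the paths are stored as strings, importing the registry never loads mysql.connector. The driver is imported only when a SeekDB class is actually resolved.

```python
import importlib
from typing import Any

# Hypothetical registry: "module:attr" paths stay as strings until first use,
# so optional drivers are only imported when the backend is actually selected.
BACKENDS: dict[str, dict[str, str]] = {
    "SeekDB": {
        "init_cls": "vectordb_bench.backend.clients.seekdb.seekdb:SeekDB",
        "config_cls": "vectordb_bench.backend.clients.seekdb.config:SeekDBConfig",
        "case_config_cls": "vectordb_bench.backend.clients.seekdb.config:SeekDBHNSWConfig",
    },
}


def load_symbol(path: str) -> Any:
    """Import 'package.module:Name' on demand and return Name."""
    module_name, sep, attr = path.partition(":")
    if not sep or not attr:
        raise ValueError(f"expected 'module:attr', got {path!r}")
    return getattr(importlib.import_module(module_name), attr)


def resolve(backend: str, role: str) -> Any:
    """Look up a backend role (e.g. 'init_cls') and import it lazily."""
    return load_symbol(BACKENDS[backend][role])
```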
Rate runner (StreamingPerformanceCase / fixed-rate inserts)
Documentation for operators
Example invocation (Python 3.11+ recommended for this repo):
python -m vectordb_bench.cli.vectordbbench seekdbhnsw \
  --case-type StreamingPerformanceCase \
  --host <host> --port 2881 --user root --password '' \
  --database vectordbbench \
  --m 16 --ef-construction 200 --ef-search 64
Ensure that the target database exists and that mysql-connector-python is installed (for example via pip install vectordb-bench[seekdb]). Because init() applies the ALTER SYSTEM tuning, the SeekDB user must be allowed to execute ALTER SYSTEM; otherwise connection setup may fail at init().
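Both prerequisites above can be checked before a run starts. The helpers `driver_available` and `database_exists` in this sketch are hypothetical. The driver check only attempts the import. The database check assumes a DB-API cursor, such as one from mysql.connector, and queries information_schema.SCHEMATA with a bound parameter. The sketch uses an exact match instead of SHOW DATABASES LIKE, because `_` in a database name acts as a wildcard in LIKE patterns.

```python
import importlib


def driver_available(module_name: str = "mysql.connector") -> bool:
    """Return True if the MySQL driver can be imported (installed via the seekdb extra)."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


# Exact, parameterized match on the schema name.
DATABASE_EXISTS_SQL = (
    "SELECT SCHEMA_NAME FROM information_schema.SCHEMATA WHERE SCHEMA_NAME = %s"
)


def database_exists(cursor, name: str) -> bool:
    """Check for the target database using a DB-API cursor."""
    cursor.execute(DATABASE_EXISTS_SQL, (name,))
    return cursor.fetchone() is not None
```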
Notes / non-goals