wpool — Experimental Worker Pool for Go

⚠️ Experimental. This is a research-grade implementation exploring lock-free queue design and heuristic memory reclamation in Go. It is not production-ready. See Known Limitations before use.

wpool is a bounded-concurrency worker pool for Go built around a lock-free segmented FIFO queue and batch-based scheduling. The primary goal is to explore how far contention and allocation overhead can be pushed down on the submission path.

Empty-job baseline (scheduler + queue overhead only, 1 producer):

~93 ns/op   ·   ~10.7 M jobs/sec   ·   0 allocs/op   (w=1, p=1)

Measured on AMD Ryzen 7 8845HS · Go 1.22 · Linux. See Benchmarks for full worker/workload breakdown.

Why wpool?

wpool exists primarily as a research project.

It explores a few ideas:

Lock-free queue built from linked segments with per-slot CAS reservation — producers never block each other
Batch draining means workers wake once to process many jobs, not once per job — cuts atomic traffic and scheduler overhead
Zero allocations on the hot path — segments are pre-allocated and recycled using generation counters (ABA-safe)
Heuristic memory reclamation via a limbo queue — safe for typical configurations, with known edge cases under investigation (see Known Limitations)
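
To make the "ABA-safe" claim concrete, here is a minimal sketch of how a generation counter can protect a recycled index. It is not wpool's actual layout: the names `pack`, `unpack`, `head`, and `swing`, and the 32/32-bit split, are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pack combines a 32-bit generation and a 32-bit segment index into one
// word so both can be compared and swapped in a single atomic operation.
func pack(gen, idx uint32) uint64 { return uint64(gen)<<32 | uint64(idx) }

// unpack splits a packed word back into generation and index.
func unpack(w uint64) (gen, idx uint32) { return uint32(w >> 32), uint32(w) }

// head is a hypothetical "current segment" reference stored as a tagged index.
type head struct{ word atomic.Uint64 }

// swing tries to move the head from the observed word to segment next.
// It bumps the generation, so a goroutine holding an older word fails its CAS
// even if the same index has been recycled and installed again.
func (h *head) swing(observed uint64, next uint32) bool {
	gen, _ := unpack(observed)
	return h.word.CompareAndSwap(observed, pack(gen+1, next))
}

func main() {
	var h head
	h.word.Store(pack(0, 7))

	stale := h.word.Load() // a slow goroutine reads (gen=0, idx=7)

	// Meanwhile segment 7 is replaced, recycled, and installed again.
	h.swing(h.word.Load(), 3)
	h.swing(h.word.Load(), 7)

	gen, idx := unpack(h.word.Load())
	fmt.Println(gen, idx)          // 2 7
	fmt.Println(h.swing(stale, 9)) // false: same index, newer generation
}
```

Without the generation bits the final CAS would succeed, because the index alone looks unchanged.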

The design deliberately trades simplicity and operational confidence for visibility into lower-level behavior. In practice, that means wpool is useful for exploring the costs of atomic coordination, cache movement, reclamation, and topology sensitivity.

It is not presented here as a universally better worker pool. One of the main lessons of this project is that the value of a lock-free design depends heavily on the workload, the machine, and how much complexity a project can justify.

Installation

go get github.com/azargarov/wpool

Quick Start

package main

import (
    "context"
    "fmt"

    wp "github.com/azargarov/wpool"
)

func main() {
    pool := wp.NewPool(
        wp.NoopMetrics{},
        wp.WithWorkers(4),
        wp.WithSegmentSize(64),
        wp.WithSegmentCount(64),
    )
    defer pool.Stop()

    _ = pool.Submit(wp.Job[int]{
        Payload: 42,
        Ctx:     context.Background(),
        Fn: func(n int) error {
            fmt.Println("processing", n)
            return nil
        },
    }, 0)
}

Architecture

Producers (N goroutines)
        │
        ▼
┌───────────────────────────────────┐
│         Segmented FIFO Queue      │
│  ┌────────┐  ┌────────┐           │
│  │ Seg 0  │→ │ Seg 1  │→  ...     │
│  │ bitmap │  │ bitmap │           │
│  │ CAS ix │  │ CAS ix │           │
│  └────────┘  └────────┘           │
│         ↑ recycled via pool       │
└──────────────────┬────────────────┘
                   │ batch drain
        ┌──────────▼──────────┐
        │   Batch Scheduler   │
        │  (timer + threshold)│
        └──────┬──────────────┘
               │
       ┌───────┴────────┐
       ▼                ▼
   Worker 0  ...   Worker N-1

Each queue segment holds a fixed-size job buffer with a readiness bitmap. Producers reserve slots via CAS; consumers drain contiguous ready ranges as a single batch — keeping cache lines warm and synchronization amortized.
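
The reserve, publish, and drain steps described above can be sketched roughly as follows. This is a simplified single-segment model with one consumer, not the library's code: the type `segment`, its fields, and the 64-slot size (so the bitmap fits in one `uint64`) are assumptions.

```go
package main

import (
	"fmt"
	"math/bits"
	"sync"
	"sync/atomic"
)

const segSize = 64 // one bitmap bit per slot

type segment struct {
	next  atomic.Uint32 // next free slot; producers advance it with CAS
	ready atomic.Uint64 // bit i set means slot i has been published
	head  uint32        // next slot to consume (single consumer in this sketch)
	buf   [segSize]int
}

// push reserves a slot with CAS, writes the value, then sets the ready bit.
// It returns false when the segment is full.
func (s *segment) push(v int) bool {
	for {
		i := s.next.Load()
		if i >= segSize {
			return false
		}
		if !s.next.CompareAndSwap(i, i+1) {
			continue // another producer took slot i; retry
		}
		s.buf[i] = v
		for { // publish: set bit i without disturbing other bits
			old := s.ready.Load()
			if s.ready.CompareAndSwap(old, old|uint64(1)<<i) {
				return true
			}
		}
	}
}

// drain appends the contiguous run of ready slots starting at head to out.
// It stops at the first slot that is reserved but not yet published.
func (s *segment) drain(out []int) []int {
	r := s.ready.Load() >> s.head
	n := uint32(bits.TrailingZeros64(^r)) // length of the leading run of 1-bits
	for i := uint32(0); i < n; i++ {
		out = append(out, s.buf[s.head+i])
	}
	s.head += n
	return out
}

func main() {
	var s segment
	var wg sync.WaitGroup
	for p := 0; p < 4; p++ {
		wg.Add(1)
		go func(p int) {
			defer wg.Done()
			for i := 0; i < 16; i++ {
				s.push(p*100 + i)
			}
		}(p)
	}
	wg.Wait()

	batch := s.drain(nil)
	fmt.Println(len(batch), s.push(1)) // 64 false
}
```

The consumer pays for one atomic load per batch rather than one per job, which is the amortization the paragraph above refers to.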

Benchmarks

All results: AMD Ryzen 7 8845HS · Go 1.22 · Linux · 1 producer goroutine (p=1).

Empty job (scheduler + queue overhead only)

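| Workers | ns/op | Throughput |
|---------|-------|------------|
| w=1 | 93 | ~10.7 M jobs/s |
| w=2 | 115 | ~8.7 M jobs/s |
| w=4 | 146 | ~6.9 M jobs/s |
| w=8 | 261 | ~3.8 M jobs/s |
| w=16 | 855 | ~1.2 M jobs/s |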

SHA-256 job (~4 µs CPU work)

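| Workers | ns/op | Throughput |
|---------|-------|------------|
| w=1 | 4379 | 228 k jobs/s |
| w=2 | 2267 | 441 k jobs/s |
| w=4 | 1200 | 833 k jobs/s |
| w=8 | 683 | 1463 k jobs/s |
| w=16 | 557 | 1794 k jobs/s |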

CPU-bound job (~40 µs)

| Workers | ns/op | Throughput |
|---------|-------|------------|
| w=1 | 40939 | 24.4 k jobs/s |
| w=4 | 11064 | 90.4 k jobs/s |
| w=8 | 7395 | 135 k jobs/s |
| w=16 | 5927 | 169 k jobs/s |

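On the submission path the pool adds roughly 100 ns per job with a single worker (see the empty-job table above). For the CPU-bound jobs measured here, throughput rises with worker count, though less than linearly: 16 workers deliver about 7.9x the single-worker throughput on the SHA-256 job and about 6.9x on the 40 µs job. IO-bound jobs are expected to scale with worker count as well, but they are not benchmarked here.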

Note: throughput decreases as workers increase for empty jobs. This is expected — with no real work, more workers means more scheduler/synchronization overhead without any benefit. The zero-allocation property holds across all scenarios.
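
For readers who want to re-derive the throughput column or compare scaling between rows, the arithmetic is just a reciprocal and a ratio. The helper names `jobsPerSec` and `speedup` below are illustrative, not part of wpool.

```go
package main

import "fmt"

// jobsPerSec converts a per-job cost in nanoseconds into jobs per second.
func jobsPerSec(nsPerOp float64) float64 { return 1e9 / nsPerOp }

// speedup reports how many times faster a run is than its baseline.
func speedup(baseNs, ns float64) float64 { return baseNs / ns }

func main() {
	// Empty job, w=1: 93 ns/op from the first table.
	fmt.Printf("%.2f M jobs/s\n", jobsPerSec(93)/1e6)

	// SHA-256 job, w=16 against w=1.
	fmt.Printf("%.2fx\n", speedup(4379, 557))

	// 40 µs CPU-bound job, w=16 against w=1.
	fmt.Printf("%.2fx\n", speedup(40939, 5927))
}
```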

Known Limitations

Memory reclamation in lock-free data structures is a hard problem. wpool currently uses a heuristic "limbo" approach: segments that may still be referenced are deferred and reclaimed lazily based on observed state, rather than through a precise reclamation scheme such as reference counting, hazard pointers, or epoch-based reclamation.

This has a known failure mode:

With very small segments (e.g. SegmentSize=1) and a small segment pool (SegmentCount), the limbo queue can exhaust available segments under load, causing submission to stall or fail.

Recommended configuration: use SegmentSize ≥ 64 and size SegmentCount generously relative to your worker count. The defaults are chosen to avoid this in typical use.
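
One way to guard against the failure mode above is to validate the configuration before building a pool. The sketch below is an assumption, not a wpool API: `checkConfig` is a hypothetical helper, and the `segmentsPerWorker` ratio is an arbitrary placeholder for "generously", since the README does not give a number. Only the 64-slot minimum comes from the recommendation above.

```go
package main

import "fmt"

// minSegmentSize follows the recommendation above (SegmentSize >= 64).
const minSegmentSize = 64

// segmentsPerWorker is an arbitrary placeholder ratio; tune it for your load.
const segmentsPerWorker = 4

// checkConfig rejects configurations that are likely to exhaust the
// segment pool under the limbo heuristic.
func checkConfig(workers, segmentSize, segmentCount int) error {
	if segmentSize < minSegmentSize {
		return fmt.Errorf("segment size %d is below the recommended minimum %d",
			segmentSize, minSegmentSize)
	}
	if segmentCount < workers*segmentsPerWorker {
		return fmt.Errorf("segment count %d is small for %d workers (want at least %d)",
			segmentCount, workers, workers*segmentsPerWorker)
	}
	return nil
}

func main() {
	fmt.Println(checkConfig(4, 64, 64)) // <nil>
	fmt.Println(checkConfig(4, 1, 2))   // reports the segment size problem
}
```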

Status: memory reclamation is under active investigation. The goal is to find a provably safe approach that fits Go's memory model without introducing significant overhead. Contributions and ideas welcome.

Features

| Feature | Details |
|---------|---------|
| Lock-free queue | Segmented MPMC FIFO, CAS-based reservation |
| Batch scheduling | Amortized wakeups, better cache locality |
| Zero allocations | Segment pool with generation counters |
| Bounded concurrency | Fixed goroutine count, no unbounded spawning |
| Context-aware jobs | `context.Context` checked before execution |
| Panic-safe workers | Panics are isolated per worker |
| Graceful shutdown | Deadline-aware draining via `pool.Shutdown(ctx)` |
| Pluggable metrics | Interface injection keeps the hot path clean |
| CPU affinity | Optional Linux worker pinning |

Job Model

type Job[T any] struct {
    Payload     T
    Fn          func(T) error
    Ctx         context.Context
    CleanupFunc func()         // always runs after execution
}

Jobs are generic — no interface boxing on the submission path.

Graceful Shutdown

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

if err := pool.Shutdown(ctx); err != nil {
    log.Println("shutdown timed out:", err)
}

Shutdown blocks new submissions, drains queued work, and waits for in-flight jobs to complete — or until the deadline expires.

Metrics

The metrics interface is injected at pool construction, keeping instrumentation off the critical path:

type MetricsPolicy interface {
    IncQueued()
    BatchDecQueued(n int)
}

Use wp.NoopMetrics{} for zero overhead, or plug in Prometheus counters, etc.
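
As an example of plugging in something other than the no-op, a queue-depth gauge needs only one atomic counter. The type `AtomicMetrics` and its `Queued` accessor are illustrative names. The interface is copied from above, and whether `wp.NewPool` expects a value or a pointer is not confirmed here.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// MetricsPolicy is the interface shown above.
type MetricsPolicy interface {
	IncQueued()
	BatchDecQueued(n int)
}

// AtomicMetrics tracks the current queue depth with a single atomic counter.
type AtomicMetrics struct{ queued atomic.Int64 }

// IncQueued records one submitted job.
func (m *AtomicMetrics) IncQueued() { m.queued.Add(1) }

// BatchDecQueued records that a batch of n jobs left the queue.
func (m *AtomicMetrics) BatchDecQueued(n int) { m.queued.Add(-int64(n)) }

// Queued returns the current depth, e.g. for export to a monitoring system.
func (m *AtomicMetrics) Queued() int64 { return m.queued.Load() }

// Compile-time check that the interface is satisfied.
var _ MetricsPolicy = (*AtomicMetrics)(nil)

func main() {
	m := &AtomicMetrics{}
	for i := 0; i < 5; i++ {
		m.IncQueued()
	}
	m.BatchDecQueued(3)
	fmt.Println(m.Queued()) // 2
}
```

The batch decrement mirrors batch draining: the counter is touched once per drained batch instead of once per job.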

Configuration

pool := wp.NewPool(
    metrics,
    wp.WithWorkers(runtime.GOMAXPROCS(0)),
    wp.WithSegmentSize(4096),   // jobs per segment
    wp.WithSegmentCount(64),    // pre-allocated segments
)

Defaults are applied automatically via FillDefaults() for any unset options.
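
The `With...` functions follow Go's functional-options pattern. A minimal sketch of how such options and a `FillDefaults` step can fit together is shown below. The `Options` struct, the `build` helper, and the fallback values are assumptions for illustration and are not wpool's real defaults.

```go
package main

import (
	"fmt"
	"runtime"
)

// Options holds the tunables used in the examples above.
type Options struct {
	Workers      int
	SegmentSize  int
	SegmentCount int
}

// Option mutates Options; each With... function returns one.
type Option func(*Options)

func WithWorkers(n int) Option      { return func(o *Options) { o.Workers = n } }
func WithSegmentSize(n int) Option  { return func(o *Options) { o.SegmentSize = n } }
func WithSegmentCount(n int) Option { return func(o *Options) { o.SegmentCount = n } }

// FillDefaults replaces unset (zero or negative) fields with fallbacks.
// The fallback numbers here are placeholders, not the library's defaults.
func (o *Options) FillDefaults() {
	if o.Workers <= 0 {
		o.Workers = runtime.GOMAXPROCS(0)
	}
	if o.SegmentSize <= 0 {
		o.SegmentSize = 64
	}
	if o.SegmentCount <= 0 {
		o.SegmentCount = 64
	}
}

// build applies the options in order, then fills in anything left unset.
func build(opts ...Option) Options {
	var o Options
	for _, opt := range opts {
		opt(&o)
	}
	o.FillDefaults()
	return o
}

func main() {
	o := build(WithSegmentSize(4096))
	fmt.Println(o.SegmentSize, o.SegmentCount, o.Workers > 0) // 4096 64 true
}
```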

Roadmap

Safe memory reclamation — replace limbo heuristic with hazard pointers or epoch-based reclamation (active)
Bucket-based priority scheduler
Queue aging / rotation
Adaptive segment provisioning
NUMA-aware worker placement

When to use wpool

Suitable for:

Experimentation, benchmarking, and learning about lock-free queue design
Studying contention, cache behavior, batching, and topology effects
Internal tools where you control configuration and can tolerate edge cases
High-frequency, short-lived jobs where allocation overhead matters

Not suitable for:

Production systems requiring proven memory safety guarantees
Extreme segment configurations (SegmentSize=1) — see Known Limitations
Cases where a simpler channel-based worker pool is good enough
Workloads that do not justify the added implementation complexity
Workloads that need dynamic worker scaling (the worker count is fixed)

In other words: wpool is most interesting when the workload and hardware make these trade-offs worth studying.

License

MIT
