
[BUG CLIENT]: Process exits with "Too many open files" when many Mistral instances are created and finalized on Python 3.14 #509

@proxai-dev

Python -VV

Python 3.14.2 (main, Dec  5 2025, 16:49:16) [Clang 17.0.0 (clang-1700.6.3.2)]

Pip Freeze

  aiohappyeyeballs==2.6.1
  aiohttp==3.13.5
  aiosignal==1.4.0
  annotated-doc==0.0.4
  annotated-types==0.7.0
  anthropic==0.75.0
  anyio==4.13.0
  attrs==26.1.0
  certifi==2026.4.22
  cffi==2.0.0
  charset-normalizer==3.4.7
  click==8.3.3
  cohere==6.1.0
  cryptography==47.0.0
  databricks-sdk==0.73.0
  distro==1.9.0
  docstring_parser==0.18.0
  eval_type_backport==0.3.1
  fastavro==1.12.2
  filelock==3.29.0
  frozenlist==1.8.0
  fsspec==2026.4.0
  google-auth==2.49.2
  google-genai==1.74.0
  googleapis-common-protos==1.74.0
  grpcio==1.80.0
  h11==0.16.0
  hf-xet==1.4.3
  httpcore==1.0.9
  httpx==0.28.1
  huggingface_hub==1.13.0
  idna==3.13
  importlib_metadata==8.7.1
  iniconfig==2.3.0
  invoke==2.2.1
  jiter==0.14.0
  jsonpatch==1.33
  jsonpointer==3.1.1
  langchain-core==1.3.2
  langchain-openai==1.2.1
  langchain-protocol==0.0.14
  langsmith==0.7.38
  markdown-it-py==4.0.0
  mdurl==0.1.2
  mistralai==1.12.4
  multidict==6.7.1
  openai==2.33.0
  opentelemetry-api==1.41.1
  opentelemetry-exporter-otlp-proto-common==1.41.1
  opentelemetry-exporter-otlp-proto-http==1.41.1
  opentelemetry-proto==1.41.1
  opentelemetry-sdk==1.41.1
  opentelemetry-semantic-conventions==0.62b1
  orjson==3.11.8
  packaging==25.0
  platformdirs==4.9.6
  pluggy==1.6.0
  propcache==0.4.1
  protobuf==6.33.6
  proxai==0.3.1
  pyasn1==0.6.3
  pyasn1_modules==0.4.2
  pycparser==3.0
  pydantic==2.12.5
  pydantic_core==2.41.5
  Pygments==2.20.0
  pypdf==6.10.2
  pytest==8.4.2
  python-dateutil==2.9.0.post0
  PyYAML==6.0.3
  regex==2026.4.4
  requests==2.33.1
  requests-toolbelt==1.0.0
  rich==15.0.0
  shellingham==1.5.4
  six==1.17.0
  sniffio==1.3.1
  tenacity==9.1.4
  tiktoken==0.12.0
  tokenizers==0.23.1
  tqdm==4.67.3
  typer==0.25.0
  types-requests==2.31.0.6
  types-urllib3==1.26.25.14
  typing-inspection==0.4.2
  typing_extensions==4.15.0
  urllib3==1.26.20
  uuid_utils==0.14.1
  websockets==16.0
  xai-sdk==1.12.0
  xxhash==3.7.0
  yarl==1.23.0
  zstandard==0.25.0

Reproduction Steps

  1. Save this as leak_repro.py:

      from mistralai import Mistral

      # Build many SDK instances. Each registers a weakref.finalize that
      # runs close_clients at process exit.
      clients = [Mistral(api_key=f"sk-fake-{i}") for i in range(2000)]
      del clients

  2. Run it with Python 3.14:

      python3.14 leak_repro.py

  3. Watch the process exit. You'll see hundreds of repeated:

      AttributeError: '_UnixSelectorEventLoop' object has no attribute '_ssock'

     followed by a flood of:

      OSError: [Errno 24] Too many open files
        File ".../mistralai/httpclient.py", line 122, in close_clients
          asyncio.run(async_client.aclose())
        File ".../socket.py", line 665, in socketpair
          a, b = _socket.socketpair(family, type, proto)

(Lower the 2000 if your system has a smaller per-process fd limit. macOS default
kern.maxfilesperproc=12288 typically reproduces around 1500-2000 instances.)

No real API key is needed — just instantiating Mistral(api_key=...) is enough to register the leaky
finalizer.
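The registration pattern the script relies on can be reproduced with the standard library alone. The sketch below is a hypothetical stand-in, not mistralai code: FakeSDK plays the role of Mistral and close_clients only records a name. It shows one weakref.finalize per instance; in CPython the callback runs as soon as an instance is freed, while finalizers for instances still alive at shutdown run from an exit hook, because atexit=True is the default.

```python
import weakref

closed = []  # records which stand-in clients have been torn down


def close_clients(name):
    # Hypothetical stand-in for mistralai's close_clients; it only records a name.
    closed.append(name)


class FakeSDK:
    """Hypothetical stand-in for Mistral: registers one finalizer per instance."""

    def __init__(self, api_key):
        # The callback and its arguments must not reference self, otherwise
        # the instance could never become unreachable.
        weakref.finalize(self, close_clients, api_key)


clients = [FakeSDK(f"sk-fake-{i}") for i in range(5)]
print(len(closed))  # 0: every instance is still reachable
del clients
print(len(closed))  # 5 in CPython: each finalizer ran as its instance was freed
```

With 2000 instances that stay alive until shutdown, all 2000 callbacks are queued for the exit hook instead, which is the situation the issue describes.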

Expected Behavior

Process exits cleanly, with no OSError, no leaked socketpair descriptors, and no AttributeError from
_UnixSelectorEventLoop.__del__. The process should release its file descriptors normally during
interpreter shutdown.

Additional Context

This was hit in the wild by an LLM consensus engine that creates a fresh Mistral SDK instance per
worker job. Across one ~150-batch run (~10 LLM calls/batch), thousands of Mistral instances accumulate;
their weakref finalizers all fire at process exit and the cascade kicks in. The actual experiment data
is fine — the errors only manifest during shutdown — but they spam the terminal and obscure other
diagnostics, and on smaller-limit systems they can prevent the process from exiting cleanly at all.

The bug is in src/mistralai/client/httpclient.py::close_clients, around line 122:

    if async_client is not None and not async_client_supplied:
        try:
            loop = asyncio.get_running_loop()
            asyncio.run_coroutine_threadsafe(async_client.aclose(), loop)
        except RuntimeError:
            try:
                asyncio.run(async_client.aclose())  # ← the leaky path
            except RuntimeError:
                pass

At process exit there's no running loop, so it falls into asyncio.run(...). Each call:

  1. Creates a new event loop, which calls socket.socketpair() (2 fds + a selector fd).
  2. Runs aclose(), then closes the loop.
  3. Later, GC calls BaseEventLoop.__del__ on the already-closed loop. On Python 3.14, __del__ calls
    _close_self_pipe(), which assumes the self-pipe set up in __init__ still exists, and crashes on the
    missing _ssock attribute.
  4. That __del__ crash leaks the socketpair fds.
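Steps 1 and 2 can be observed directly: every asyncio.run() call builds a fresh event loop and closes it before returning. A minimal stdlib sketch (current_loop is a hypothetical helper name, not part of mistralai):

```python
import asyncio


async def current_loop():
    # Return whichever event loop is driving this coroutine.
    return asyncio.get_running_loop()


# Each asyncio.run(...) call creates a brand-new loop and closes it on return,
# which is the cost every no-loop call to close_clients pays.
first = asyncio.run(current_loop())
second = asyncio.run(current_loop())
print(first is second)                        # False
print(first.is_closed(), second.is_closed())  # True True
```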

With N pending finalizers at exit, the process leaks ~3·N fds in tight succession. After the
per-process fd ceiling is hit, every subsequent finalizer's socketpair() fails with EMFILE.
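To check the ~3·N estimate on a given machine, one can sample the process's descriptor count before and after the suspect code. A minimal sketch, assuming a Linux or macOS host where /proc/self/fd or /dev/fd lists the process's open descriptors; open_fd_count is a hypothetical helper, and the usage lines only confirm that a bare socket.socketpair() costs two descriptors.

```python
import os
import socket


def open_fd_count():
    """Count this process's open file descriptors (Linux or macOS only)."""
    # Linux exposes per-process descriptors under /proc/self/fd; macOS under /dev/fd.
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    # listdir holds one extra descriptor open while it runs; that cost is the
    # same on every call, so the difference between two samples stays exact.
    return len(os.listdir(fd_dir))


before = open_fd_count()
a, b = socket.socketpair()  # the same call asyncio's selector loop uses for its self-pipe
print(open_fd_count() - before)  # 2
a.close()
b.close()
print(open_fd_count() - before)  # 0
```

Wrapping the body of leak_repro.py between two open_fd_count() samples would show whether the growth per instance is close to 3 on a given platform.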

Confirmed unchanged on main (2.4.4) — src/mistralai/client/httpclient.py::close_clients is
byte-identical to the version in 1.12.4. No release fixes this so far.

Related: #490 / #504 fixed similar Python 3.14 asyncio compatibility problems in the test suite. Same class of issue
(asyncio API contracts changed in 3.14), different file.

Suggested Solutions

In rough order of cleanliness:

  1. Add a real close() method on Mistral. It lets users tear down deterministically (with Mistral(...) as
    client: or an explicit client.close()) instead of relying on the weakref finalizer to spin up a fresh
    event loop. This is the most user-visible fix, and it removes the leak path entirely for users who close their clients.
  2. Reuse a single shared event loop across finalizers. When close_clients is called with no running
    loop, lazily create one loop the first time, run subsequent aclose() calls on it, and close it once at
    interpreter shutdown via atexit. Keeps the existing finalizer-driven path working, but removes the
    per-finalizer event-loop churn that's leaking fds.
  3. Drop the asyncio.run(...) fallback entirely when there's no running loop. The underlying transport of the
    httpx AsyncClient can be torn down synchronously (or skipped at process exit, since the OS reaps the
    connections anyway). This is the smallest diff but loses some cleanup work in the no-loop case.
  4. At minimum, suppress the symptom — wrap the inner asyncio.run(...) in a broader except
    (RuntimeError, OSError, AttributeError). The leak still happens, but the cascade doesn't print hundreds
    of errors during shutdown. Stopgap, not a real fix.

(1) is the cleanest user-facing answer; (2) is the smallest behavior change that removes the leak.
Happy to test a fix if one lands.
