FIX: Release GIL around blocking ODBC statement, fetch, and transaction calls (#540)#541
FIX: Release GIL around blocking ODBC statement, fetch, and transaction calls (#540)#541saurabh500 merged 8 commits intomainfrom
Conversation
Extends PR #497 (which released the GIL during SQLDriverConnect/ SQLDisconnect) to all statement-level blocking ODBC calls so that other Python threads can run while a query is executing on the server. Wraps the following ODBC calls with py::gil_scoped_release in mssql_python/pybind/ddbc_bindings.cpp: - SQLExecDirect_wrap, SQLExecute_wrap (incl. DAE SQLParamData/ SQLPutData loop), SQLExecuteMany_wrap (incl. DAE loop) - SQLFetch_wrap, SQLFetchScroll_wrap, SQLMoreResults_wrap - FetchOne_wrap, FetchBatchData (SQLFetchScroll), FetchMany_wrap and FetchAll_wrap LOB fallback paths, FetchArrowBatch_wrap - FetchLobColumnData (SQLGetData) - All catalog wrappers: SQLTables, SQLColumns, SQLPrimaryKeys, SQLForeignKeys, SQLStatistics, SQLProcedures, SQLSpecialColumns, SQLGetTypeInfo Also wraps SQLEndTran in Connection::commit/rollback in mssql_python/pybind/connection/connection.cpp. Verified with an asyncio heartbeat test against WAITFOR DELAY '00:00:05': heartbeat now ticks every ~1s throughout the wait (previously stalled for the full 5s). 4 concurrent WAITFOR '00:00:02' queries complete in ~2s (4x speedup vs serial 8s). Note on #500: the multi-row INSERT slowness (~14x vs pyodbc) is CPU-bound parameter binding, not GIL contention; this fix does not change single-threaded throughput for that workload. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR extends the existing “release the GIL around blocking ODBC calls” approach (previously applied to connect/disconnect) to statement execution, fetching, catalog metadata calls, and transaction commit/rollback so other Python threads can continue running while the ODBC driver blocks on network/server I/O.
Changes:
- Release the GIL around blocking statement execution APIs (
SQLExecDirect,SQLPrepare,SQLExecute) including Data-At-Exec (SQLParamData/SQLPutData) loops. - Release the GIL around fetch/result APIs (
SQLFetch,SQLFetchScroll,SQLMoreResults) including LOB streaming viaSQLGetData. - Release the GIL around transaction completion (
SQLEndTran) forcommit()/rollback().
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| mssql_python/pybind/ddbc_bindings.cpp | Adds py::gil_scoped_release scopes around blocking statement, fetch, LOB, and catalog ODBC calls. |
| mssql_python/pybind/connection/connection.cpp | Adds py::gil_scoped_release scopes around SQLEndTran in commit() and rollback(). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…current_access_with_db After PR #540 released the GIL around blocking ODBC statement calls, sharing a single HDBC across reader threads in this test caused 'Connection is busy with results for another command' errors from the driver (no MARS). The package documents threadsafety=1 and cursor methods as not thread-safe, so the correct pattern is one connection per thread. Switched the temp table to a uniquely-named global temp table so all per-thread connections can see it. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…gs.cpp The issue reference adds noise to the inline comments; the rationale is already documented in PR #541 / commit history. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
📊 Code Coverage Report
Diff CoverageDiff: main...HEAD, staged and unstaged changes
Summary
mssql_python/pybind/ddbc_bindings.cppLines 1744-1759 1744 queryPtr = queryBuffer.data();
1745 #else
1746 queryPtr = const_cast<SQLWCHAR*>(Query.c_str());
1747 #endif
! 1748 SQLRETURN ret;
! 1749 {
1750 // Release the GIL during the blocking ODBC call so that other Python
1751 // threads (e.g. asyncio event loop, heartbeat threads) can run while
1752 // SQL Server executes the query. See issue #540.
! 1753 py::gil_scoped_release release;
! 1754 ret = SQLExecDirect_ptr(StatementHandle->get(), queryPtr, SQL_NTS);
! 1755 }
1756 if (!SQL_SUCCEEDED(ret)) {
1757 LOG("SQLExecDirect: Query execution failed - SQLRETURN=%d", ret);
1758 }
1759 return ret;Lines 1974-1982 1974 ThrowStdException("Unrecognized paramToken returned by SQLParamData");
1975 }
1976 const py::object& pyObj = matchedInfo->dataPtr;
1977 if (pyObj.is_none()) {
! 1978 putData(nullptr, 0);
1979 continue;
1980 }
1981 if (py::isinstance<py::str>(pyObj)) {
1982 if (matchedInfo->paramCType == SQL_C_WCHAR) {Lines 2035-2044 2035 size_t chunkBytes = DAE_CHUNK_SIZE;
2036 while (offset < totalBytes) {
2037 size_t len = std::min(chunkBytes, totalBytes - offset);
2038
! 2039 rc = putData((SQLPOINTER)(dataPtr + offset),
! 2040 static_cast<SQLLEN>(len));
2041 if (!SQL_SUCCEEDED(rc)) {
2042 LOG("SQLExecute: SQLPutData failed for "
2043 "SQL_C_CHAR chunk - offset=%zu",
2044 offset, totalBytes, len, rc);Lines 2853-2874 2853 return rc;
2854 }
2855 LOG("SQLExecuteMany: Parameters bound for row %zu", rowIndex);
2856
! 2857 {
2858 // Release the GIL during the blocking SQLExecute network call.
! 2859 py::gil_scoped_release release;
! 2860 rc = SQLExecute_ptr(hStmt);
! 2861 }
2862 LOG("SQLExecuteMany: SQLExecute for row %zu - initial_rc=%d", rowIndex, rc);
2863 size_t dae_chunk_count = 0;
2864 while (rc == SQL_NEED_DATA) {
2865 SQLPOINTER token;
! 2866 {
2867 // Release the GIL around the blocking SQLParamData call.
! 2868 py::gil_scoped_release release;
! 2869 rc = SQLParamData_ptr(hStmt, &token);
! 2870 }
2871 LOG("SQLExecuteMany: SQLParamData called - chunk=%zu, rc=%d, "
2872 "token=%p",
2873 dae_chunk_count, rc, token);
2874 if (!SQL_SUCCEEDED(rc) && rc != SQL_NEED_DATA) {Lines 2889-2900 2889 SQLLEN data_len = static_cast<SQLLEN>(data.size());
2890 LOG("SQLExecuteMany: Sending string DAE data - chunk=%zu, "
2891 "length=%lld",
2892 dae_chunk_count, static_cast<long long>(data_len));
! 2893 rc = [&] {
! 2894 py::gil_scoped_release release;
! 2895 return SQLPutData_ptr(hStmt, (SQLPOINTER)data.c_str(), data_len);
! 2896 }();
2897 if (!SQL_SUCCEEDED(rc) && rc != SQL_NEED_DATA) {
2898 LOG("SQLExecuteMany: SQLPutData(string) failed - "
2899 "chunk=%zu, rc=%d",
2900 dae_chunk_count, rc);Lines 2905-2916 2905 SQLLEN data_len = static_cast<SQLLEN>(data.size());
2906 LOG("SQLExecuteMany: Sending bytes/bytearray DAE data - "
2907 "chunk=%zu, length=%lld",
2908 dae_chunk_count, static_cast<long long>(data_len));
! 2909 rc = [&] {
! 2910 py::gil_scoped_release release;
! 2911 return SQLPutData_ptr(hStmt, (SQLPOINTER)data.c_str(), data_len);
! 2912 }();
2913 if (!SQL_SUCCEEDED(rc) && rc != SQL_NEED_DATA) {
2914 LOG("SQLExecuteMany: SQLPutData(bytes) failed - "
2915 "chunk=%zu, rc=%d",
2916 dae_chunk_count, rc);Lines 3045-3053 3045 DriverLoader::getInstance().loadDriver(); // Load the driver
3046 }
3047
3048 // Release the GIL during the blocking ODBC call
! 3049 py::gil_scoped_release release;
3050 return SQLFetch_ptr(StatementHandle->get());
3051 }
3052
3053 // Non-static so it can be called from inline functions in header📋 Files Needing Attention📉 Files with overall lowest coverage (click to expand)mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.h: 67.9%
mssql_python.row.py: 70.5%
mssql_python.pybind.logger_bridge.hpp: 70.8%
mssql_python.pybind.ddbc_bindings.cpp: 74.6%
mssql_python.pybind.connection.connection.cpp: 76.2%
mssql_python.__init__.py: 77.3%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.pybind.connection.connection_pool.cpp: 79.6%
mssql_python.connection.py: 85.3%🔗 Quick Links
|
Adds tests/test_022_concurrent_query_gil_release.py with two heartbeat-based tests verifying the statement- and transaction-level GIL release added in PR #541: - test_query_does_not_block_other_python_threads: a Python heartbeat thread keeps ticking while another thread sits in cursor.execute("WAITFOR DELAY '00:00:02'"); without GIL release the heartbeat would be fully starved. - test_commit_does_not_block_other_python_threads: same property across an explicit commit(), exercising the SQLEndTran GIL release in Connection::commit/rollback. These are functional (non-stress) tests asserting a binary correctness property; throughput-style wall-clock parallelism assertions remain in test_021_concurrent_connection_perf.py under @pytest.mark.stress. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The test_commit_does_not_block_other_python_threads test was flaking on macOS CI (got 19 ticks, expected >= 20). Two changes to address this: 1. Move connect() out of the heartbeat measurement window. The Python- side connect wrapper legitimately holds the GIL (connstr parsing, handle alloc, attr setup), and including it skewed the tick budget. This matches the test's stated intent of measuring ticks across cursor.execute(WAITFOR) + commit only. 2. Lower the threshold multiplier from 0.5 to 0.4 (20 -> 16 ticks). Gives margin against macOS CI scheduler jitter (sleep overshoot + GIL re-acquisition latency) while still catching real GIL starvation, which would yield <= ~2 ticks. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
gargsaumya
left a comment
There was a problem hiding this comment.
GIL release scopes are scoped tightly - Python object inspection and type checks stay under the GIL between individual ODBC calls, so the DAE loops are correct too.
This is nice, thanks!
bewithgaurav
left a comment
There was a problem hiding this comment.
lgtm! thanks for adding this :)
Work Item / Issue Reference
Summary
Extends PR #497 (which released the GIL during
SQLDriverConnect/SQLDisconnect) to all statement-level blocking ODBC calls so that other Python threads can run while a query is executing on the server.Problem
Reported in #540 (and seen earlier in #491): while a thread sat inside
cursor.execute(...), no other Python code in the process could run —asyncioheartbeats, FastAPI/Starlette request handlers, background workers, etc. all stalled for the full duration of the ODBC call. A 5-secondWAITFOR DELAY '00:00:05'froze every other thread for the full 5 s.Root cause: pybind11 holds the Python GIL by default for the entire duration of a C++ function call. Our wrappers around
SQLExecute,SQLFetch,SQLEndTran, etc. blocked on the network / ODBC driver while still holding the GIL, starving every other Python thread.PR #497 fixed this for connect/disconnect; this PR extends the same pattern to the statement, fetch, catalog, and transaction paths.
What changed
mssql_python/pybind/ddbc_bindings.cpp— wraps the following blocking ODBC calls withpy::gil_scoped_release:SQLExecDirect_wrap,SQLExecute_wrap(incl.SQLPrepareand the DAESQLParamData/SQLPutDataloop),SQLExecuteMany_wrap(incl. its DAE loop)SQLFetch_wrap,SQLFetchScroll_wrap,SQLMoreResults_wrap,FetchOne_wrap,FetchBatchData,FetchMany_wrapLOB fallback,FetchAll_wrapLOB fallback,FetchArrowBatch_wrap,FetchLobColumnData(SQLGetData)SQLTables,SQLColumns,SQLPrimaryKeys,SQLForeignKeys,SQLStatistics,SQLProcedures,SQLSpecialColumns,SQLGetTypeInfomssql_python/pybind/connection/connection.cpp— wraps the twoSQLEndTrancalls inConnection::commit()andConnection::rollback().Why it works (and is safe)
py::gil_scoped_releasereleases the GIL on construction and re-acquires it in its destructor. The release scope is placed only around the actual blocking ODBC call:py::isinstance,pyObj.cast<>,LOG()which internally re-acquires the GIL) happens outside the release scope, where the GIL is still held.std::vector<SQLWCHAR>,std::wstring,paramBuffers, encoded byte buffers) are not Python objects, so they remain valid across the release without GIL protection.SQLParamData/SQLPutDatacall rather than over the whole loop, so per-iteration Python work (extracting the next chunk, encoding, etc.) still runs under the GIL.checkError()/LOG()/ Python access, matching the established pattern from PR FIX: Release GIL during blocking ODBC connect/disconnect calls #497.This matches PEP 311 / pybind11's documented contract for releasing the GIL around long-running C/C++ work.
The fix does not relax
threadsafety = 1— sharing a single cursor or connection across threads while a call is in flight is still undefined, exactly as before.Verification
Repro from #540 (asyncio heartbeat +
WAITFOR DELAY '00:00:05'):Multi-thread scaling (4× concurrent
WAITFOR '00:00:02'):Test follow-up
tests/test_001_globals.py::test_lowercase_concurrent_access_with_dbwas previously sharing onedb_connectionacross 3 reader threads that each rancursor.execute(...)in a tight loop. Before this PR, the GIL implicitly serialized those calls; after this PR they truly run concurrently on the sameHDBC, which the driver rejects without MARS:Per the package's documented contract (
threadsafety = 1; cursor methods are not thread-safe), the correct pattern is one connection per thread. Updated the test accordingly: each reader thread opens its own connection, and the shared fixture creates a uniquely-named global temp table (##pytest_thread_test_<rand>) so all per-thread connections can see it. The test still exercises the original race (concurrentmssql_python.lowercasetoggling vs. cursor description generation) and now passes against the GIL-release changes in this PR.