Validate UTF-16 surrogate pairs before combining #187

Open
jarvis24young wants to merge 1 commit into postgresql-interfaces:main from jarvis24young:fix-surrogate-pair-boundary

@jarvis24young
Contributor

SQLWCHAR-to-UTF-8 conversion currently treats any UTF-16 high surrogate as the start of a surrogate pair. It then advances to the next code unit and reads it unconditionally.

That can read past the caller-supplied length when a wide-character ODBC API receives a dangling high surrogate at the end of its input. The new regression test exercises this through the public SQLPrepareW() path with a guarded one-code-unit SQLWCHAR buffer, so the old implementation faults deterministically if it reads wstr[1].
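One way to build a guarded buffer like the one described above is to place the data at the very end of a readable page and make the following page inaccessible, so that any read one code unit past the end faults immediately. The sketch below assumes a POSIX system with mmap() and mprotect(); the helper name alloc_guarded_units is hypothetical, and the regression test in this PR may construct its buffer differently.

```c
#define _DEFAULT_SOURCE /* may be needed for MAP_ANONYMOUS under strict -std= modes on glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns a pointer to n_units 16-bit code units whose
 * last byte is immediately followed by an inaccessible guard page.
 * Returns NULL on failure. The mapping is intentionally never unmapped,
 * which is acceptable for a short-lived test program.
 */
static uint16_t *alloc_guarded_units(size_t n_units)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t bytes = n_units * sizeof(uint16_t);

    /* Keep the sketch simple: the data must fit in a single page. */
    if (page <= 0 || bytes == 0 || bytes > (size_t)page)
        return NULL;

    /* Map two adjacent pages: one for data, one to act as the guard. */
    unsigned char *base = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Revoke all access to the second page. */
    if (mprotect(base + page, (size_t)page, PROT_NONE) != 0) {
        munmap(base, 2 * (size_t)page);
        return NULL;
    }

    /* Position the buffer so that it ends exactly at the guard page. */
    return (uint16_t *)(base + (size_t)page - bytes);
}
```

With n_units set to 1, storing a lone high surrogate such as 0xD83D in element 0 and passing the buffer with a length of 1 means any access to element 1 lands on the guard page and raises a fault instead of silently reading adjacent memory.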

Fix this by only taking the surrogate-pair path when:

  • the current code unit is a high surrogate,
  • there is another code unit within ilen, and
  • the next code unit is a low surrogate.

Otherwise the existing non-pair path is used, avoiding the out-of-bounds read.
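The three conditions above can be sketched as a single predicate. This is an illustration only, not the code in win_unicode.c: the helper names are hypothetical, and uint16_t stands in for SQLWCHAR on the assumption that it is a 16-bit code unit. The important detail is the order of evaluation: the bounds check runs before the next code unit is read.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers; not the identifiers used in win_unicode.c. */
static bool is_high_surrogate(uint16_t u) { return (u & 0xFC00u) == 0xD800u; }
static bool is_low_surrogate(uint16_t u)  { return (u & 0xFC00u) == 0xDC00u; }

/*
 * Returns true only when wstr[i] and wstr[i + 1] form a valid surrogate
 * pair within the first ilen code units. Because && short-circuits,
 * wstr[i + 1] is read only after i + 1 < ilen has been confirmed, so a
 * dangling high surrogate at the end of the input is never followed by
 * an out-of-bounds read.
 */
static bool is_surrogate_pair_at(const uint16_t *wstr, size_t i, size_t ilen)
{
    return i < ilen
        && is_high_surrogate(wstr[i])
        && i + 1 < ilen
        && is_low_surrogate(wstr[i + 1]);
}

/* Combines an already validated pair into a single Unicode code point. */
static uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    return 0x10000u + ((uint32_t)(hi & 0x3FFu) << 10) + (uint32_t)(lo & 0x3FFu);
}
```

For example, the pair 0xD83D 0xDE00 with a length of 2 is accepted and combines to U+1F600, while the same high surrogate with a length of 1, or followed by a non-surrogate such as 0x0041, is rejected and falls through to the non-pair path.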

Reproduction on the old implementation, using the same black-box test with ASan and a guarded buffer:

ERROR: AddressSanitizer: SEGV on unknown address
The signal is caused by a READ memory access.
#0 ucs2_to_utf8 win_unicode.c:191
#1 SQLPrepareW odbcapiw.c:439
#2 SQLPrepareW libodbc.so.2
#3 main test/src/surrogate-pair-test.c:109

Tested after the fix:

cd ~/psqlodbc-surrogate-oob-build/test
ODBCSYSINI=. ODBCINSTINI=./odbcinst.ini ODBCINI=./odbc.ini ./runsuite surrogate-pair --inputdir=.
TAP version 13
1..1
ok 1 - surrogate-pair

Also tested the target binary directly under ASan/UBSan with detect_leaks=0; it returns normally.
