Fix UB in ColumnStringBlock::AppendUnsafe when appending empty string_view#489
Fix UB in ColumnStringBlock::AppendUnsafe when appending empty string_view#489iliaal wants to merge 1 commit intoClickHouse:masterfrom
Conversation
memcpy's source pointer is declared nonnull regardless of the size
argument. When AppendUnsafe is called with an empty string_view that
was constructed from an empty std::string, str.data() can be null
(the standard allows it), so UBSan flags every empty-string append:
runtime error: null pointer passed as argument 2,
which is declared to never be null
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
clickhouse/columns/string.cpp:139:21
Guard the memcpy with a zero-size short-circuit. Behavior is
unchanged for non-empty inputs and the returned string_view still
points at pos with length 0 for empty inputs.
There was a problem hiding this comment.
Pull request overview
Fixes UBSan-reported undefined behavior in ColumnString::Block::AppendUnsafe when appending an empty std::string_view whose .data() may be nullptr, avoiding memcpy being called with a null source pointer even when the size is 0.
Changes:
- Add a guard to skip
memcpy/size update whenstr.size() == 0. - Document why the guard is needed (UBSan +
memcpynonnull attribute behavior).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const auto pos = &data_[size]; | ||
|
|
||
| memcpy(pos, str.data(), str.size()); | ||
| size += str.size(); | ||
| // memcpy's source pointer is declared nonnull regardless of the | ||
| // size argument, so an empty string_view backed by std::string() | ||
| // (where data() may be null) trips UBSan on every empty append. |
There was a problem hiding this comment.
const auto pos = &data_[size]; can be undefined behavior when size == capacity (e.g., appending an empty string when the block is full, or when capacity==0), because data_[size] conceptually dereferences one-past-the-end. Prefer computing the pointer via data_.get() + size (and consider applying the same change in the other Block helpers that use &data_[size]), so zero-length operations don’t rely on out-of-bounds operator[] evaluation.
| // size argument, so an empty string_view backed by std::string() | ||
| // (where data() may be null) trips UBSan on every empty append. |
There was a problem hiding this comment.
The comment suggests an empty std::string may have data() == nullptr, but standard std::string::data() is generally non-null even when empty. The nullptr case is definitely possible for an empty std::string_view (e.g., default-constructed std::string_view{} or a view built from a nullable C string), so it would be clearer/safer to describe the issue as “empty string_view can have null .data()” rather than tying it specifically to std::string().
| // size argument, so an empty string_view backed by std::string() | |
| // (where data() may be null) trips UBSan on every empty append. | |
| // size argument, so an empty string_view with a null data() pointer | |
| // trips UBSan on every empty append. |
Summary
ColumnStringBlock::AppendUnsafecallsmemcpy(pos, str.data(), str.size())without short-circuiting onstr.size() == 0. When the inputstring_viewwas built from an emptystd::string,str.data()can benullptr, and glibc declaresmemcpy's source pointer with__attribute__((nonnull))regardless of the size argument. UBSan flags every empty-string append:The bug is benign in practice because libc no-ops
memcpy(_, NULL, 0), but the false positive fires on any workload that round-trips empty values throughStringorLowCardinality(String)columns. That makes-fsanitize=undefinedbuilds noisy and hides real findings.Patch
One-line guard around the
memcpy. Behavior is unchanged for non-empty inputs, and the return value matches the previous shape: astring_viewpointing atposwith length 0 for empty inputs.Repro
I hit this in a downstream PHP extension that wraps the library (iliaal/php_clickhouse). Inserting an empty value into a
LowCardinality(String)column under-fsanitize=address,undefinedreproduces it on every read path. Happy to add a unit test inut/if you'd prefer it gated bymake testrather than relying on existing coverage.